Audio processing
The audio module provides two tools: text-to-speech synthesis and speech-to-text transcription, both backed by AI models.
Available tools
Text-to-speech
Convert text to natural-sounding speech using AI voices.
Parameters
- input (string, required): Text to convert to speech
  - Maximum 4096 characters
- voice (string, optional): Voice to use for speech generation
  - Options: alloy (default), echo, fable, onyx, nova, shimmer
- model (string, optional): Model to use for generation
  - Options: tts-1 (default), tts-1-hd
- response_format (string, optional): Audio file format
  - Options: mp3 (default), opus, aac, flac
- speed (float, optional): Speed of generated audio
  - Range: 0.25 to 4.0
  - Default: 1.0
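The limits above can be checked before a request is made. The following is a minimal sketch, not part of the audio module: `validate_tts_params` is a hypothetical helper, and its `text` argument stands in for the `input` parameter (renamed only to avoid shadowing Python's built-in `input`). The allowed values are copied from the parameter list above.

```python
# Allowed values copied from the parameter list; the helper itself is hypothetical.
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac"}
MAX_INPUT_CHARS = 4096


def validate_tts_params(text, voice="alloy", model="tts-1",
                        response_format="mp3", speed=1.0):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not isinstance(text, str) or not text:
        problems.append("input must be a non-empty string")
    elif len(text) > MAX_INPUT_CHARS:
        problems.append(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    if voice not in ALLOWED_VOICES:
        problems.append(f"unknown voice: {voice!r}")
    if model not in ALLOWED_MODELS:
        problems.append(f"unknown model: {model!r}")
    if response_format not in ALLOWED_FORMATS:
        problems.append(f"unknown response_format: {response_format!r}")
    if not isinstance(speed, (int, float)) or not 0.25 <= speed <= 4.0:
        problems.append(f"speed must be between 0.25 and 4.0, got {speed!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid value at once.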
Returns
A tuple containing:
- Status dictionary:
  - success: Boolean indicating success
  - description: Description of generated audio
  - details: Dictionary containing:
    - filename: Generated audio filename
    - voice: Voice used
    - model: Model used
    - format: Audio format
    - speed: Speed setting
    - text_length: Length of input text
  - error: Error message if failed
- Files array containing:
  - content: Audio file content (bytes)
  - filename: Audio filename
  - mime_type: Audio MIME type
  - description: Audio description
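One way to consume this return value is to check the status dictionary first and then write each file entry to disk. The sketch below assumes the tuple has exactly the shape described above; `save_audio_files` is a hypothetical helper, not a function provided by the module.

```python
from pathlib import Path


def save_audio_files(result, out_dir):
    """Write every file entry from a (status, files) tuple into out_dir."""
    status, files = result
    if not status.get("success"):
        # Surface the tool's own error message when there is one
        raise RuntimeError(status.get("error") or "text-to-speech failed")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    for entry in files:
        # Keep only the final path component so a filename cannot escape out_dir
        target = out / Path(entry["filename"]).name
        target.write_bytes(entry["content"])
        saved.append(target)
    return saved
```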
Speech-to-text
Transcribe speech from audio files to text.
Parameters
- file_url (string, required): Path to the audio file
  - Must be a valid local file path
- language (string, optional): Language code in ISO-639-1 format
  - If not specified, the language is auto-detected
- prompt (string, optional): Text to guide transcription style
  - Useful for continuing previous segments
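The two constraints above (a local file path and an ISO-639-1 language code) can be checked up front. This is a rough sketch with a hypothetical helper name, `check_transcription_inputs`. It only tests that the language code has the two-letter shape of an ISO-639-1 code, not that the code is actually assigned or supported.

```python
import re
from pathlib import Path

# ISO-639-1 codes are two lowercase letters; this checks shape only
ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def check_transcription_inputs(file_url, language=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not Path(file_url).is_file():
        problems.append(f"file_url is not an existing local file: {file_url}")
    if language is not None and not ISO_639_1_SHAPE.fullmatch(language):
        problems.append(f"language should be a two-letter code such as 'en', got {language!r}")
    return problems
```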
Returns
A dictionary containing:
- success: Boolean indicating success
- text: Transcribed text
- details: Dictionary containing:
  - model: Model used
  - language: Language detected/used
  - file_url: Original file path
- error: Error message if failed
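A caller will usually want either the transcript or a clear failure. The sketch below assumes the dictionary has the keys listed above; `transcript_text` is a hypothetical helper for illustration.

```python
def transcript_text(result):
    """Return (text, language) from a speech-to-text result, or raise on failure."""
    if not result.get("success"):
        raise RuntimeError(result.get("error") or "transcription failed")
    # details may be missing or empty, so fall back to an empty dict
    details = result.get("details") or {}
    return result["text"], details.get("language")
```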
Example usage
import asyncio

from tyler.models import Agent, Thread, Message


async def main():
    # Create an agent with audio tools
    agent = Agent(
        model_name="gpt-4o",
        purpose="To help with audio processing",
        tools=["audio"]
    )

    # Create a thread for text-to-speech
    thread = Thread()
    message = Message(
        role="user",
        content='Convert this text to speech: "Hello, how are you today?"'
    )
    thread.add_message(message)

    # Process the thread; the agent will use the text-to-speech tool
    processed_thread, new_messages = await agent.go(thread)

    # Example of speech-to-text
    transcribe_thread = Thread()
    message = Message(
        role="user",
        content="Transcribe the audio from recording.mp3"
    )
    transcribe_thread.add_message(message)

    # Process the thread; the agent will use the speech-to-text tool
    processed_transcription, transcription_messages = await agent.go(transcribe_thread)


# agent.go is awaited, so the example must run inside an event loop
asyncio.run(main())
Best practices
- Text-to-Speech
  - Keep text within length limits
  - Choose appropriate voice for content
  - Consider audio quality needs
  - Use natural language input
- Speech-to-Text
  - Use high-quality audio input
  - Specify language when known
  - Provide context with prompts
  - Consider audio format support
- Audio Quality
  - Select appropriate formats
  - Use HD models when needed
  - Adjust speed carefully
  - Monitor file sizes
- Resource Management
  - Handle large files properly
  - Monitor API usage
  - Manage storage space
  - Consider bandwidth usage
Common use cases
- Content Creation
  - Audiobook generation
  - Voice-over production
  - Podcast content
  - Educational materials
- Accessibility
  - Text-to-speech for visually impaired users
  - Transcription for hearing impaired users
  - Multi-language support
  - Audio documentation
- Audio Processing
  - Meeting transcription
  - Voice note conversion
  - Audio content analysis
  - Language learning tools
Limitations
- Text-to-Speech
  - 4096 character limit per request
  - Limited voice options
  - Language constraints
  - Pronunciation accuracy
- Speech-to-Text
  - Background noise sensitivity
  - Accent recognition
  - Speaker separation
  - Technical terminology
- General Constraints
  - API rate limits
  - File size limits
  - Processing time
  - Cost considerations
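The 4096-character limit means longer text has to be split across several text-to-speech requests. The following is one possible approach, not something the module does for you: `split_for_tts` is a hypothetical helper that breaks text at sentence ends where it can and falls back to a hard cut for any single sentence that exceeds the limit.

```python
import re


def split_for_tts(text, limit=4096):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    # Split after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into limit-sized pieces
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate request. Joining the resulting audio files is outside the scope of this sketch.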
Error handling
Common errors and solutions:
- Input Validation
  - Check text length
  - Verify file formats
  - Validate parameters
  - Handle special characters
- Processing Issues
  - Handle API errors
  - Manage timeouts
  - Process format errors
  - Handle quality issues
- Resource Errors
  - Monitor API quotas
  - Handle storage limits
  - Manage bandwidth
  - Control concurrency
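Several of the items above (API errors, timeouts, concurrency) can be handled by one small wrapper around an awaited call. This is a generic `asyncio` sketch under stated assumptions: `call_with_retries` is a hypothetical helper, and `make_call` is any zero-argument function returning an awaitable, for example `lambda: agent.go(thread)`. It retries on every exception, which is deliberately broad; real code would likely retry only errors known to be transient.

```python
import asyncio


async def call_with_retries(make_call, *, attempts=3, timeout=30.0,
                            base_delay=1.0, semaphore=None):
    """Await make_call() with a timeout, retrying with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            if semaphore is None:
                return await asyncio.wait_for(make_call(), timeout)
            # The semaphore caps how many calls run at the same time
            async with semaphore:
                return await asyncio.wait_for(make_call(), timeout)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, 4x, ... before the next attempt
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error
```

Sharing one `asyncio.Semaphore` across all calls, for example `asyncio.Semaphore(2)` created inside the running event loop, limits concurrent requests and helps stay under API rate limits.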