Text to Speech - Streaming

Stream real-time audio generation with our SDKs. This guide covers how to implement streaming text-to-speech synthesis for immediate audio playback and low-latency applications.

Prerequisites

Before you begin, make sure you have:

  • An aiOla API key (get one here)
  • Python 3.10+ (for Python SDK) or Node.js 18+ (for TypeScript SDK)

Step 1: Set up authentication

Generate an access token from your API key, then create your client. For authentication details, security considerations, and token management strategies, see our Authentication Guide.

from aiola import AiolaClient

# Generate an access token from your API key
result = AiolaClient.grant_token(api_key='your-api-key')
access_token = result['accessToken']

# Create the client with the access token
client = AiolaClient(access_token=access_token)
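
Hardcoding the key as in the snippet above is fine for a quick test, but in shared code it is usually safer to read it from the environment. The following is a minimal sketch, not part of the SDK: the helper `load_api_key` and the variable name `AIOLA_API_KEY` are assumptions chosen for illustration.

```python
import os


def load_api_key(env_var: str = "AIOLA_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset or empty.

    Both this helper and the default variable name are hypothetical;
    use whatever name fits your deployment.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your aiOla API key"
        )
    return api_key


# Usage with the calls shown in this step:
# result = AiolaClient.grant_token(api_key=load_api_key())
# client = AiolaClient(access_token=result['accessToken'])
```

Failing early with a clear message avoids sending an empty key to the token endpoint.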

Step 2: Basic streaming synthesis

Here’s how to stream audio generation for immediate processing:

# Stream audio generation (uses the client created in Step 1)
text = "This is streaming text to speech synthesis."

stream = client.tts.stream(
    text=text,
    voice='tara',
    language='en'
)

# Collect audio chunks as they arrive
audio_chunks = []
for chunk in stream:
    audio_chunks.append(chunk)
    # Process chunk in real-time if needed
    print(f"Received chunk of {len(chunk)} bytes")

print("Streaming synthesis completed!")
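
Collecting every chunk in a list keeps the whole clip in memory. If you only need to persist or forward the audio, you can write each chunk to a sink as it arrives instead. The sketch below assumes, as the loop above suggests, that the stream yields `bytes` objects; `write_stream` is a hypothetical helper, and an in-memory buffer stands in for the real stream so the example runs without the SDK.

```python
import io
from typing import BinaryIO, Iterable


def write_stream(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each chunk to sink as soon as it arrives; return total bytes written."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total


# Stand-in for client.tts.stream(...): any iterable of bytes works.
fake_stream = iter([b"\x00\x01", b"\x02\x03\x04"])
sink = io.BytesIO()
written = write_stream(fake_stream, sink)
print(f"Wrote {written} bytes")

# With the real stream, pass an open binary file instead. The file name is
# an assumption; pick an extension that matches the audio format you receive.
# with open("speech_output.bin", "wb") as f:
#     write_stream(stream, f)
```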

Step 3: Async streaming (Python)

For asynchronous applications, use AsyncAiolaClient and consume the stream with async for:

from aiola import AsyncAiolaClient
import asyncio

async def async_streaming_example():
    # Generate access token
    result = await AsyncAiolaClient.grant_token(api_key='your-api-key')
    access_token = result['accessToken']

    # Create async client
    async_client = AsyncAiolaClient(access_token=access_token)

    text = "This demonstrates async streaming synthesis."

    stream = async_client.tts.stream(
        text=text,
        voice='tara',
        language='en'
    )

    # Process chunks asynchronously
    audio_chunks = []
    async for chunk in stream:
        audio_chunks.append(chunk)
        print(f"Received chunk: {len(chunk)} bytes")

    print("Async streaming completed!")

if __name__ == "__main__":
    # Run the async function
    asyncio.run(async_streaming_example())
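
In the loop above, receiving and processing happen in the same coroutine, so a slow handler delays the next read. One way to decouple them is a bounded `asyncio.Queue` between a producer and a consumer. This is a sketch under the assumption that the async stream yields `bytes`; `pump` and `handle_chunk` are hypothetical names, and a fake async generator stands in for `async_client.tts.stream(...)`.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def pump(
    stream: AsyncIterator[bytes],
    handle_chunk: Callable[[bytes], Awaitable[None]],
    max_buffered: int = 8,
) -> int:
    """Feed chunks to handle_chunk through a bounded queue; return total bytes."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream

    async def producer() -> None:
        try:
            async for chunk in stream:
                await queue.put(chunk)  # waits while the buffer is full
        except Exception as exc:
            await queue.put(exc)  # forward stream errors to the consumer
        else:
            await queue.put(done)

    async def consumer() -> int:
        total = 0
        while True:
            item = await queue.get()
            if item is done:
                return total
            if isinstance(item, Exception):
                raise item
            await handle_chunk(item)
            total += len(item)

    producer_task = asyncio.create_task(producer())
    try:
        return await consumer()
    finally:
        producer_task.cancel()  # no-op if the producer already finished


async def fake_stream() -> AsyncIterator[bytes]:
    # Stand-in for async_client.tts.stream(...)
    for chunk in (b"ab", b"cde", b"f"):
        yield chunk


received = []


async def collect(chunk: bytes) -> None:
    received.append(chunk)


total_bytes = asyncio.run(pump(fake_stream(), collect, max_buffered=1))
```

The `max_buffered` limit caps memory use: when the handler falls behind, the producer pauses instead of accumulating chunks.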

Best practices

  1. Chunk Processing: Process chunks immediately for lower latency
  2. Buffer Management: Implement proper audio buffering for smooth playback
  3. Error Recovery: Handle network issues and retry failed streams
  4. Memory Usage: Process chunks incrementally to avoid memory buildup
  5. Audio Quality: Use appropriate sample rates and formats for your use case
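
As an illustration of the error recovery and memory usage points above, the sketch below restarts a failed stream with exponential backoff while writing chunks straight to disk. It is one possible approach, not SDK behavior: `synthesize_to_file` and `make_stream` are hypothetical names, and the exception types worth retrying are an assumption, since this guide does not list the errors the SDK raises.

```python
import time
from typing import Callable, Iterable, Tuple, Type


def synthesize_to_file(
    make_stream: Callable[[], Iterable[bytes]],
    path: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> int:
    """Stream audio to path, restarting the request on transient errors.

    make_stream must return a fresh stream on each call, for example
    lambda: client.tts.stream(text=text, voice='tara', language='en').
    Returns the number of bytes written by the successful attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            total = 0
            # Mode "wb" truncates partial output left by a failed attempt.
            with open(path, "wb") as sink:
                for chunk in make_stream():
                    sink.write(chunk)  # written immediately, never accumulated
                    total += len(chunk)
            return total
        except retry_on:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Because each retry restarts synthesis from the beginning, this pattern suits saving to a file; for live playback you would also need to decide how to handle audio that has already been played.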

Next steps

Now that you’ve implemented streaming text-to-speech synthesis, you can: