Speech to Text - Streaming

Stream real-time audio transcription with our SDKs. This guide covers how to implement live audio streaming for immediate speech-to-text conversion in your applications.

Prerequisites

Before you begin, make sure you have:

  • An aiOla API key (get one here)
  • Python 3.10+ (for Python SDK) or Node.js 18+ (for TypeScript SDK)
  • Microphone access (for live audio streaming)

Installation

$ pip install 'aiola[mic]'

Step 1: Set up authentication

For comprehensive authentication details, security considerations, and token management strategies, see our Authentication Guide.

First, generate an access token and create your client:

import os
from aiola import AiolaClient

# Generate access token
result = AiolaClient.grant_token(
    api_key=os.getenv('AIOLA_API_KEY') or 'YOUR_API_KEY'
)

# Create client using the access token
client = AiolaClient(
    access_token=result['accessToken']
)
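
The snippet above falls back to the literal string 'YOUR_API_KEY' when the environment variable is unset, so a missing key only surfaces later, when the token request is rejected. A minimal sketch of failing early instead; `load_api_key` is a hypothetical helper for illustration and is not part of the aiOla SDK:

```python
import os


def load_api_key(env=None, name="AIOLA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    # Accept an explicit mapping so the helper is easy to test; default to os.environ.
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key
```

The result can then be passed as `api_key=load_api_key()` in the `grant_token` call.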

Step 2: Initialize streaming connection

Create a streaming connection with event handlers:

from aiola import AiolaClient, MicrophoneStream
from aiola.types import LiveEvents

# Create streaming connection
connection = client.stt.stream(lang_code='en')

# Set up event handlers
@connection.on(LiveEvents.Transcript)
def on_transcript(data):
    print('Transcript:', data.get('transcript', data))

@connection.on(LiveEvents.Connect)
def on_connect():
    print('Connected to streaming service')

@connection.on(LiveEvents.Disconnect)
def on_disconnect():
    print('Disconnected from streaming service')

@connection.on(LiveEvents.Error)
def on_error(error):
    print('Streaming error:', error)
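
The transcript handler above only prints each event. If the application needs the text afterwards, one option is a handler object that accumulates it. This is a sketch that assumes, as the handler above does, that the payload is a dict with a 'transcript' key; `TranscriptCollector` is a hypothetical name, not an SDK class:

```python
class TranscriptCollector:
    """Callable handler that stores transcript text as it arrives."""

    def __init__(self):
        self.segments = []

    def __call__(self, data):
        # Use the 'transcript' field when the payload is a dict, else the payload itself.
        raw = data.get("transcript", "") if isinstance(data, dict) else data
        text = str(raw).strip()
        if text:
            self.segments.append(text)

    def text(self):
        """Return everything received so far as one string."""
        return " ".join(self.segments)
```

Because `connection.on(...)` is used as a decorator above, it should be possible to register an instance with `connection.on(LiveEvents.Transcript)(collector)`, although this page does not confirm that usage explicitly.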

Step 3: Start streaming with microphone

Start the streaming connection and pipe microphone audio:

import time

# Connect to the streaming service
connection.connect()

try:
    # Capture audio from microphone using the SDK's MicrophoneStream
    with MicrophoneStream(
        channels=1,
        samplerate=16000,
        blocksize=4096,
    ) as mic:
        print("Listening... Speak into your microphone")
        mic.stream_to(connection)

        # Keep the main thread alive until Ctrl+C
        while True:
            time.sleep(0.1)

except KeyboardInterrupt:
    print('Keyboard interrupt')
finally:
    connection.disconnect()
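
The `while True` loop above can only be ended with Ctrl+C. If an event handler should also be able to end the session (for example after a disconnect), a `threading.Event` is one way to do it. The sketch below uses only the standard library; `wait_until_stopped` is a hypothetical helper name:

```python
import threading


def wait_until_stopped(event, poll_seconds=0.1):
    """Block until the event is set or the user presses Ctrl+C."""
    try:
        # Event.wait returns True once the event is set and False on timeout,
        # so the loop wakes up periodically and stays responsive to Ctrl+C.
        while not event.wait(poll_seconds):
            pass
    except KeyboardInterrupt:
        event.set()
    return event.is_set()
```

With a module-level `stop_event = threading.Event()`, a handler can call `stop_event.set()`, and the keep-alive loop becomes `wait_until_stopped(stop_event)`.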

Custom audio sources

To stream from a custom audio source instead of the microphone, send the audio chunks yourself:

import asyncio

async def stream_audio_file():
    # Create the streaming connection
    connection = client.stt.stream(lang_code='en')

    @connection.on(LiveEvents.Transcript)
    def on_transcript(data):
        print('Transcript:', data.get('transcript', data))

    connection.connect()

    try:
        # Stream audio file in chunks
        with open('audio_file.wav', 'rb') as audio_file:
            chunk_size = 4096
            while True:
                chunk = audio_file.read(chunk_size)
                if not chunk:
                    break

                # Send audio chunk
                connection.send(chunk)

                # Small delay to simulate real-time streaming
                await asyncio.sleep(0.1)
    finally:
        # Close the connection even if reading or sending fails
        connection.disconnect()

# Run the async function
asyncio.run(stream_audio_file())
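
The example above reads `audio_file.wav` as raw bytes, so the first chunk also contains the WAV header, and the fixed 0.1 s delay only approximates real time. A sketch of reading just the PCM frames with the standard-library `wave` module and pacing each chunk by its actual duration follows. `iter_pcm_chunks` and `silent_wav` are hypothetical helpers, and whether the service tolerates header bytes is not stated on this page:

```python
import io
import wave


def iter_pcm_chunks(source, frames_per_chunk=2048):
    """Yield (pcm_bytes, duration_seconds) pairs from a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
        while True:
            # readframes returns audio data only; the wave module consumes the header.
            pcm = wav.readframes(frames_per_chunk)
            if not pcm:
                break
            yield pcm, (len(pcm) // bytes_per_frame) / rate


def silent_wav(frames, rate=16000):
    """Build an in-memory 16-bit mono WAV of silence for trying the generator."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames)
    buf.seek(0)
    return buf
```

Inside the streaming loop, each pair would be used as `connection.send(pcm)` followed by `await asyncio.sleep(seconds)`. At 16 kHz, 16-bit mono, 2048 frames is 4096 bytes, which matches the chunk size used above.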

Advanced streaming options

Keyword detection

Enable keyword detection during streaming:

# Create connection with keyword detection
connection = client.stt.stream(
    lang_code='en',
    keywords={
        "postgres": "PostgreSQL",
        "k eight s": "Kubernetes"
    }
)

Multiple language support

Stream with different languages:

# Supported languages: en, de, fr, es, pr, zh, ja, it
connection = client.stt.stream(lang_code='es')  # Spanish
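
When the language code comes from user input or configuration, checking it before opening a connection gives a clearer error than a failed stream. The sketch below copies the codes from the comment above, so it may lag behind what the service actually accepts. Lowercasing the input is an assumption, and `check_lang_code` is a hypothetical helper:

```python
# Codes copied from the list in this guide; treat it as a snapshot, not a contract.
SUPPORTED_LANG_CODES = ("en", "de", "fr", "es", "pr", "zh", "ja", "it")


def check_lang_code(lang_code):
    """Return the normalized code, or raise ValueError if it is not listed."""
    code = lang_code.strip().lower()
    if code not in SUPPORTED_LANG_CODES:
        raise ValueError(
            f"Unsupported lang_code {lang_code!r}; expected one of {SUPPORTED_LANG_CODES}"
        )
    return code
```

It can wrap the argument directly, as in `client.stt.stream(lang_code=check_lang_code(user_choice))`.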

Error handling

Implement robust error handling for streaming:

try:
    connection = client.stt.stream(lang_code='en')

    @connection.on(LiveEvents.Error)
    def on_error(error):
        print(f"Streaming error: {error}")
        # Implement reconnection logic here

    @connection.on(LiveEvents.Disconnect)
    def on_disconnect():
        print("Connection lost. Attempting to reconnect...")
        # Implement reconnection logic

    connection.connect()

except Exception as e:
    print(f"Failed to initialize streaming: {e}")
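
Both handlers above leave reconnection as a comment. One common approach is to retry with exponential backoff. The sketch below is SDK-agnostic: it takes any callable that raises on failure. Whether `connection.connect()` raises when it fails, or can be called again on the same connection object, is not stated on this page, so the callable may need to create a fresh connection with `client.stt.stream(...)` first. `connect_with_backoff` is a hypothetical helper:

```python
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=0.5,
                         max_delay=8.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    Returns the number of attempts used; re-raises the last error if all fail.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Cap the delay so long outages do not produce unbounded waits.
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists so the waiting can be replaced in tests. In production code the default `time.sleep` is used.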

Complete working example

Here’s a complete Python example that combines all steps:

import os
import time

from aiola import AiolaClient, MicrophoneStream
from aiola.types import LiveEvents

def live_streaming():
    connection = None
    try:
        # Step 1: Generate access token
        result = AiolaClient.grant_token(
            api_key=os.getenv('AIOLA_API_KEY') or 'YOUR_API_KEY'
        )

        # Step 2: Create client using the access token
        client = AiolaClient(
            access_token=result['accessToken']
        )

        # Step 3: Create the streaming connection and register handlers
        connection = client.stt.stream(
            lang_code='en'
        )

        @connection.on(LiveEvents.Transcript)
        def on_transcript(data):
            print('Transcript:', data.get('transcript', data))

        @connection.on(LiveEvents.Connect)
        def on_connect():
            print('Connected to streaming service')

        @connection.on(LiveEvents.Disconnect)
        def on_disconnect():
            print('Disconnected from streaming service')

        @connection.on(LiveEvents.Error)
        def on_error(error):
            print('Streaming error:', error)

        connection.connect()

        # Capture audio from microphone using the SDK's MicrophoneStream
        with MicrophoneStream(
            channels=1,
            samplerate=16000,
            blocksize=4096,
        ) as mic:
            mic.stream_to(connection)

            # Keep the main thread alive until Ctrl+C
            while True:
                time.sleep(0.1)

    except KeyboardInterrupt:
        print('Keyboard interrupt')
    except Exception as error:
        print('Error:', error)
    finally:
        # connection is still None if setup failed before the stream was created
        if connection is not None:
            connection.disconnect()

if __name__ == "__main__":
    live_streaming()

Best practices

  1. Audio Quality: Use a 16 kHz sample rate and a single (mono) channel for best results
  2. Chunk Size: 4096 bytes is recommended for real-time performance
  3. Error Handling: Always implement reconnection logic for production use
  4. Resource Cleanup: Properly disconnect streaming connections when done
  5. Audio Input: Handle audio input sources and permissions appropriately
  6. Latency: Consider buffering strategies for smoother transcription
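
To put the chunk-size recommendation in terms of latency, it helps to work out how much audio one chunk holds. The sketch below assumes raw 16-bit PCM, so 2 bytes per sample; `chunk_duration_ms` is a hypothetical helper:

```python
def chunk_duration_ms(chunk_bytes, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Milliseconds of audio held in one chunk of raw PCM."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return 1000 * chunk_bytes / bytes_per_second


# 16000 samples/s * 1 channel * 2 bytes = 32000 bytes/s, so 4096 bytes is 128 ms.
print(chunk_duration_ms(4096))  # 128.0
```

Under those assumptions each 4096-byte chunk adds roughly an eighth of a second of audio before it can be sent. Smaller chunks lower that delay at the cost of more messages.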

Supported audio formats

For streaming, the following formats work best:

  • PCM 16-bit (recommended)
  • WAV uncompressed
  • Raw audio at 16kHz sample rate
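
Before streaming a recording, it can be worth checking that it matches these recommendations. The sketch below inspects a WAV header with the standard-library `wave` module, which reads uncompressed PCM WAV only. `wav_format_problems` is a hypothetical helper, and the default values come from the list above:

```python
import io
import wave


def wav_format_problems(source, rate=16000, channels=1, sample_width=2):
    """List how a WAV (path, file object, or bytes) differs from the expected format."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channels, expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(
                f"{8 * wav.getsampwidth()}-bit samples, expected {8 * sample_width}-bit"
            )
    return problems
```

An empty list means the file already uses 16 kHz, mono, 16-bit PCM. Otherwise the messages say what to resample or convert.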

Next steps

Now that you've implemented streaming transcription, you can explore the browser examples.

Browser Examples

For web applications, check out our complete browser microphone streaming example.