Speech to Text SDK

Convert audio to text with our easy-to-use SDKs. This guide will walk you through the essential steps to get started with speech transcription in your applications.

Prerequisites

Before you begin, make sure you have:

  • An aiOla API key (get one here)
  • Python 3.10+ (for Python SDK) or Node.js 18+ (for TypeScript SDK)

Step 1: Install the SDK

pip install aiola
# or for microphone support
pip install 'aiola[mic]'

Step 2: Set up authentication

First, generate an access token using your API key. For detailed authentication information, security best practices, and advanced token management, see our Authentication Guide.

from aiola import AiolaClient

# Generate access token
result = AiolaClient.grant_token(api_key='your-api-key')
access_token = result['accessToken']

# Create client
client = AiolaClient(access_token=access_token)
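Hard-coding the API key as in the snippet above is fine for a first test, but the key is usually better kept out of source code. The following is a minimal sketch of reading it from an environment variable. The variable name `AIOLA_API_KEY` and the helper `load_api_key` are assumptions made for this illustration and are not part of the SDK.

```python
import os


def load_api_key(env=None, var_name="AIOLA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `var_name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your aiOla API key"
        )
    return api_key


# Usage with the calls shown in this guide (requires the aiola package):
#   from aiola import AiolaClient
#   result = AiolaClient.grant_token(api_key=load_api_key())
#   client = AiolaClient(access_token=result['accessToken'])
```

Passing a plain dictionary as `env` makes the helper easy to exercise in tests without touching the real environment.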

Step 3: Transcribe an audio file

Here’s how to transcribe an audio file:

# Transcribe audio file
with open('path/to/your/audio.wav', 'rb') as audio_file:
    transcript = client.stt.transcribe_file(
        file=audio_file,
        language='en'
    )

print(transcript)
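If you have several recordings, the same call can be wrapped in a loop. This sketch assumes only the `client.stt.transcribe_file(file=..., language=...)` call shown above. The helper name `transcribe_directory` and its `pattern` parameter are illustrative, not part of the SDK.

```python
from pathlib import Path


def transcribe_directory(client, directory, language="en", pattern="*.wav"):
    """Transcribe every file in `directory` matching `pattern`.

    Returns a dict mapping each file name to whatever
    `client.stt.transcribe_file` returned for it.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        # Open in binary mode, exactly as in the single-file example.
        with open(path, "rb") as audio_file:
            results[path.name] = client.stt.transcribe_file(
                file=audio_file,
                language=language,
            )
    return results
```

Files are processed one at a time in sorted order, so a failure on one file stops the loop. Wrap the call in your own error handling if you would rather continue with the remaining files.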

Real-time streaming

For real-time audio streaming transcription, check out our dedicated Speech to Text Streaming Guide which covers:

  • Live microphone streaming
  • Event-based transcription handling
  • Custom audio source streaming
  • Keyword detection during streaming
  • Connection management and error handling

Supported audio formats

The SDK supports the following audio formats:

  • WAV (.wav)
  • FLAC (.flac)
  • AIFF (.aiff)
  • M4A (.m4a)
  • MP4 (.mp4)
  • MOV (.mov)
  • M4V (.m4v)
  • AAC (.aac)
  • MKV (.mkv)
  • MP3 (.mp3)
  • Opus (.opus)
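A quick extension check before uploading can catch unsupported files early. The sketch below mirrors the list above. It inspects only the file name, not the actual container or codec, and the helper name `is_supported_audio` is an assumption for this example.

```python
from pathlib import Path

# Extensions copied from the list of supported formats in this guide.
SUPPORTED_EXTENSIONS = {
    ".wav", ".flac", ".aiff", ".m4a", ".mp4", ".mov",
    ".m4v", ".aac", ".mkv", ".mp3", ".opus",
}


def is_supported_audio(path):
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```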

File size limitations

  • Maximum file size: 50 MB
  • For larger files, consider using streaming transcription or splitting the audio into smaller chunks
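One way to split a large recording is sketched below for uncompressed WAV files, using only Python's standard `wave` module. It assumes that "50 MB" means 50,000,000 bytes (the stricter reading), and the helper names `needs_splitting` and `split_wav` are illustrative. Other formats would need an external tool or library.

```python
import os
import wave

# Assumption: the limit is read as decimal megabytes.
MAX_BYTES = 50 * 1000 * 1000


def needs_splitting(path, max_bytes=MAX_BYTES):
    """Return True if the file on disk is larger than the size limit."""
    return os.path.getsize(path) > max_bytes


def split_wav(path, chunk_seconds, out_dir):
    """Split a PCM WAV file into chunks of at most `chunk_seconds` each.

    Returns the list of chunk file paths in playback order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = os.path.join(out_dir, f"chunk_{index:04d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Fixed-length cuts can land in the middle of a word, which may hurt accuracy at chunk boundaries. Cutting at silences is a common refinement but is beyond this sketch.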

Supported languages

The SDK supports the following languages:

  • English (en)
  • German (de)
  • French (fr)
  • Spanish (es)
  • Portuguese (pr)
  • Chinese (zh)
  • Japanese (ja)
  • Italian (it)
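Validating the language code on the client side gives a clearer error than a failed request. This sketch copies the codes exactly as they are listed above. The helper `normalize_language` is hypothetical, and lowercasing the input is a convenience of this example rather than documented SDK behavior.

```python
# Codes exactly as listed in this guide.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "de": "German",
    "fr": "French",
    "es": "Spanish",
    "pr": "Portuguese",
    "zh": "Chinese",
    "ja": "Japanese",
    "it": "Italian",
}


def normalize_language(code):
    """Return the lowercase language code, or raise ValueError if unlisted."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: {supported}"
        )
    return normalized
```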

Error handling

Always implement proper error handling for your API calls:

try:
    with open('audio.wav', 'rb') as audio_file:
        transcript = client.stt.transcribe_file(
            file=audio_file,
            language='en'
        )
    print(transcript)
except Exception as e:
    print(f"Transcription failed: {e}")
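For transient failures such as network drops, a small retry wrapper with exponential backoff is a common complement to the try/except above. This is a generic sketch, not SDK functionality. Which exceptions the SDK raises is not covered in this guide, so the default `retry_on` tuple (Python's built-in `ConnectionError` and `TimeoutError`) is an assumption you should adjust.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation()` and retry on the given exceptions.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Open the audio file inside `operation` so that each attempt reads it from the start; a file object that has already been consumed by a failed attempt would otherwise be sent empty.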

Advanced options

Keyword detection

You can enable keyword detection for more accurate transcription:

# Keyword detection with custom transcription
with open('path/to/your/audio.wav', 'rb') as audio_file:
    transcript = client.stt.transcribe_file(
        file=audio_file,
        language='en',
        keywords={
            "postgres": "PostgreSQL",
            "k eight s": "Kubernetes"
        }
    )
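When the keyword list grows, it can be easier to maintain it outside the code. The sketch below loads a mapping of the same shape as the `keywords` argument above from a JSON file. The file layout and the helper `load_keywords` are assumptions for this example.

```python
import json


def load_keywords(path):
    """Load a keywords mapping from a JSON object file.

    Expects an object whose keys and values are non-empty strings,
    for example {"postgres": "PostgreSQL"}.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("keywords file must contain a JSON object")
    keywords = {}
    for key, value in data.items():
        if not isinstance(value, str) or not key.strip() or not value.strip():
            raise ValueError(f"invalid keyword entry: {key!r}")
        keywords[key.strip()] = value.strip()
    return keywords
```

The result can then be passed as `keywords=load_keywords('keywords.json')` in the call shown above.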

Custom configuration

For enterprise users, you can configure custom endpoints:

# Custom base URL (enterprise)
client = AiolaClient(
    access_token=access_token,
    base_url='https://your-custom-endpoint.com'
)
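If the endpoint differs between environments, it can be read from configuration rather than hard-coded. This is a minimal sketch: the variable name `AIOLA_BASE_URL` and the helper `resolve_base_url` are hypothetical, and requiring HTTPS is a choice made for this example, not a documented SDK rule.

```python
import os
from urllib.parse import urlparse


def resolve_base_url(env=None, var_name="AIOLA_BASE_URL"):
    """Return the custom base URL from the environment, or None if unset.

    `var_name` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get(var_name, "").strip()
    if not raw:
        return None
    parsed = urlparse(raw)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"{var_name} must be an https:// URL, got {raw!r}")
    return raw.rstrip("/")


# Usage with the constructor shown above (requires the aiola package):
#   kwargs = {"access_token": access_token}
#   base_url = resolve_base_url()
#   if base_url is not None:
#       kwargs["base_url"] = base_url
#   client = AiolaClient(**kwargs)
```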

Next steps

Now that you’ve successfully transcribed your first audio file, you can:

Browser Examples

For web applications, check out our complete browser microphone streaming example: