Speech to Text SDK
Convert audio to text with our easy-to-use SDKs. This guide will walk you through the essential steps to get started with speech transcription in your applications.
Prerequisites
Before you begin, make sure you have:
- An aiOla API key (get one here)
- Python 3.10+ (for Python SDK) or Node.js 18+ (for TypeScript SDK)
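A common way to keep the API key out of source code is to read it from an environment variable. The sketch below assumes a variable named `AIOLA_API_KEY`; that name is a convention chosen for this example, not something the SDK is known to require.

```python
import os


def load_api_key(env_var: str = "AIOLA_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by a remote call.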
Step 1: Install the SDK
Step 2: Set up authentication
First, generate an access token using your API key.
For detailed authentication information, security best practices, and advanced token management, see our Authentication Guide.
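Since access tokens are generated from the API key, it can help to cache a token and request a new one only when it is close to expiring. The following is a minimal sketch of that idea. `fetch_token` is a hypothetical callable standing in for whatever token call the SDK provides, and it is assumed to return the token together with its lifetime in seconds.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        refresh_margin: float = 30.0,
    ) -> None:
        # fetch_token is hypothetical: it returns (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._refresh_margin = refresh_margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

The clock is injectable so the refresh logic can be exercised in tests without waiting for real time to pass.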
Step 3: Transcribe an audio file
Once you have an access token, you can transcribe an audio file.
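The sketch below shows one way to structure a file transcription helper without tying it to a specific SDK signature. The `transcribe` parameter is a hypothetical callable that wraps the SDK's file transcription call; the helper only takes care of opening the file in binary mode and passing along the language code.

```python
from pathlib import Path
from typing import Any, BinaryIO, Callable


def transcribe_path(
    path: str,
    transcribe: Callable[[BinaryIO, str], Any],
    language: str = "en",
) -> Any:
    """Open the audio file in binary mode and hand it to the transcription call.

    `transcribe` is a stand-in for the real SDK call (an assumption of this sketch).
    """
    audio_path = Path(path)
    if not audio_path.is_file():
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    # Keep the file open only for the duration of the call.
    with audio_path.open("rb") as audio_file:
        return transcribe(audio_file, language)
```

Passing the SDK call in as a parameter also makes the helper easy to test with a fake function.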
Real-time streaming
For real-time audio streaming transcription, see our dedicated Speech to Text Streaming Guide, which covers:
- Live microphone streaming
- Event-based transcription handling
- Custom audio source streaming
- Keyword detection during streaming
- Connection management and error handling
Supported audio formats
The SDK supports the following audio formats:
- WAV (.wav)
- FLAC (.flac)
- AIFF (.aiff)
- M4A (.m4a)
- MP4 (.mp4)
- MOV (.mov)
- M4V (.m4v)
- AAC (.aac)
- MKV (.mkv)
- MP3 (.mp3)
- Opus (.opus)
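Checking the file extension locally before uploading can catch unsupported formats early. This sketch copies the extensions from the list above; the helper name is illustrative, and it checks only the extension, not the actual file contents.

```python
from pathlib import Path

# Extensions copied from the supported formats list.
SUPPORTED_EXTENSIONS = frozenset({
    ".wav", ".flac", ".aiff", ".m4a", ".mp4", ".mov",
    ".m4v", ".aac", ".mkv", ".mp3", ".opus",
})


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```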
File size limitations
- Maximum file size: 50 MB
- For larger files, consider using streaming transcription or splitting the audio into smaller chunks
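As an illustration of the chunking approach, the sketch below checks a file against the size limit and splits an uncompressed WAV file into fixed-length pieces using Python's standard `wave` module. It reads "50 MB" conservatively as 50,000,000 bytes, which is an assumption, and it handles only WAV input; other formats would need a different tool.

```python
import os
import wave

# Conservative reading of the 50 MB limit (assumption: decimal megabytes).
MAX_FILE_BYTES = 50 * 1000 * 1000


def exceeds_limit(path: str, limit: int = MAX_FILE_BYTES) -> bool:
    """Return True if the file is larger than the size limit."""
    return os.path.getsize(path) > limit


def split_wav(path: str, chunk_seconds: float, out_prefix: str) -> list[str]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each."""
    outputs: list[str] = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(src.getframerate() * chunk_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Reuse channel count, sample width, and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            outputs.append(out_path)
            index += 1
    return outputs
```

Fixed-length splitting can cut a word in half at a chunk boundary, so splitting at pauses may give better results where that matters.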
Supported languages
The SDK supports the following languages:
- English (en)
- German (de)
- French (fr)
- Spanish (es)
- Portuguese (pr)
- Chinese (zh)
- Japanese (ja)
- Italian (it)
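Validating the language code before making a request gives a clearer error than a rejected API call. The codes in this sketch are copied verbatim from the list above, and the helper name is illustrative.

```python
# Language codes copied verbatim from the supported languages list.
SUPPORTED_LANGUAGES = {
    "en": "English", "de": "German", "fr": "French", "es": "Spanish",
    "pr": "Portuguese", "zh": "Chinese", "ja": "Japanese", "it": "Italian",
}


def normalize_language(code: str) -> str:
    """Return the lowercase language code, or raise if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```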
Error handling
Always implement proper error handling for your API calls.
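One common pattern is to retry transient failures with exponential backoff and let everything else propagate. The sketch below is generic: `operation` is any zero-argument callable wrapping an API call, and the default retryable exceptions are Python built-ins chosen for illustration. The exception types the SDK actually raises are not covered here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    retryable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying retryable errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Errors outside `retryable`, such as an invalid argument, are raised immediately because retrying them would not help.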
Advanced options
Keyword detection
You can enable keyword detection for more accurate transcription.
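How keywords are passed to the SDK is not shown in this guide, so the sketch below covers only a preparation step that is useful in any case: cleaning up a keyword list by trimming whitespace and removing empty entries and case-insensitive duplicates. The function name is illustrative.

```python
def normalize_keywords(keywords: list[str]) -> list[str]:
    """Trim whitespace and drop empty or duplicate keywords, keeping order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for keyword in keywords:
        # Collapse internal runs of whitespace to a single space.
        text = " ".join(keyword.split())
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```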
Custom configuration
Enterprise users can configure custom endpoints.
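A small configuration object can validate endpoint URLs before they reach the client. In this sketch the field names `base_url` and `auth_base_url` are hypothetical, as is the requirement that both use HTTPS; consult the SDK reference for the actual option names.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class EndpointConfig:
    """Custom endpoint settings (field names are assumptions for this sketch)."""

    base_url: str
    auth_base_url: str

    def __post_init__(self) -> None:
        # Reject relative URLs and anything that is not HTTPS.
        for name in ("base_url", "auth_base_url"):
            parsed = urlparse(getattr(self, name))
            if parsed.scheme != "https" or not parsed.netloc:
                raise ValueError(f"{name} must be an absolute https URL")
```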
Next steps
Now that you’ve successfully transcribed your first audio file, you can:
- Explore Real-time Streaming for live audio transcription
- Learn about the Text to Speech SDK for converting text back to speech
- Check out the SDK repositories for more examples
Browser Examples
For web applications, check out our complete browser microphone streaming example:
- Browser Microphone Streaming - Full web app example showing real-time microphone transcription in the browser