Quickstart | aiOla

Convert audio to text with our easy-to-use SDKs. This guide will walk you through the essential steps to get started with speech transcription in your applications.

Prerequisites

Before you begin, make sure you have:

An aiOla API key (get one here)
Python 3.10+ (for Python SDK) or Node.js 18+ (for TypeScript SDK)

Step 1: Install the SDK

pip install aiola
# or, for microphone support
pip install 'aiola[mic]'

Step 2: Set up authentication

For detailed authentication information, security best practices, and advanced token management, see our Authentication Guide.

First, generate an access token from your API key, then create a client with that token:

from aiola import AiolaClient

# Generate access token
result = AiolaClient.grant_token(api_key='your-api-key')
access_token = result.access_token

# Create client
client = AiolaClient(access_token=access_token)
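Hardcoding the API key in source code makes it easy to leak. One common alternative is to read it from an environment variable before requesting a token. The sketch below is only an illustration: the variable name AIOLA_API_KEY and the helper load_api_key are assumptions made for this example, not part of the SDK.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "AIOLA_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing.

    AIOLA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable to your aiOla API key")
    return key
```

The returned value can then be passed as api_key to AiolaClient.grant_token in place of the literal string.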

Step 3: Transcribe an audio file

Here’s how to transcribe an audio file:

# Transcribe audio file
with open('path/to/your/audio.wav', 'rb') as audio_file:
    transcript = client.stt.transcribe_file(
        file=audio_file,
        language='en'
    )

print(transcript)
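The same call extends naturally to a folder of recordings. The following is a minimal sketch, assuming an already authenticated client; transcribe_directory is a hypothetical helper, and the only SDK call it relies on is client.stt.transcribe_file with the file and language arguments shown above.

```python
from pathlib import Path


def transcribe_directory(client, directory: str, language: str = "en",
                         pattern: str = "*.wav") -> dict:
    """Transcribe every file in `directory` whose name matches `pattern`.

    Returns a dict mapping each file name to whatever transcribe_file returned.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        # Open each file in binary mode, exactly as in the single-file example.
        with path.open("rb") as audio_file:
            results[path.name] = client.stt.transcribe_file(
                file=audio_file,
                language=language,
            )
    return results
```

Files are processed one at a time in name order, so a failure on one file raises immediately and leaves earlier results unreturned; wrap the call per file if partial results matter.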

Real-time streaming

For real-time transcription of streaming audio, see our dedicated Speech to Text Streaming Guide, which covers:

Live microphone streaming
Event-based transcription handling
Custom audio source streaming
Keyword detection during streaming
Connection management and error handling

Supported audio formats

The SDK supports the following audio formats:

WAV (.wav)
FLAC (.flac)
AIFF (.aiff)
M4A (.m4a)
MP4 (.mp4)
MOV (.mov)
M4V (.m4v)
AAC (.aac)
MKV (.mkv)
MP3 (.mp3)
Opus (.opus)
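A quick local check can reject unsupported files before any upload is attempted. The sketch below copies the extensions from the list above; has_supported_extension is a hypothetical helper, and it inspects only the file name, not the actual audio encoding.

```python
from pathlib import Path

# Extensions copied from the supported formats list above.
SUPPORTED_EXTENSIONS = {
    ".wav", ".flac", ".aiff", ".m4a", ".mp4", ".mov",
    ".m4v", ".aac", ".mkv", ".mp3", ".opus",
}


def has_supported_extension(path: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    # Lowercasing the suffix is a convenience of this sketch, so that
    # "TALK.WAV" is treated the same as "talk.wav".
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```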

File size limitations

Maximum file size: 50 MB
For larger files, consider using streaming transcription or splitting the audio into smaller chunks
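For uncompressed WAV input, splitting can be done with Python's standard wave module. The sketch below is one possible approach, not an SDK feature: split_wav and HEADER_HEADROOM are hypothetical names, and because this guide does not say whether 50 MB means 10^6 or 2^20 bytes, the default uses the smaller decimal value to stay on the safe side.

```python
import wave
from pathlib import Path

# Bytes reserved for the WAV header in each part (a generous assumption).
HEADER_HEADROOM = 1024


def split_wav(src: str, out_dir: str, max_bytes: int = 50 * 1000 * 1000) -> list[Path]:
    """Split a PCM WAV file into consecutive parts no larger than max_bytes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frame_size = params.nchannels * params.sampwidth
        frames_per_part = (max_bytes - HEADER_HEADROOM) // frame_size
        if frames_per_part <= 0:
            raise ValueError("max_bytes is too small to hold a single frame")
        while True:
            frames = reader.readframes(frames_per_part)
            if not frames:
                break
            part = out / f"{Path(src).stem}_part{len(parts):03d}.wav"
            with wave.open(str(part), "wb") as writer:
                # Reuse the source format; the frame count in the header is
                # corrected automatically when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            parts.append(part)
    return parts
```

Parts are cut at fixed frame counts, so a cut can land in the middle of a word; splitting at silences would need extra audio analysis that this sketch does not attempt.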

Supported languages

The SDK supports the following languages:

English (en)
German (de)
French (fr)
Spanish (es)
Portuguese (pr)
Chinese (zh)
Japanese (ja)
Italian (it)

Error handling

Wrap your API calls in error handling so that a missing file or a failed request does not crash your application:

try:
    with open('audio.wav', 'rb') as audio_file:
        transcript = client.stt.transcribe_file(
            file=audio_file,
            language='en'
        )
    print(transcript)
except Exception as e:
    print(f"Transcription failed: {e}")
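For transient failures such as a dropped connection, retrying with a growing delay is often enough. The sketch below is a generic pattern, not SDK behavior: with_retries is a hypothetical helper, and since this guide does not list the exceptions the SDK raises, the built-in ConnectionError and TimeoutError stand in as placeholders that you would replace with the real ones.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise ValueError("attempts must be at least 1")
```

Pass a function that performs the whole open-and-transcribe step, so the audio file is reopened from the start on every attempt.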

Advanced options

Keyword detection

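Pass a keywords mapping to steer how specific terms are transcribed. In the example below, each key is a phrase as it is spoken and each value is the spelling you want in the transcript: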

# Keyword detection with custom transcription
with open('path/to/your/audio.wav', 'rb') as audio_file:
    transcript = client.stt.transcribe_file(
        file=audio_file,
        language='en',
        keywords={
            "postgres": "PostgreSQL",
            "k eight s": "Kubernetes"
        }
    )

Custom configuration

Enterprise users can point the client at a custom endpoint:

# Custom base URL (enterprise)
client = AiolaClient(
    access_token=access_token,
    base_url='https://your-custom-endpoint.com'
)

Next steps

Now that you’ve successfully transcribed your first audio file, you can:

Explore Real-time Streaming for live audio transcription
Learn about Text to Speech SDK for converting text back to speech
Check out the SDK repositories for more examples:
- Python SDK
- TypeScript SDK

Browser examples

For web applications, check out our complete browser microphone streaming example:

Browser Microphone Streaming - Full web app example showing real-time microphone transcription in the browser