Catalog · 32 models · 8 providers
Every speech-to-text model worth using, one rate card.
Total models: 32
Languages: 106
Cheapest per minute: $0.0022
Best WER: 14.1%
32 models
WER
CER
Speed Factor: 0.1x
Uptime
Billing: per 1s
Retention
19 languages
WER
CER
Speed Factor: 0.3x
Uptime
Billing: per 1s
Retention
78 languages
Google Cloud
Google's latest generative ASR foundation model — 85+ languages
WER
CER
Speed Factor: 0.3x
Uptime
Billing: per 15s
Retention
72 languages
WER
CER
Speed Factor: 0.1x
Billing: per 15s
Retention
40 languages
WER
CER
Speed Factor: 0.2x
Billing: per 15s
Retention
40 languages
WER
CER
Speed Factor: 0.1x
Uptime
Billing: per 1s
Retention
15 languages
WER
CER
Speed Factor: 0.3x
Uptime
Billing: per 1s
Retention
47 languages
Deepgram
First conversational ASR model built for voice agents — model-integrated endpointing
Realtime model — batch benchmarks not applicable
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
WER
CER
Speed Factor: 0.1x
Uptime
Billing: per 1s
Retention
98 languages
WER
CER
Speed Factor: 0.2x
Billing: per 1s
Retention
98 languages
OpenAI
GPT-4o transcription with built-in speaker diarization
WER
CER
Speed Factor: 0.2x
Billing: per 1s
Retention
12 languages
Cartesia
Whisper rearchitected for real-time and batch voice AI — fastest TTCT, 99-language coverage
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
100 languages
Google Cloud
Conformer model for long-form audio (minutes to hours)
WER
CER
Speed Factor: 0.4x
Billing: per 15s
Retention
40 languages
WER
CER
Speed Factor: 0.1x
Billing: per 15s
Retention
40 languages
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
33 languages
Deepgram
Optimized for human-to-bot interactions (IVR, voice assistants)
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
Deepgram
Optimized for earnings calls with finance vocabulary
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
Deepgram
Optimized for low-bandwidth, single-speaker voicemail
WER
CER
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
Deepgram
Deepgram's flagship model — 53% lower WER vs competitors, code-switching support
WER
CER
Speed Factor: 0.1x
Uptime
Billing: per 1s
Retention
47 languages
ElevenLabs
State-of-the-art batch STT — 90+ languages, speaker diarization, audio tagging
WER
CER
Speed Factor: 0.2x
Billing: per 1s
Retention
76 languages
ElevenLabs
Most accurate low-latency STT — <150ms, 90+ languages
Realtime model — batch benchmarks not applicable
Speed Factor: 0.0x
Billing: per 1s
Retention
76 languages
WER
CER
Speed Factor: 0.1x
Uptime
Billing: per 1s
Retention
47 languages
WER
CER
Speed Factor: 0.2x
Uptime
Billing: per 15s
Retention
9 languages
Amazon Web Services
AWS foundation model-powered ASR — 100+ languages
WER
CER
Speed Factor: 0.4x
Uptime
Billing: per 1s
Retention
77 languages
Amazon Web Services
Medical transcription with HIPAA eligibility
WER
CER
Speed Factor: 0.4x
Uptime
Billing: per 1s
Retention
1 language
AssemblyAI
AssemblyAI's most powerful speech language model — up to 1000 keyterm phrases
WER
CER
Speed Factor: 0.2x
Uptime
Billing: per 1s
Retention
6 languages
AssemblyAI
Purpose-built for real-time voice agents — ~300ms immutable transcripts
Realtime model — batch benchmarks not applicable
Speed Factor: 0.1x
Billing: per 1s
Retention
1 language
AssemblyAI
Multilingual streaming STT — English, Spanish, French, German, Italian, Portuguese
Realtime model — batch benchmarks not applicable
Speed Factor: 0.1x
Billing: per 1s
Retention
6 languages
WER
CER
Speed Factor: 0.3x
Billing: per 1s
Retention
98 languages
WER
CER
Speed Factor: 0.3x
Billing: per 1s
Retention
98 languages