Catalog · 32 models · 8 providers

Transcription models.

Every speech-to-text model worth using, one rate card.

Total models: 32

Languages: 106

Cheapest per minute: $0.0022

Best WER: 14.1%
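The WER (word error rate) and CER (character error rate) figures on the cards below are error rates, so lower is better. As a reminder of what they measure, here is a minimal sketch of the standard definition: the edit distance (substitutions, deletions, insertions) between a reference transcript and the model's output, divided by the reference length, counted over words for WER and over characters for CER. The function names and the plain whitespace tokenization are assumptions for illustration; this catalog does not state how its benchmarks normalize text, so real scores would differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference item missing from output)
                curr[j - 1] + 1,     # insertion (extra item in output)
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()  # assumption: whitespace tokenization, no normalization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the quick brown fox"
    hyp = "the quick brown box"
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
```

Because insertions count as errors, both rates can exceed 100% when the output is much longer than the reference.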

32 models

Base

Deepgram

Batch

United States

Deepgram Base model — high volume, cost-effective

$0.0145/min

WER

56.14%

CER

72.41%

Speed Factor

0.1x

Uptime

100.0%

Billing

per 1s

Retention

No retention

19 languages

Auto-detect · Diarization · Word Timestamps · +1 more

AssemblyAI Best

AssemblyAI

Batch

United States

AssemblyAI's best transcription model

$0.09/min

WER

20.48%

CER

11.80%

Speed Factor

0.3x

Uptime

100.0%

Billing

per 1s

Retention

No retention

78 languages

Auto-detect · Diarization · Word Timestamps · +1 more

Chirp 3

Google Cloud

Batch

United States

Google's latest generative ASR foundation model — 85+ languages

$0.0107/min

Most Accurate

Best Medical

Best Conversations

WER

14.05%

CER

8.65%

Speed Factor

0.3x

Uptime

100.0%

Billing

per 15s

Retention

Custom

72 languages

Generative · Auto-detect · Diarization · +2 more

Google Command & Search

Google Cloud

Batch

United States

Optimized for short queries and voice commands

$0.016/min

WER

39.37%

CER

28.27%

Speed Factor

0.1x

Billing

per 15s

Retention

Custom

40 languages

Word Timestamps

Google Default

Google Cloud

Batch

United States

General-purpose model

$0.016/min

WER

39.90%

CER

28.43%

Speed Factor

0.2x

Billing

per 15s

Retention

Custom

40 languages

Diarization · Word Timestamps

Enhanced

Deepgram

Batch

United States

Deepgram Enhanced model — high accuracy for uncommon words

$0.0165/min

WER

25.24%

CER

15.53%

Speed Factor

0.1x

Uptime

100.0%

Billing

per 1s

Retention

No retention

15 languages

Auto-detect · Diarization · Word Timestamps · +1 more

Speechmatics Enhanced

Speechmatics

Batch

Europe

Highest accuracy model — 55+ languages, best-in-class

$0.0083/min

Best Accented

WER

15.60%

CER

8.63%

Speed Factor

0.3x

Uptime

100.0%

Billing

per 1s

Retention

No retention

47 languages

Async API · Auto-detect · Diarization · +3 more

Flux

Deepgram

Realtime

United States

First conversational ASR model built for voice agents — model-integrated endpointing

$0.0077/min

Realtime model — batch benchmarks not applicable

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

endpointing · turn detection · +1 more

GPT-4o Mini Transcribe

OpenAI

Batch

United States

GPT-4o Mini optimized for fast transcription

$0.003/min

WER

20.32%

CER

13.73%

Speed Factor

0.1x

Uptime

100.0%

Billing

per 1s

Retention

No retention

98 languages

Auto-detect · Word Timestamps

GPT-4o Transcribe

OpenAI

Batch

United States

GPT-4o optimized for transcription with improved WER

$0.006/min

WER

33.37%

CER

26.27%

Speed Factor

0.2x

Billing

per 1s

Retention

No retention

98 languages

Auto-detect · Word Timestamps

GPT-4o Transcribe Diarize

OpenAI

Batch

United States

GPT-4o transcription with built-in speaker diarization

$0.006/min

WER

17.85%

CER

11.86%

Speed Factor

0.2x

Billing

per 1s

Retention

No retention

12 languages

Auto-detect · Diarization

Ink-Whisper

Cartesia

Batch & Realtime

United States

Whisper rearchitected for real-time and batch voice AI — fastest TTCT, 99-language coverage

$0.0022/min

Best Value

WER

22.80%

CER

15.08%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

100 languages

Word Timestamps · dynamic chunking

Google Latest (Long)

Google Cloud

Batch

United States

Conformer model for long-form audio (minutes to hours)

$0.0107/min

WER

27.50%

CER

17.65%

Speed Factor

0.4x

Billing

per 15s

Retention

Custom

40 languages

Word Timestamps

Google Latest (Short)

Google Cloud

Batch

United States

Conformer model for short utterances (< 60s)

$0.016/min

WER

57.54%

CER

51.69%

Speed Factor

0.1x

Billing

per 15s

Retention

Custom

40 languages

Word Timestamps

Nova-2

Deepgram

Batch & Realtime

United States

Deepgram's Nova-2 speech recognition

$0.0058/min

WER

31.44%

CER

41.14%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

33 languages

Auto-detect · Diarization

Nova-2 Conversational AI

Deepgram

Batch & Realtime

United States

Optimized for human-to-bot interactions (IVR, voice assistants)

$0.0058/min

WER

19.10%

CER

13.08%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

Diarization · Word Timestamps · +1 more

Nova-2 Finance

Deepgram

Batch & Realtime

United States

Optimized for earnings calls with finance vocabulary

$0.0058/min

WER

19.39%

CER

13.41%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

Diarization · Word Timestamps · +1 more

Nova-2 Meeting

Deepgram

Batch & Realtime

United States

Optimized for conference room audio

$0.0058/min

WER

17.22%

CER

11.78%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

Diarization · Word Timestamps · +1 more

Nova-2 Phone Call

Deepgram

Batch & Realtime

United States

Optimized for low-bandwidth phone call audio

$0.0058/min

WER

17.93%

CER

12.44%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

Diarization · Word Timestamps · +1 more

Nova-2 Voicemail

Deepgram

Batch & Realtime

United States

Optimized for low-bandwidth single speaker voicemail

$0.0058/min

WER

17.99%

CER

12.35%

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

Word Timestamps · Custom Vocabulary

Nova-3

Deepgram

Batch & Realtime

United States

Deepgram's flagship model — 53% lower WER vs competitors, code-switching support

$0.0077/min

Fastest

WER

22.03%

CER

14.88%

Speed Factor

0.1x

Uptime

100.0%

Billing

per 1s

Retention

No retention

47 languages

Auto-detect · Diarization · Smart Format · +3 more

Scribe v2

ElevenLabs

Batch

United States

State-of-the-art batch STT — 90+ languages, speaker diarization, audio tagging

$0.004/min

Best Technical

WER

22.74%

CER

27.14%

Speed Factor

0.2x

Billing

per 1s

Retention

30 days

76 languages

Auto-detect · Diarization

Scribe v2 Realtime

ElevenLabs

Realtime

United States

Most accurate low-latency STT — <150ms, 90+ languages

$0.0065/min

Realtime model — batch benchmarks not applicable

Speed Factor

0.0x

Billing

per 1s

Retention

30 days

76 languages

streaming · Auto-detect

Speechmatics Standard

Speechmatics

Batch

Europe

Cost-effective model — fast turnaround, 55+ languages

$0.005/min

WER

19.05%

CER

11.31%

Speed Factor

0.1x

Uptime

100.0%

Billing

per 1s

Retention

No retention

47 languages

Async API · Auto-detect · Diarization · +3 more

Google Telephony

Google Cloud

Batch

United States

Optimized for telephony audio (8kHz)

$0.016/min

WER

16.44%

CER

10.52%

Speed Factor

0.2x

Uptime

100.0%

Billing

per 15s

Retention

Custom

9 languages

Word Timestamps

Amazon Transcribe

Amazon Web Services

Batch

United States

AWS foundation model-powered ASR — 100+ languages

$0.006/min

WER

20.63%

CER

11.85%

Speed Factor

0.4x

Uptime

100.0%

Billing

per 1s

Retention

Custom

77 languages

Async API · Auto-detect · Diarization · +2 more

Amazon Transcribe Medical

Amazon Web Services

Batch

United States

Medical transcription with HIPAA eligibility

$0.075/min

WER

15.86%

CER

11.08%

Speed Factor

0.4x

Uptime

100.0%

Billing

per 1s

Retention

Custom

1 language

HIPAA Compliant · Async API · Diarization · +2 more

Universal-3 Pro

AssemblyAI

Batch

United States

AssemblyAI's most powerful speech language model — up to 1000 keyterm phrases

$0.0035/min

Best Legal

Best Noisy

WER

17.30%

CER

9.93%

Speed Factor

0.2x

Uptime

100.0%

Billing

per 1s

Retention

No retention

6 languages

Auto-detect · Diarization

Universal Streaming

AssemblyAI

Realtime

United States

Purpose-built for real-time voice agents — ~300ms immutable transcripts

$0.0025/min

Realtime model — batch benchmarks not applicable

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

1 language

streaming · Word Timestamps

Universal Streaming Multilingual

AssemblyAI

Realtime

United States

Multilingual streaming STT — English, Spanish, French, German, Italian, Portuguese

$0.0025/min

Realtime model — batch benchmarks not applicable

Speed Factor

0.1x

Billing

per 1s

Retention

No retention

6 languages

streaming · Word Timestamps

Whisper 1 (API)

OpenAI

Batch

United States

OpenAI's Whisper API model

$0.006/min

WER

24.09%

CER

13.61%

Speed Factor

0.3x

Billing

per 1s

Retention

No retention

98 languages

Auto-detect · Word Timestamps

Whisper Large V3

OpenAI

Batch

United States

OpenAI's Whisper large-v3 model

$0.006/min

WER

24.07%

CER

13.64%

Speed Factor

0.3x

Billing

per 1s

Retention

No retention

98 languages

Auto-detect · Word Timestamps
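Each card above lists a per-minute rate and a billing increment (per 1s or per 15s). The sketch below estimates what a single file might cost under one plausible reading of those two fields: the audio duration is rounded up to the next whole increment, then charged pro rata at the per-minute rate. That rounding rule and the function name `estimate_cost` are assumptions for illustration; the catalog does not spell out each provider's actual rounding or minimum charges.

```python
import math


def estimate_cost(duration_s: float, rate_per_min: float, increment_s: int = 1) -> float:
    """Estimate the charge in dollars for one audio file.

    Assumption: the provider rounds the duration UP to the next multiple of
    `increment_s` seconds and bills that at `rate_per_min` dollars per minute.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    if increment_s <= 0:
        raise ValueError("increment_s must be positive")
    billed_seconds = math.ceil(duration_s / increment_s) * increment_s
    return billed_seconds / 60 * rate_per_min


if __name__ == "__main__":
    clip_seconds = 61  # hypothetical clip length
    # Rates and increments copied from the cards above.
    chirp_3 = estimate_cost(clip_seconds, rate_per_min=0.0107, increment_s=15)
    nova_3 = estimate_cost(clip_seconds, rate_per_min=0.0077, increment_s=1)
    print(f"Chirp 3: ${chirp_3:.6f}")
    print(f"Nova-3:  ${nova_3:.6f}")
```

Under this assumption a 61-second clip on a per-15s model is billed as 75 seconds, so the increment matters most for workloads made up of many short files.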