The ranker · benchmark 06 / 2026

Objective transcription benchmarks.

Every model. Same audio. Same metrics. Re-run on every model or pricing change.


Models Ranked: 28
Total Benchmarks: 932
Languages Tested: 11
Last Updated: Jun 12, 2026

01 · Leader

Ink-Whisper

Cartesia

81.3

score

WER

22.80%

Latency

947ms

Cost

$0.0022/min

02 · Runner-up

GPT-4o Mini Transcribe

OpenAI

78.1

score

WER

20.32%

Latency

1.9s

Cost

$0.003/min

03 · Third

Nova-2 Phone Call

Deepgram

76.4

score

WER

17.93%

Latency

1.0s

Cost

$0.0058/min

| # | Model | Provider | Overall Score | WER | CER | Latency | Cost | Benchmarks |
|---|---|---|---|---|---|---|---|---|
| 01 | Ink-Whisper | Cartesia | 81.3 | 22.80% | 15.08% | 947ms | $0.0022/min | 35 |
| 02 | GPT-4o Mini Transcribe | OpenAI | 78.1 | 20.32% | 13.73% | 1.9s | $0.003/min | 35 |
| 03 | Nova-2 Phone Call | Deepgram | 76.4 | 17.93% | 12.44% | 1.0s | $0.0058/min | 25 |
| 04 | Nova-2 Voicemail | Deepgram | 75.7 | 17.99% | 12.35% | 1.2s | $0.0058/min | 25 |
| 05 | Nova-2 Meeting | Deepgram | 73.9 | 17.22% | 11.78% | 2.0s | $0.0058/min | 25 |
| 06 | Nova-2 Conversational AI | Deepgram | 72.2 | 19.10% | 13.08% | 2.2s | $0.0058/min | 25 |
| 07 | Nova-2 Finance | Deepgram | 71.5 | 19.39% | 13.41% | 2.4s | $0.0058/min | 25 |
| 08 | Nova-3 | Deepgram | 69.7 | 22.03% | 14.88% | 1.3s | $0.0077/min | 34 |
| 09 | Scribe v2 | ElevenLabs | 68.5 | 22.74% | 27.14% | 4.0s | $0.004/min | 35 |
| 10 | Nova-2 | Deepgram | 67.8 | 31.44% | 41.14% | 1.6s | $0.0058/min | 34 |
| 11 | Universal-3 Pro | AssemblyAI | 67.1 | 17.30% | 9.93% | 5.8s | $0.0035/min | 35 |
| 12 | Whisper 1 (API) | OpenAI | 65.1 | 24.09% | 13.61% | 3.6s | $0.006/min | 35 |
| 13 | GPT-4o Transcribe | OpenAI | 63.1 | 33.37% | 26.27% | 2.8s | $0.006/min | 35 |
| 14 | Whisper Large V3 | OpenAI | 63.0 | 24.07% | 13.64% | 4.3s | $0.006/min | 35 |
| 15 | Enhanced | Deepgram | 61.7 | 25.24% | 15.53% | 1.9s | $0.0165/min | 31 |
| 16 | Speechmatics Standard | Speechmatics | 59.2 | 19.05% | 11.31% | 7.1s | $0.005/min | 34 |
| 17 | AssemblyAI Best | AssemblyAI | 57.5 | 20.48% | 11.80% | 4.1s | $0.09/min | 35 |
| 18 | Speechmatics Enhanced | Speechmatics | 49.6 | 15.60% | 8.63% | 8.7s | $0.0083/min | 34 |
| 19 | GPT-4o Transcribe Diarize | OpenAI | 49.1 | 17.85% | 11.86% | 18.1s | $0.006/min | 33 |
| 20 | Base | Deepgram | 48.8 | 56.14% | 72.41% | 1.1s | $0.0145/min | 34 |
| 21 | Amazon Transcribe | Amazon Web Services | 47.7 | 20.63% | 11.85% | 11.9s | $0.006/min | 35 |
| 22 | Chirp 3 | Google Cloud | 43.0 | 14.05% | 8.65% | 10.3s | $0.0107/min | 34 |
| 23 | Amazon Transcribe Medical | Amazon Web Services | 42.1 | 15.86% | 11.08% | 11.4s | $0.075/min | 25 |
| 24 | Google Telephony | Google Cloud | 41.8 | 16.44% | 10.52% | 22.3s | $0.016/min | 29 |
| 25 | Google Latest (Long) | Google Cloud | 36.3 | 27.50% | 17.65% | 13.5s | $0.0107/min | 34 |
| 26 | Google Command & Search | Google Cloud | 30.3 | 39.37% | 28.27% | 11.6s | $0.016/min | 34 |
| 27 | Google Default | Google Cloud | 30.0 | 39.90% | 28.43% | 11.9s | $0.016/min | 34 |
| 28 | Google Latest (Short) | Google Cloud | 21.2 | 57.54% | 51.69% | 12.1s | $0.016/min | 68 |


Accuracy tradeoffs

WER vs. cost & speed

Category leaders

no single winner

| Category | Model | Provider | WER |
|---|---|---|---|
| Code-Switching | Scribe v2 | ElevenLabs | 15.81% |
| Conversational | Chirp 3 | Google Cloud | 6.36% |
| Finance | GPT-4o Transcribe Diarize | OpenAI | 6.94% |
| General | Amazon Transcribe Medical | Amazon Web Services | 4.54% |
| Legal | Universal-3 Pro | AssemblyAI | 3.20% |
| Medical | Chirp 3 | Google Cloud | 1.94% |
| Noisy Environment | Nova-3 | Deepgram | 21.95% |
| Technical | Scribe v2 | ElevenLabs | 2.81% |

How the score is built

weighted composite

50%

Accuracy

WER + CER vs. reference transcripts

30%

Speed

Median end-to-end latency

20%

Cost efficiency

Price per minute of audio
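The 50/30/20 weighting can be illustrated with a short sketch. The page gives the weights but not how each raw metric is turned into a sub-score, so the min-max normalization below, the averaging of WER and CER, and every name in the code (`ModelResult`, `composite_scores`) are assumptions made for illustration. The output will not reproduce the published scores.

```python
from dataclasses import dataclass

# Weights as stated on the page.
WEIGHTS = {"accuracy": 0.50, "speed": 0.30, "cost": 0.20}


@dataclass(frozen=True)
class ModelResult:
    name: str
    wer: float           # word error rate as a fraction (0.2280 = 22.80%)
    cer: float           # character error rate as a fraction
    latency_s: float     # median end-to-end latency in seconds
    cost_per_min: float  # price per minute of audio in USD


def _subscore(value: float, values: list[float]) -> float:
    """Min-max normalize so the lowest value scores 100 and the highest 0.

    ASSUMPTION: the page does not document its normalization.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 100.0
    return 100.0 * (hi - value) / (hi - lo)


def composite_scores(results: list[ModelResult]) -> dict[str, float]:
    """Weighted composite of accuracy, speed, and cost sub-scores."""
    # ASSUMPTION: accuracy combines WER and CER as a plain average.
    errors = [(r.wer + r.cer) / 2 for r in results]
    latencies = [r.latency_s for r in results]
    costs = [r.cost_per_min for r in results]
    scores = {}
    for r, err in zip(results, errors):
        scores[r.name] = (
            WEIGHTS["accuracy"] * _subscore(err, errors)
            + WEIGHTS["speed"] * _subscore(r.latency_s, latencies)
            + WEIGHTS["cost"] * _subscore(r.cost_per_min, costs)
        )
    return scores


# Three rows taken from the leaderboard table.
rows = [
    ModelResult("Ink-Whisper", 0.2280, 0.1508, 0.947, 0.0022),
    ModelResult("Nova-2 Phone Call", 0.1793, 0.1244, 1.0, 0.0058),
    ModelResult("Chirp 3", 0.1405, 0.0865, 10.3, 0.0107),
]
for name, score in composite_scores(rows).items():
    print(f"{name}: {score:.1f}")
```

Even under this simplified normalization, the weighting shows why the model with the lowest WER need not lead the ranking: latency and price together carry half of the score.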

How We Benchmark

open methodology

01

Golden set

Curated test audio with verified reference transcripts across languages, accents, and noise levels

02

Same audio, every model

Every provider runs the identical test set. No cherry-picked clips, no provider-tuned inputs.

03

Event-driven & automated

Benchmarks re-run automatically on every model or pricing change, with no human bias.

04

Scoring

Overall score is a weighted composite: 50% accuracy (WER + CER vs. reference transcripts), 30% speed (median end-to-end latency), 20% cost efficiency (price per minute of audio).

Evaluation metrics

how errors are counted

WER

Word Error Rate

(Wrong + Extra + Missed words) ÷ Total words spoken

Percentage of words incorrectly transcribed (lower is better)

CER

Character Error Rate

(Wrong + Extra + Missed characters) ÷ Total characters spoken

Percentage of characters incorrectly transcribed (lower is better)

MER

Match Error Rate

(Wrong + Extra + Missed) ÷ (Correct + Wrong + Extra + Missed)

Ratio of errors to total alignment length (lower is better)

WIL

Word Information Lost

1 − (Correct ÷ Words spoken × Correct ÷ Words predicted)

Fraction of word information lost in transcription (lower is better)
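The four formulas above can be computed from one minimum-edit alignment between the reference and the transcript. The following is a minimal sketch, not the benchmark's own implementation: the function names are hypothetical, text is split on whitespace with no normalization, and the character count for CER includes spaces. All three choices are assumptions. "Wrong", "Extra", and "Missed" correspond to substitutions, insertions, and deletions.

```python
def align_counts(ref, hyp):
    """Return (correct, wrong, extra, missed) from a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(diag, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back from the end of both sequences, classifying each step.
    correct = wrong = extra = missed = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                correct += 1
            else:
                wrong += 1      # substitution
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            missed += 1         # deletion: reference item not transcribed
            i -= 1
        else:
            extra += 1          # insertion: transcribed item not in reference
            j -= 1
    return correct, wrong, extra, missed


def error_rates(reference: str, hypothesis: str) -> dict[str, float]:
    """Compute WER, CER, MER, and WIL as fractions (lower is better)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    c, w, e, m = align_counts(ref_words, hyp_words)
    errors = w + e + m
    spoken = c + w + m      # words in the reference
    predicted = c + w + e   # words in the transcript
    cc, cw, ce, cm = align_counts(list(reference), list(hypothesis))
    return {
        "wer": errors / spoken,
        "cer": (cw + ce + cm) / (cc + cw + cm),
        "mer": errors / (c + errors),
        "wil": 1 - (c / spoken) * (c / predicted) if predicted else 1.0,
    }


rates = error_rates("a b c d", "a x c")
print({name: round(value, 4) for name, value in rates.items()})
```

In this example one word is wrong and one is missed out of four spoken, so WER is 2 ÷ 4. Because WER divides by the reference length only, a transcript with many extra words can exceed 100%, whereas MER and WIL stay between 0 and 1.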

Route to whichever model wins.

One endpoint, every provider. Pin the leader or let us auto-route to the best model under your accuracy and latency budget.
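Routing under an accuracy and latency budget can be pictured as a filter followed by a sort. The page does not describe the endpoint's API or its selection rule, so the sketch below is purely illustrative: `Candidate`, `pick_model`, and both budget parameters are hypothetical names, and choosing the highest overall score among qualifying models is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float      # overall leaderboard score
    wer: float        # word error rate as a fraction
    latency_s: float  # median end-to-end latency in seconds


def pick_model(
    candidates: list[Candidate], max_wer: float, max_latency_s: float
) -> Optional[Candidate]:
    """Return the highest-scoring candidate within both budgets, or None.

    ASSUMPTION: the selection rule is illustrative, not the service's own.
    """
    eligible = [
        c for c in candidates
        if c.wer <= max_wer and c.latency_s <= max_latency_s
    ]
    return max(eligible, key=lambda c: c.score, default=None)


# Four rows taken from the leaderboard table.
leaderboard = [
    Candidate("Ink-Whisper", 81.3, 0.2280, 0.947),
    Candidate("GPT-4o Mini Transcribe", 78.1, 0.2032, 1.9),
    Candidate("Nova-2 Phone Call", 76.4, 0.1793, 1.0),
    Candidate("Chirp 3", 43.0, 0.1405, 10.3),
]

choice = pick_model(leaderboard, max_wer=0.18, max_latency_s=1.5)
print(choice.name if choice else "no model fits the budget")
```

A tight WER budget can exclude the overall leader, which matches the page's point that no single model wins every category.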
