The journal

Speech‑to‑text, model by model

Two reads on every major speech model of 2026 — an opinionated analysis and a neutral, source-checked profile. Benchmarks, pricing, architecture, and where each one falls short. No vendor spin.

◆ 36 dispatches · 18 models tracked

Analysis

Opinion · 18 pieces
Featured analysis

Amazon Transcribe Medical: what AWS actually ships, and what it won't tell you

What Amazon Transcribe Medical offers in 2026: features, pricing vs Google and Nuance, HIPAA posture, research clues, and where the service falls short.

Analysis

Chirp 3: inside Google Cloud's 2025 speech stack, from HD voices to transcription

What Google Cloud Chirp 3 actually is: release timeline, WER and Elo benchmarks, pricing, known issues, and how it stacks up against Azure and ElevenLabs.

Analysis

Deepgram Base in 2026: what the legacy model still does well

Where Deepgram Base fits in 2026: API behavior, variants, latency, concurrency, missing benchmarks, and when to pick Nova-3 or Flux instead.

Analysis

Deepgram Enhanced behind Future AGI's Agent Command Center: what the public record actually shows

A close read of Deepgram's Enhanced STT tier and Future AGI's Agent Command Center gateway, including the documentation gap between the two.

Analysis

Deepgram Flux: turn detection moves inside the speech model

What Deepgram Flux actually is: a turn-aware streaming STT model for voice agents, its event API, pricing, benchmarks, and where it falls short.

Analysis

Deepgram Nova-3: the enterprise ASR workhorse you can buy but not inspect

A practitioner's breakdown of Deepgram Nova-3: WER claims, sub-300 ms streaming latency, pricing, languages, deployment options, and where it falls short.

Analysis

ElevenLabs Scribe v2: a top-tier transcription product built on an undisclosed model

What Scribe v2 actually changed from v1: features, pricing, benchmark results, API limits, and the architecture details ElevenLabs still won't publish.

Analysis

Google Cloud Chirp 3: capabilities, costs, and where it actually wins

What Chirp 3 really is: Google's STT and TTS model family, its streaming limits, real pricing math, and how it compares to OpenAI, ElevenLabs, and Deepgram.

Analysis

Google Cloud's default speech model is legacy code that refuses to die

What Google Cloud STT's default model actually is, why Google calls it legacy, and when it still beats routing audio to Chirp or the latest models.

Analysis

Google Cloud's latest_long model: what it is, what it costs, and when to pick something else

A practitioner's guide to Google Cloud Speech-to-Text latest_long: Conformer roots, pricing, quotas, diarization, and how it compares to V2 and Chirp.

Analysis

Google Cloud's latest_short and the batch paradox

Why Google's latest_short model is built for short utterances, not short files, and when running it through batch recognition actually makes sense.

Analysis

Google's command_and_search model: the voice-search engine that quietly became legacy

The history, architecture, and current status of Google's command_and_search speech model, from 2016 Cloud Speech API beta to legacy status behind Chirp.

Analysis

GPT-4o Transcribe: what OpenAI ships, claims, and still won't tell you

A practitioner's look at gpt-4o-transcribe: pricing, API surface, benchmark evidence, and why OpenAI now recommends the mini model over it.

Analysis

Ink-Whisper: how Cartesia rebuilt Whisper for real-time voice agents

What Cartesia's Ink-Whisper got right on latency, where its accuracy fell behind by 2026, and why it mattered more as a stepping stone than a benchmark.

Analysis

Scribe v2 Realtime: ElevenLabs makes its play for live speech-to-text

ElevenLabs' Scribe v2 Realtime claims sub-150 ms latency, 93.5% accuracy in 30 languages, and $0.39/hr pricing. What the public record actually supports.

Analysis

Universal-3 Pro: what AssemblyAI shipped, and what it still won't say

AssemblyAI's Universal-3 Pro reviewed: promptable transcription, WER benchmarks, pricing, compliance caveats, and what the public record still hides.

Analysis

Whisper large-v3 and the shift from open research to transcription infrastructure

How OpenAI's Whisper large-v3 went from MIT-licensed research artifact to the baseline layer of a managed transcription stack, and what got left unresolved.

Analysis

Whisper on Azure: what Microsoft actually sells, and where it fits now

How Microsoft packages OpenAI's Whisper across Azure OpenAI and Azure Speech: limits, pricing signals, benchmarks, security, and where it fits in 2026.

Model profiles

Neutral spec sheets · 18 models

Fact-only reference profiles built from the source research — features, benchmarks, pricing, limits, and citations. No opinion.

Amazon Transcribe Medical

Reference profile of Amazon Transcribe Medical, the AWS managed API for US English medical speech-to-text, launched in December 2019.

Amazon Web Services

AssemblyAI Universal-3 Pro

Reference profile of AssemblyAI Universal-3 Pro: release date, prompting model, language support, pricing, deployment, benchmarks, and disclosed limits.

AssemblyAI

Chirp 3

Reference profile of Google Cloud Chirp 3, a managed speech model family covering multilingual transcription, HD text-to-speech, and instant custom voice.

Google (Google Cloud)

Deepgram Base

Reference profile of Deepgram Base, a legacy speech-to-text model family with task-specific variants, batch and streaming APIs, and self-hosted deployment.

Deepgram

Deepgram Enhanced

Reference spec sheet for Deepgram Enhanced, a 2022 speech-to-text tier, including its Future AGI Agent Command Center gateway integration.

Deepgram

Deepgram Flux

Reference spec for Deepgram Flux, a streaming conversational speech recognition model with built-in turn detection for voice agents.

Deepgram

Deepgram Nova-3

Reference profile of Deepgram Nova-3, a proprietary speech-to-text model family for batch and streaming transcription, released February 12, 2025.

Deepgram

ElevenLabs Scribe v2

Reference profile of ElevenLabs Scribe v2, a batch speech-to-text model released January 9, 2026: features, benchmarks, pricing, limits, and sources.

ElevenLabs

Google Cloud Chirp 3

Reference profile of Google Cloud Chirp 3: multilingual speech-to-text in Speech-to-Text V2, Chirp 3 HD voices, pricing, limits, and benchmarks.

Google Cloud

Google Cloud latest_short

Reference profile of Google Cloud Speech-to-Text latest_short, a rolling Conformer-based model tag for short utterances and command-style speech.

Google

Google Cloud Speech-to-Text default

Reference profile of Google Cloud Speech-to-Text's default model, a general-purpose legacy baseline retained for backwards compatibility.

Google

Google Cloud Speech-to-Text latest_long

Reference profile of Google Cloud Speech-to-Text latest_long, a Conformer-based long-form transcription model: features, pricing, limits, history.

Google

Google command_and_search (Google Speech-to-Text)

Reference profile of Google's command_and_search transcription model in Cloud Speech-to-Text, a legacy short-utterance model for voice commands and voice search.

Google

GPT-4o Transcribe

Reference profile of OpenAI's gpt-4o-transcribe speech-to-text model: release date, pricing, API features, benchmarks, and disclosed specifications.

OpenAI

Ink-Whisper

Reference profile of Ink-Whisper, Cartesia's Whisper-derived streaming speech-to-text model for real-time voice agents, launched June 10, 2025.

Cartesia

Microsoft Azure Whisper

Reference spec sheet for OpenAI's Whisper model as offered on Microsoft Azure: delivery paths, limits, languages, pricing, benchmarks, and release history.

Microsoft (model by OpenAI)

OpenAI Whisper large-v3

Reference profile of OpenAI Whisper large-v3: architecture, training data, release history, deployment options, pricing, limitations, and sources.

OpenAI

Scribe v2 Realtime

Reference profile of Scribe v2 Realtime, ElevenLabs' streaming speech-to-text model released November 11, 2025: specs, benchmarks, pricing, limits.

ElevenLabs

The platform

Put these benchmarks to work

The same evaluations behind these dispatches drive OpenTranscription — one API that routes every job to the right speech model for your audio, language, and budget.

© 2026 OpenTranscription · Signal is our journal. Set in system grotesque, serif & mono