Speech‑to‑text, model by model
Two reads on every major speech model of 2026 — an opinionated analysis and a neutral, source-checked profile. Benchmarks, pricing, architecture, and where each one falls short. No vendor spin.
Analysis
Opinion · 18 pieces
Amazon Transcribe Medical: what AWS actually ships, and what it won't tell you
What Amazon Transcribe Medical offers in 2026: features, pricing vs Google and Nuance, HIPAA posture, research clues, and where the service falls short.
Chirp 3: inside Google Cloud's 2025 speech stack, from HD voices to transcription
What Google Cloud Chirp 3 actually is: release timeline, WER and Elo benchmarks, pricing, known issues, and how it stacks up against Azure and ElevenLabs.

Deepgram Base in 2026: what the legacy model still does well
Where Deepgram Base fits in 2026: API behavior, variants, latency, concurrency, missing benchmarks, and when to pick Nova-3 or Flux instead.

Deepgram Enhanced behind Future AGI's Agent Command Center: what the public record actually shows
A close read of Deepgram's Enhanced STT tier and Future AGI's Agent Command Center gateway, including the documentation gap between the two.

Deepgram Flux: turn detection moves inside the speech model
What Deepgram Flux actually is: a turn-aware streaming STT model for voice agents, its event API, pricing, benchmarks, and where it falls short.

Deepgram Nova-3: the enterprise ASR workhorse you can buy but not inspect
A practitioner's breakdown of Deepgram Nova-3: WER claims, sub-300 ms streaming latency, pricing, languages, deployment options, and where it falls short.

ElevenLabs Scribe v2: a top-tier transcription product built on an undisclosed model
What Scribe v2 actually changed from v1: features, pricing, benchmark results, API limits, and the architecture details ElevenLabs still won't publish.

Google Cloud Chirp 3: capabilities, costs, and where it actually wins
What Chirp 3 really is: Google's STT and TTS model family, its streaming limits, real pricing math, and how it compares to OpenAI, ElevenLabs, and Deepgram.

Google Cloud's default speech model is legacy code that refuses to die
What Google Cloud STT's default model actually is, why Google calls it legacy, and when it still beats routing audio to Chirp or the latest models.

Google Cloud's latest_long model: what it is, what it costs, and when to pick something else
A practitioner's guide to Google Cloud Speech-to-Text latest_long: Conformer roots, pricing, quotas, diarization, and how it compares to V2 and Chirp.

Google Cloud's latest_short and the batch paradox
Why Google's latest_short model is built for short utterances, not short files, and when running it through batch recognition actually makes sense.

Google's command_and_search model: the voice-search engine that quietly became legacy
The history, architecture, and current status of Google's command_and_search speech model, from the 2016 Cloud Speech API beta to legacy status behind Chirp.

GPT-4o Transcribe: what OpenAI ships, claims, and still won't tell you
A practitioner's look at gpt-4o-transcribe: pricing, API surface, benchmark evidence, and why OpenAI now recommends the mini model over it.

Ink-Whisper: how Cartesia rebuilt Whisper for real-time voice agents
What Cartesia's Ink-Whisper got right on latency, where its accuracy fell behind by 2026, and why it mattered more as a stepping stone than a benchmark.

Scribe v2 Realtime: ElevenLabs makes its play for live speech-to-text
ElevenLabs' Scribe v2 Realtime claims sub-150 ms latency, 93.5% accuracy in 30 languages, and $0.39/hr pricing. What the public record actually supports.

Universal-3 Pro: what AssemblyAI shipped, and what it still won't say
AssemblyAI's Universal-3 Pro reviewed: promptable transcription, WER benchmarks, pricing, compliance caveats, and what the public record still hides.

Whisper large-v3 and the shift from open research to transcription infrastructure
How OpenAI's Whisper large-v3 went from MIT-licensed research artifact to the baseline layer of a managed transcription stack, and what got left unresolved.

Whisper on Azure: what Microsoft actually sells, and where it fits now
How Microsoft packages OpenAI's Whisper across Azure OpenAI and Azure Speech: limits, pricing signals, benchmarks, security, and where it fits in 2026.
Model profiles
Neutral spec sheets · 18 models
Fact-only reference profiles built from the source research: features, benchmarks, pricing, limits, and citations. No opinion.
Amazon Transcribe Medical
Reference profile of Amazon Transcribe Medical, the AWS managed API for US English medical speech-to-text, launched in December 2019.
Amazon Web Services

AssemblyAI Universal-3 Pro
Reference profile of AssemblyAI Universal-3 Pro: release date, prompting model, language support, pricing, deployment, benchmarks, and disclosed limits.
AssemblyAI

Chirp 3
Reference profile of Google Cloud Chirp 3, a managed speech model family covering multilingual transcription, HD text-to-speech, and instant custom voice.
Google (Google Cloud)

Deepgram Base
Reference profile of Deepgram Base, a legacy speech-to-text model family with task-specific variants, batch and streaming APIs, and self-hosted deployment.
Deepgram

Deepgram Enhanced
Reference spec sheet for Deepgram Enhanced, a 2022 speech-to-text tier, including its Future AGI Agent Command Center gateway integration.
Deepgram

Deepgram Flux
Reference spec for Deepgram Flux, a streaming conversational speech recognition model with built-in turn detection for voice agents.
Deepgram

Deepgram Nova-3
Reference profile of Deepgram Nova-3, a proprietary speech-to-text model family for batch and streaming transcription, released February 12, 2025.
Deepgram

ElevenLabs Scribe v2
Reference profile of ElevenLabs Scribe v2, a batch speech-to-text model released January 9, 2026: features, benchmarks, pricing, limits, and sources.
ElevenLabs

Google Cloud Chirp 3
Reference profile of Google Cloud Chirp 3: multilingual speech-to-text in Speech-to-Text V2, Chirp 3 HD voices, pricing, limits, and benchmarks.
Google Cloud

Google Cloud latest_short
Reference profile of Google Cloud Speech-to-Text latest_short, a rolling Conformer-based model tag for short utterances and command-style speech.
Google

Google Cloud Speech-to-Text default
Reference profile of Google Cloud Speech-to-Text's default model, a general-purpose legacy baseline retained for backwards compatibility.
Google

Google Cloud Speech-to-Text latest_long
Reference profile of Google Cloud Speech-to-Text latest_long, a Conformer-based long-form transcription model: features, pricing, limits, history.
Google

Google command_and_search (Google Speech-to-Text)
Reference profile of Google's command_and_search transcription model in Cloud Speech-to-Text, a legacy short-utterance model for voice commands and voice search.
Google

GPT-4o Transcribe
Reference profile of OpenAI's gpt-4o-transcribe speech-to-text model: release date, pricing, API features, benchmarks, and disclosed specifications.
OpenAI

Ink-Whisper
Reference profile of Ink-Whisper, Cartesia's Whisper-derived streaming speech-to-text model for real-time voice agents, launched June 10, 2025.
Cartesia

Microsoft Azure Whisper
Reference spec sheet for OpenAI's Whisper model as offered on Microsoft Azure: delivery paths, limits, languages, pricing, benchmarks, and release history.
Microsoft (model by OpenAI)

OpenAI Whisper large-v3
Reference profile of OpenAI Whisper large-v3: architecture, training data, release history, deployment options, pricing, limitations, and sources.
OpenAI

Scribe v2 Realtime
Reference profile of Scribe v2 Realtime, ElevenLabs' streaming speech-to-text model released November 11, 2025: specs, benchmarks, pricing, limits.
ElevenLabs