OpenTranscription/ Blog
2026-07-03 · ANALYSIS

Deepgram Nova-3: the enterprise ASR workhorse you can buy but not inspect

A practitioner's breakdown of Deepgram Nova-3: WER claims, sub-300 ms streaming latency, pricing, languages, deployment options, and where it falls short.

Nova-3 is Deepgram's flagship general-purpose speech-to-text family for batch and streaming transcription, and the company is not shy about the numbers: a 5.26% median batch WER and 6.84% median streaming WER on Deepgram's own benchmark suite, with sub-300 ms transcription latency targets for streaming workloads. It ships with the kind of controls production teams actually ask for, including Keyterm Prompting, redaction, smart formatting, diarization, multilingual code-switching, and self-hosted deployment. The catch, and it is a real one, is that almost everything we know about Nova-3 comes from Deepgram itself. There is no architecture paper, no parameter count, no training-corpus disclosure. You can evaluate Nova-3 thoroughly as a product. As a scientific artifact, you mostly have to take the vendor's word for it.

That asymmetry runs through everything below. The public record consists of docs, launch blogs, changelog posts, patents covering adjacent Deepgram ASR techniques, and partner pages, plus a smaller set of third-party evaluations. Some of those independent results are favorable. Others point at weaknesses on underrepresented languages and atypical speech, which is worth sitting with before you sign an enterprise contract.

Strategically, Nova-3 reads as a product built to satisfy four enterprise demands at once: better accuracy on noisy multi-speaker audio, lower operational latency, multilingual expansion, and self-serve customization without retraining. Across 2025 and 2026 Deepgram kept widening Nova-3's language coverage, launched Nova-3 Medical, improved multilingual code-switching, and pushed private and self-hosted deployment patterns with AWS, NVIDIA, and Fortanix. Meanwhile the company increasingly positioned Flux, not Nova-3, as its preferred model for turn-based voice agents. The likely shape of the lineup from here: Nova-3 stays the high-accuracy general ASR line while Flux absorbs the conversation-native agent features.

What Nova-3 actually is

Deepgram's documentation describes Nova-3 as its highest-performing general-purpose ASR model and draws a clean line between it and Flux. Nova-3 is the recommendation for meetings, event captioning, multi-speaker audio, multilingual and code-switching audio, noisy or far-field input, and both batch and streaming transcription. Flux is the newer model for turn-based voice-agent interaction with model-native turn detection.

At launch, Nova-3 English was available through the API for both pre-recorded and real-time streaming transcription, with multilingual and self-hosted support to follow. The launch changelog claimed a 54.3% reduction in streaming WER against competitors, yielding the 6.84% median figure, and a 47.4% reduction in batch WER for the 5.26% median. The same announcement said Nova-3 kept inference speed comparable to Nova-2 while adding Keyterm Prompting, better handling of background noise and overlapping speech, better numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
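
The two reduction figures can be turned back into the comparison baseline they imply. The sketch below assumes each percentage is a relative reduction against a single comparison WER, which the changelog wording suggests but the reviewed text does not spell out; the function name is illustrative.

```python
def implied_baseline(wer_after: float, relative_reduction: float) -> float:
    """Back out the baseline WER implied by a relative reduction.

    Assumes wer_after = baseline * (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return wer_after / (1.0 - relative_reduction)


# Figures quoted from the launch changelog (WER in percent).
streaming_baseline = implied_baseline(6.84, 0.543)
batch_baseline = implied_baseline(5.26, 0.474)

print(f"implied streaming comparison WER: {streaming_baseline:.2f}%")
print(f"implied batch comparison WER: {batch_baseline:.2f}%")
```

Under that reading, the comparison point works out to roughly 15% WER for streaming and 10% for batch, which is a useful sanity check on what "competitors" scored in Deepgram's own suite.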

The disclosure gap

The most important technical fact about Nova-3 is what Deepgram does not say. The official product docs contain no parameter count, no full architecture description, and no base-model training-corpus size. What is public breaks down like this:

| Aspect | What is public | Analytical reading |
|---|---|---|
| Model class | Proprietary end-to-end speech-to-text system; general ASR, not turn-detection-centric. | High confidence. |
| Architecture | Deepgram's prior Nova-2 writeup says the Nova family uses a Transformer-based architecture with speech-specific optimizations; Deepgram patents cover fused end-to-end ASR with transformers and knowledge-distillation methods. A third-party Together AI model page describes Nova-3 as a "latent space architecture." | Reasonable inference: Nova-3 is a proprietary, transformer-heavy, end-to-end ASR system, but Deepgram has not published a first-party Nova-3 architecture paper. |
| Model size | Not publicly disclosed by Deepgram in the reviewed Nova-3 docs. | Important unknown. |
| Training data | No public corpus-size disclosure for base Nova-3. Official materials emphasize real-world enterprise audio and challenging acoustic conditions; the retrained multilingual model credits improved curriculum and data curation; the medical variant says its evaluation uses public and proprietary customer audio. Together AI says Nova-3 used synthetic plus real-world conversational datasets, but that is not a Deepgram primary source. | Public evidence supports "enterprise conversational audio plus active curation," but not a full dataset accounting. |

There is a useful historical anchor in the predecessor disclosures. Deepgram's original Nova post said Nova was trained across 100+ domains and 47 billion tokens, and the Nova-2 post said Nova-2 used a two-stage curriculum over data curated from nearly 6 million resources, plus a substantial library of human transcriptions. Those numbers show Deepgram's general approach. They are not Nova-3 corpus specs, and nobody should quote them as such.

Features, languages, latency, deployment

The production feature set around the base engine is deep, and it is where Nova-3 earns its keep. Official docs show support for batch and streaming transcription via Deepgram's /v1/listen API and a WebSocket streaming API, with SDKs in JavaScript, Python, Go, .NET, and Java. Speaker diarization works on Nova batch models, with separate concurrency limits when diarization is enabled. Keyterm Prompting accepts up to 100 terms, though Deepgram later recommended roughly 20 to 50 as the practical range. Smart Formatting covers punctuation and paragraphs generally plus richer entity formatting for supported languages; self-hosted Nova-3 needs the separate entity-detector model for the best formatting. Redaction handles 50+ entity types and groups such as PII, PCI, PHI, and numbers. There is language detection for dominant-language identification and a language=multi mode for code-switching, plus profanity filtering on multilingual models and filler-word preservation on general Nova, Nova-2, and Nova-3 models.
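
To make those controls concrete, here is a minimal sketch of assembling a pre-recorded request URL with several of them switched on. It makes no network call. The host name and the query parameter names (`model`, `language`, `smart_format`, `diarize`, `redact`, `keyterm`) are assumptions recalled from Deepgram's public docs rather than anything stated in this article, so check them against the current API reference before relying on them. The function name and the warning text are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed public host; the article itself only names the /v1/listen path.
BASE_URL = "https://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100             # hard cap cited in the article
RECOMMENDED_MAX_KEYTERMS = 50  # upper end of the 20 to 50 practical range


def build_listen_url(keyterms, language="en", diarize=True,
                     smart_format=True, redact=("pii",)):
    """Return (url, warnings) for a pre-recorded Nova-3 request.

    Query parameter names are assumptions, not confirmed by the article.
    """
    keyterms = list(keyterms)
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(keyterms)} keyterms exceeds the cap of {MAX_KEYTERMS}")
    warnings = []
    if len(keyterms) > RECOMMENDED_MAX_KEYTERMS:
        warnings.append(
            "more than 50 keyterms: higher risk of force-fitted terms")

    params = [("model", "nova-3"), ("language", language)]
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    params += [("redact", group) for group in redact]
    params += [("keyterm", term) for term in keyterms]
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}", warnings


url, notes = build_listen_url(["Keyterm Prompting", "Nova-3"],
                              language="multi")
print(url)
print(notes)
```

Enforcing the keyterm cap client-side is a design choice, not a requirement: it simply surfaces the documented limit before a request is ever sent.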

The language count is one of the few places Deepgram's own messaging wobbles. The current pricing page says Nova models support 45+ languages, while some 2026 marketing pages say "50+". The safe reading is at least 45, still climbing through 2026. The rollout sequence is documented: German, Dutch, Swedish, and Danish first; then Spanish, French, and Portuguese; then Italian, Turkish, Norwegian, and Indonesian; then 12 more languages across Europe and South Asia; then Hebrew, Persian, and Urdu; then Mandarin Chinese; then Gujarati.

The latency and throughput documentation is unusually concrete for this industry. Deepgram says its streaming models are optimized for 300 ms or less transcription latency, and the Nova-3 latency guide characterizes the model as delivering sub-300 ms streaming latency under typical conditions. Rate-limit docs show Nova-3 starting at 50 concurrent pre-recorded requests, with streaming limits of 150 on pay-as-you-go in Europe and 225 in North America on Growth, rising to 200 pre-recorded and 300 streaming starting limits on Enterprise. Enabling diarization cuts those ceilings materially. In a later Deepgram and NVIDIA private-deployment post, Deepgram reported 198 ms P50 first-token latency for self-hosted Nova-3 running on NVIDIA GPUs inside an AWS VPC.
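
A P50 figure such as 198 ms is a median, so half of the measured requests were slower. When reproducing a vendor number it helps to report a tail percentile next to it. The sketch below summarizes a list of latency samples against the 300 ms target; the sample values are hypothetical, and collecting them (for example with the tools in Deepgram's latency guide) is outside its scope.

```python
import statistics

TARGET_MS = 300.0  # streaming latency target cited in the text


def summarize_latency(samples_ms):
    """Return the median, 95th percentile, and a target check."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    p50 = statistics.median(ordered)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=100, method="inclusive")[94]
    return {
        "p50_ms": p50,
        "p95_ms": p95,
        "p50_within_target": p50 <= TARGET_MS,
        "p95_within_target": p95 <= TARGET_MS,
    }


# Hypothetical measurements in milliseconds, for illustration only.
samples = [180, 195, 198, 210, 240, 320, 205, 190]
summary = summarize_latency(samples)
print(summary)
```

With these made-up samples the median sits comfortably under the target while a single slow request pulls the tail close to it, which is the pattern a P50-only headline can hide.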

On deployment, Nova-3 is available as a managed API, as self-hosted, on-prem, or private-VPC software, and through Amazon SageMaker. It also shows up on Together AI as a dedicated-inference offering. Docs and partner pages show integrations with Twilio, LiveKit, Pipecat, Amazon Connect, Amazon Lex, and other ecosystem components.

Why Deepgram built it

The launch framing is blunt about the target: real-world enterprise ASR where legacy or generic systems break down. Contact centers, drive-thrus, healthcare terminology, multilingual organizations, noisy environments, overlapping speakers, low-latency real-time integrations. The launch blog leans on "challenging audio conditions," "real-time multilingual transcription," and "self-serve customization" without retraining.

The market rationale is also on the record. Deepgram's 2025 "State of Voice AI" report, produced with Opus Research, found that 67% of surveyed businesses viewed voice technology as foundational, 84% expected to increase voice-tech budgets, and 80% were already using some form of voice agent or IVR, yet only 21% were "very satisfied" with current systems. Deepgram sponsored that research, so read it as a company making its own case. It is still consistent with the product shift: customers wanted better ASR quality, lower latency, real-time interaction, multilingual reach, and easier deployment.

There was a second motive, and it was competitive displacement. Deepgram published migration guides from AWS Transcribe, Google Speech-to-Text, OpenAI Whisper, and AssemblyAI. Companies do not write migration guides for fun. Nova-3 was a go-to-market weapon aimed at incumbent cloud transcription products and open-source Whisper stacks, and the product story of better WER on noisy production audio, lower latency, deployment flexibility, and customization without retraining is tuned for exactly that migration conversation.

The release cadence

Nova-3 was never a single launch. The dated milestones from Deepgram's launch blog, changelog, and follow-on posts trace an expanding family: the core launch on February 12, 2025; Nova-3 Medical on March 3, 2025; Nova-3 Medical Streaming on June 4, 2025; the Nova-2 vs Nova-3 positioning piece on June 22, 2025; language support rollouts from August 2025 through April 2026; Hebrew, Persian, and Urdu support on February 12, 2026; the retrained multilingual model with major WER improvements on February 13, 2026; and the NVIDIA private-deployment results on May 27, 2026.

Who built it

Deepgram does not publish a Nova-3 author list the way OpenAI, Meta, or NVIDIA often do for research releases, so attribution is organizational. Scott Stephenson is co-founder and CEO. Adam Sypniewski is CTO and, per Deepgram's own bio, leads the research and engineering teams building the company's speech-recognition systems. Andrew Seagraves is VP of Research, and Morris Gevirtz is Head of Language. Deepgram's earlier Nova-2 materials credit its in-house model research team and DataOps team for speech-specific transformer optimization, data curation, and multi-stage training. Those teams are the relevant context for how Nova-3 was likely developed, even without a Nova-3 paper.

The public-facing release materials have named authors: Jose Nicholas Francisco on Nova-3 Medical, Hasan Jilani on Nova-3 Medical Streaming and later Nova and Flux marketing, and Martine Katz on multilingual expansion and multilingual WER improvements. The Deepgram and NVIDIA deployment post was co-authored by Conner Hughes and Michael Wang, which matters as a public signal of engineering partnership around private deployment and latency work.

The patent trail names deeper technical contributors. Deepgram patent US10540959B1 lists Jeff Ward, Adam Sypniewski, and Scott Stephenson as inventors on techniques for domain adaptation and special vocabulary handling. Other Deepgram-adjacent patent material covers fused end-to-end ASR with transformers and knowledge distillation. None of this proves Nova-3 uses each disclosed method verbatim, but it is the clearest public window into the company's technical lineage.

On external collaborators, the record points to deployment and distribution partnerships rather than co-development of the acoustic model. Partner and product pages attach Nova-3 to AWS and Amazon Connect, NVIDIA, Fortanix, OneReach.ai, Vonage, and Together AI, among others. On financing, Deepgram's Series C was led by AVP, and its Series B was led by Madrona with Alkeon and others participating. I found no evidence in the reviewed materials of Deepgram naming an outside academic lab or hyperscaler as a co-author or co-trainer of the base Nova-3 model.

Source materials and related artifacts

The table below puts official Deepgram materials first, then influential external evaluations, then the most relevant patents and code artifacts. For repositories and some patents, the indexed sources did not surface a single clean publication date, so that is noted rather than guessed.

| Type | Item | Date | Why it matters |
|---|---|---|---|
| Launch blog | Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text | 2025-02-12 | Core launch thesis: enterprise-grade noisy-audio accuracy, first real-time multilingual transcription, and self-serve customization. |
| Changelog | Introducing Nova-3: Most Advanced Speech-to-Text Model | 2025-02-12 | Most citation-worthy source for official headline metrics: 5.26% batch WER, 6.84% streaming WER, customization, redaction, and timestamp claims. |
| Docs | Models & Languages Overview | current docs | Canonical current positioning of Nova-3 vs Flux; recommended use cases and model family overview. |
| Docs | Model Options | current docs | Official product-level Nova-3 summary and availability information. |
| Docs | Measuring STT Latency | current docs | Best primary source for latency expectations and how Deepgram wants latency measured. |
| Docs | API Rate Limits | current docs | Primary source for concurrency and throughput ceilings by plan and region. |
| Domain variant | Introducing Nova-3 Medical | 2025-03-03 | Best official source on domain adaptation, medical benchmarks, and keyterm behavior in a specialized Nova-3 descendant. |
| Domain update | Nova-3 Medical Streaming | 2025-06-04 | Shows how Deepgram iterated Nova-3 Medical for real-time clinical workflows. |
| Multilingual update | Nova-3 Multilingual: Major WER Improvements Across Languages | 2026-02-13 | Reveals training-curriculum and data-curation work, not just marketing claims. |
| Language expansion | Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3 | 2026-02-12 | Shows expansion into right-to-left languages and continued use of Keyterm Prompting as a differentiator. |
| Changelog | Language support rollouts | 2025-08 to 2026-04 | Useful for reconstructing the rollout timeline and geographic strategy. |
| Comparative positioning | When to Use Nova-2 vs Nova-3 | 2025-06-22 | Product-strategy piece tying together accuracy, latency, customization, language reach, and cost. |
| Deployment docs | Deploy Deepgram on Amazon SageMaker | current docs | Primary source for the AWS and VPC deployment motion. |
| Ecosystem note | Voice Agents That Prioritize Data Security and Run Where Your Data Lives | 2026-05-27 | Shows self-hosted private deployment performance with NVIDIA, including 198 ms P50 first-token latency. |
| External benchmark | "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most | 2026 | Independent street-name benchmark comparing Nova-3 with Whisper, Chirp, and others on a difficult lexical task. |
| External benchmark | Voice of India | 2026-05-24 | Important counterweight: shows Deepgram Nova-3 struggling on some Indic languages and regions. |
| External benchmark | Zero-Shot Recognition of Dysarthric Speech | 2025-12-19 | Useful for limitations analysis: all systems, including Nova-3, degrade sharply on severe dysarthric speech. |
| Patent | US10540959B1: Augmented generalized deep learning with special vocabulary | issue date not surfaced in retrieved snippet | Relevant to domain vocabulary adaptation, a central Nova-3 product theme. |
| Patent | US12380880B2: End-to-end automatic speech recognition with transformer | issue date not surfaced in retrieved snippet | Relevant to Deepgram's likely transformer-fused end-to-end ASR lineage. |
| Patent | US11410029B2: Soft label generation for knowledge distillation | issue date not surfaced in retrieved snippet | Relevant to training efficiency and model transfer techniques in production ASR. |
| Code/org | Deepgram GitHub organization | ongoing | Public repos are SDKs and specs, not open Nova-3 weights. |
| Tooling repo | Deepgram support-toolkit | ongoing | Includes the latency-measurement tools Deepgram itself recommends. |
| SDKs | JS / Python / Go / .NET / Java SDKs | ongoing | Operationally important for integrating Nova-3 into production systems. |

Compare this against Whisper, SeamlessM4T, MMS, or Parakeet and the pattern jumps out: Nova-3 is richly documented as a product but thinly documented as a scientific model. That is the single biggest source asymmetry in this research set.

The competitive landscape

Treat any direct "winner" claim across speech models with suspicion. Most public numbers were not measured on the same datasets, audio conditions, or output-normalization rules. Nova-3's headline WER comes from Deepgram-authored evaluations; open models like Whisper and SeamlessM4T are documented through papers and model cards; cloud vendors like Google and Azure publish capabilities more readily than apples-to-apples accuracy numbers. The useful comparison is product shape, deployment model, feature completeness, and where the public evidence is strongest.

| System | Public size | Languages | Real-time use | Diarization and production features | Deployment | Public pricing signal | What stands out against Nova-3 |
|---|---|---|---|---|---|---|---|
| Deepgram Nova-3 | Undisclosed | 45+ officially on pricing page; language count still expanding | Yes; sub-300 ms target. | Keyterm Prompting, redaction, smart formatting, multilingual code-switching, diarization. | API, self-hosted, VPC/on-prem, SageMaker, Together AI. | Pay-as-you-go pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming; multilingual higher. | Strongest public case is enterprise noisy-production ASR plus deployment flexibility. |
| OpenAI Whisper large-v3 | 1.55B | Multilingual; tokenizer covers 99 languages. | Not natively productized for streaming in the open-source release; wrappers exist. turbo speeds inference with 4 decoder layers. | No native diarization in the core model; usually paired with external tools. | Open source / self-hosted; API access history via OpenAI and others. | Open-source cost is infra-dependent. | Most transparent baseline; strongest for openness and ecosystem, weaker than Nova-3 as a turnkey enterprise stack. |
| OpenAI GPT-Realtime-Whisper / next-gen audio models | Not publicly parameterized | Multilingual STT product line. | Yes, explicitly streaming. | Productized realtime STT; official claim is better WER than Whisper v2/v3. | Managed API. | $0.017/min for GPT-Realtime-Whisper. | More "LLM-audio" oriented; materially pricier than Nova-3's listed STT rates. |
| Google Chirp 2 / Chirp 3 | Chirp foundation model publicly described as 2B; millions of hours of audio. | 100+ languages for Chirp foundation; current STT docs show multilingual auto language detection and diarization. | Yes. | Timestamps, profanity filtering, auto language detection, diarization. | Cloud and on-prem offerings. | Google's v2 launch blog set pricing at $0.016/min, with volume tiers as low as $0.004/min. | Likely the strongest hyperscaler multilingual rival on paper; public benchmark transparency still uneven. |
| Microsoft Azure Speech | Undisclosed | 140+ locales / supported inputs. | Real-time, fast, and batch transcription. | Real-time diarization, language identification, custom speech. | Managed cloud plus containers and enterprise deployment options in the Azure ecosystem. | Region-specific pricing page; billed per second. | Enterprise breadth and customization are strong; public "flagship WER" transparency is weaker than Nova-3's marketing. |
| NVIDIA Parakeet 1.1B / Riva / NIM | 1.1B for Parakeet 1.1b RNNT Multilingual. | 25+ languages for Parakeet RNNT multilingual; more via other NIM variants. | Yes, streaming + offline. | Auto punctuation and capitalization; streaming diarization via Sortformer for Parakeet and Conformer families. | Strong self-hosted, GPU-native path. | Hardware and licensing dependent. | Best open-ish self-hosted rival for GPU-first teams; less turnkey SaaS than Nova-3. |
| Speechmatics Ursa 2 / STT API | Undisclosed | 55+ / 56+ languages. | Yes; Speechmatics advertises sub-second, speaker-aware STT. | Realtime diarization, multilingual support, batch + streaming. | API and on-device offerings. | Pricing page shows Pro from $0.24/hr and 50 concurrent real-time sessions. | Especially credible on multilingual realtime diarization; fewer public benchmark details than one would like. |
| AssemblyAI Universal | Undisclosed | 99 languages; diarization for 95. | Yes. Universal-3 Pro Streaming adds prompting and real-time diarization. | Language detection, formatting, filler words, keyterms, timestamps, diarization. | Managed API. | Universal supports 99 languages at $0.27/hr flat; U3 Pro Streaming is $0.45/hr base. | Very aggressive price and language story; Nova-3 tends to differentiate on self-hosting and enterprise deployment flexibility. |
| Meta SeamlessM4T v2 / SeamlessStreaming / MMS | SeamlessM4T v2 uses UnitY2; MMS covers 1,107 STT languages. | 101 speech-input languages for SeamlessM4T; 96 for streaming ASR; 1,107 for MMS STT. | Research-grade streaming exists via SeamlessStreaming, around 2 seconds latency. | Strong for multilingual research and speech translation; not a turnkey commercial STT stack with built-in diarization and enterprise extras. | Open research code and models. | Infra-dependent. | Meta wins on open multilingual breadth, especially translation and language coverage, but not on turnkey enterprise-product completeness. |

How to read that table

Nova-3's strongest competitive position is not that it is the most transparent model, the largest, or the cheapest open option. It is that Nova-3 combines strong published ASR performance, low-latency streaming, runtime vocabulary control, speaker-aware and compliance features, and deployment flexibility in one commercially supported stack. That bundle is what enterprise teams tend to need, and it is why Deepgram's comparison and migration materials keep targeting customers of AWS, Google, Whisper, and AssemblyAI.

Where competitors clearly beat Nova-3 is scientific openness, and sometimes language breadth. Whisper publishes architecture, sizes, a model card, and training-scale details. Meta's MMS dwarfs everyone on language count. Google publishes a 2B-parameter Chirp foundation model description and broad language support. NVIDIA publishes more concrete architecture detail for Parakeet than Deepgram does for Nova-3. If your primary decision criterion is reproducibility or open weights, Nova-3 was simply not designed for you.

The accuracy story is mixed, which is what anyone who has run ASR evaluations should expect. Deepgram's own enterprise-style benchmark strongly favors Nova-3. Independent evaluations complicate that picture: the Voice of India benchmark places Nova-3 in a weaker tier on several underrepresented Indic languages, and the dysarthria benchmark shows severe degradation for every tested system, Nova-3 included. Nova-3 looks strongest on the production-audio slices Deepgram optimized for. It is not obviously dominant on every multilingual or accessibility-heavy frontier.

Adoption, pricing, and licensing

Deepgram's customer material shows Nova-3 landing in regulated, latency-sensitive, and multilingual production settings. Prem AI says it selected self-hosted Nova-3 Base as its primary STT engine for sovereign voice workloads, citing strong English and EU-language accuracy, better diarization than a self-hosted Whisper plus pyannote stack, and streaming performance that met sub-300 to 500 ms turn-level latency goals. Gradient Labs reports a noticeable quality improvement after introducing Nova-3. SigmaMind AI says Nova-3 and Flux cut end-to-end agent latency by roughly 300 ms at scale. All three are vendor-published case studies, so weigh them as marketing evidence, but they map the target adoption pattern clearly enough.

On price, Nova-3 pairs premium positioning with low per-minute list prices. The current pricing page shows Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go, with lower rates on Growth; Nova-3 Multilingual is priced higher. Add-ons such as redaction and Keyterm Prompting are priced separately, and concurrency ceilings are tied to plan level. Nova-3 is not the absolute cheapest STT in every scenario, but it is competitively priced for a managed model that also offers self-hosting, VPC, and regulated-environment deployment.
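
The list prices quoted in this article mix per-minute and per-hour units, which makes them awkward to compare by eye. The sketch below converts them to dollars per audio hour. It uses only figures already cited here, treats them as list prices, and ignores add-ons, plan discounts, and infrastructure costs; the labels are shorthand, not official product names.

```python
MINUTES_PER_HOUR = 60

# (label, listed price in USD, unit) as quoted in this article.
LISTED = [
    ("Deepgram Nova-3 Monolingual, pre-recorded", 0.0048, "min"),
    ("Deepgram Nova-3 Monolingual, streaming", 0.0077, "min"),
    ("OpenAI GPT-Realtime-Whisper", 0.017, "min"),
    ("Google STT v2, launch list price", 0.016, "min"),
    ("Google STT v2, lowest volume tier", 0.004, "min"),
    ("Speechmatics Pro (from)", 0.24, "hr"),
    ("AssemblyAI Universal", 0.27, "hr"),
    ("AssemblyAI U3 Pro Streaming (base)", 0.45, "hr"),
]


def to_hourly(price: float, unit: str) -> float:
    """Convert a listed price to USD per hour of audio."""
    if unit == "hr":
        return price
    if unit == "min":
        return price * MINUTES_PER_HOUR
    raise ValueError(f"unknown unit: {unit}")


hourly = {name: round(to_hourly(price, unit), 4)
          for name, price, unit in LISTED}

for name, usd in sorted(hourly.items(), key=lambda item: item[1]):
    print(f"${usd:.3f}/hr  {name}")
```

On these list prices, Nova-3 Monolingual lands at about $0.29 per hour pre-recorded and $0.46 per hour streaming, close to the AssemblyAI figures and well under the quoted OpenAI and Google launch rates.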

Licensing is straightforward: Nova-3 is proprietary commercial software, not an open model. Deepgram exposes it through a paid API, private and self-hosted deployments, AWS Marketplace and SageMaker-style distribution, and partner channels such as Together AI. The public GitHub organization is SDKs and API specifications, not downloadable weights, and I found no public open-source model license or weight release for Nova-3 in the reviewed materials.

Where it falls short

Opacity is the headline limitation. Deepgram publishes enough to buy and deploy Nova-3, but not enough to scrutinize it as a research artifact: no first-party paper, no parameter count, no full model card, no training-corpus accounting in the reviewed materials. Engineering buyers may not care. Researchers, regulated public-sector procurement, and anyone who needs reproducibility should.

Benchmark asymmetry is the second problem. Most of Nova-3's headline quality claims are vendor-generated, which does not make them false, but does mean they are optimized around Deepgram's chosen datasets, normalization rules, and product framing. The independent evidence shows the usual ASR pattern of performance varying sharply by language, accent, domain, and speech pathology. On Voice of India, Nova-3 shows elevated error rates on some languages such as Tamil and Odia. On dysarthric speech, every tested system, Nova-3 included, degrades badly as severity rises.
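
Normalization rules alone can move a WER number a long way, which is why vendor and independent figures are hard to line up. The sketch below scores the same hypothesis twice, once on raw tokens and once after a simple lowercase-and-strip-punctuation pass. The normalizer is an illustrative assumption, not Deepgram's or any benchmark's actual rule set, and the sentences are made up.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation (illustrative rules only)."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_errors(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_errors(ref, hyp) / len(ref)


reference = "Okay, Dr. Smith will call at noon."
hypothesis = "okay dr smith will call at noon"

raw_wer = wer(reference.split(), hypothesis.split())
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw_wer:.3f}")
print(f"normalized WER: {normalized_wer:.3f}")
```

The transcript is identical in both runs. Only the scoring rules change, and the result swings from four errors in seven words to none.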

The third issue is feature bifurcation inside Deepgram's own lineup. Nova-3 remains the accuracy-first general ASR, but Flux is increasingly the conversation-native model with integrated turn detection and lower end-of-turn delay. Sensible product strategy, but it means Nova-3 is no longer the unambiguous answer for every voice AI use case. If the workload is a turn-based agent, Deepgram itself now steers developers toward Flux.

Smaller but practical gotchas: Keyterm Prompting is capped at 100 terms, with 20 to 50 recommended for reliability, because stuffing the list raises the risk of force-fitting terms into transcripts. Language-count messaging is inconsistent between the pricing page and marketing. And enabling diarization lowers concurrency ceilings versus plain STT, which bites real-time high-volume systems.

Where the line goes next

Reading the public release behavior rather than speculating, five threads look likely to continue. Language expansion is the obvious one, since nearly every Nova-3 update across late 2025 and 2026 has done exactly that. Multilingual and code-switching quality should keep improving too; the retrained multilingual release specifically credits curriculum and data-curation changes in that area.

Vertical specialization is already in motion, with healthcare as the proof point. Nova-3 Medical and its later streaming and batch upgrades are how a company tests a base-model-plus-domain-model strategy, and similar moves for legal, finance, or public sector would fit the pattern. Private deployment and sovereignty options are deepening as well: the SageMaker path, the NVIDIA and Fortanix joint story, and the Prem AI case study all point the same direction, toward customers who want Nova-class ASR inside their own cloud boundary or on-prem footprint. And the product architecture is settling into an explicit split where Nova-3 stays the premium general ASR line and Flux becomes the preferred agent-interaction line, which the current docs and migration guides already imply.

Open questions

A few things remain genuinely unanswerable from public sources. Nova-3's exact parameter count, core architecture, and base training-data scale are undisclosed. There is still no single neutral benchmark that scores Nova-3 against Google, Azure, Speechmatics, and AssemblyAI on multilingual, noisy, code-switching audio with latency, diarization, and formatting under the same rules. And nothing public indicates whether Deepgram will ever publish a true Nova-3 technical paper, or keep documenting the line as a commercial product and nothing more.

Sources

  1. February 12, 2025 changelog: https://developers.deepgram.com/changelog/2025/2/12
  2. Models & Languages Overview: https://developers.deepgram.com/docs/models-languages-overview
  3. Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text: https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
  4. Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API: https://deepgram.com/learn/nova-2-speech-to-text-api
  5. Introducing Nova: World's Most Powerful Speech-to-Text API: https://deepgram.com/learn/nova-speech-to-text-whisper-api
  6. Deepgram API Overview: https://developers.deepgram.com/reference/deepgram-api-overview
  7. Getting Started, live streaming audio: https://developers.deepgram.com/docs/live-streaming-audio
  8. Speaker Diarization: https://developers.deepgram.com/docs/diarization
  9. Smart Formatting: https://developers.deepgram.com/docs/smart-format
  10. Supported Entity Types: https://developers.deepgram.com/docs/supported-entity-types
  11. Language Detection: https://developers.deepgram.com/docs/language-detection
  12. Profanity Filtering: https://developers.deepgram.com/docs/profanity-filter
  13. Deepgram Pricing: https://deepgram.com/pricing
  14. Measuring STT Latency: https://developers.deepgram.com/docs/measuring-streaming-latency
  15. Deploy Deepgram on Amazon SageMaker: https://developers.deepgram.com/docs/deploy-amazon-sagemaker
  16. Introducing "State of Voice AI 2025": https://deepgram.com/learn/state-of-voice-ai-2025
  17. Migrating From Google Speech-to-Text (STT) to Deepgram: https://developers.deepgram.com/docs/migrating-from-google-speech-to-text-stt-to-deepgram
  18. Meet our leadership team: https://deepgram.com/company/leadership
  19. Introducing Nova-3 Medical: https://deepgram.com/learn/introducing-nova-3-medical-speech-to-text-api
  20. US10540959B1, Augmented generalized deep learning with special vocabulary: https://patents.google.com/patent/US10540959B1/en
  21. Deepgram Raises $130M Series C at $1.3B Valuation: https://deepgram.com/learn/press-release-deepgram-raises-series-c
  22. Model Options: https://developers.deepgram.com/docs/model
  23. API Rate Limits: https://developers.deepgram.com/reference/api-rate-limits
  24. Nova-3 Medical Streaming: https://deepgram.com/learn/nova-3-medical-streaming-update
  25. Nova-3 Multilingual: Major WER Improvements Across Languages: https://deepgram.com/learn/nova-3-multilingual-major-wer-improvements-across-languages
  26. Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3: https://deepgram.com/learn/speech-to-text-for-hebrew-persian-urdu-on-nova-3
  27. August 15, 2025 changelog: https://developers.deepgram.com/changelog/2025/8/15
  28. When to Use Nova-2 vs Nova-3 (for Devs): https://deepgram.com/learn/model-comparison-when-to-use-nova-2-vs-nova-3-for-devs
  29. Voice Agents That Prioritize Data Security and Run Where Your Data Lives: https://deepgram.com/learn/voice-agents-deepgram-nvidia-nemotron
  30. "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most: https://arxiv.org/html/2602.12249v2
  31. Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India: https://arxiv.org/html/2604.19151v2
  32. Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models: https://arxiv.org/abs/2512.17474
  33. End-to-end automatic speech recognition with transformer (US12380880B2): https://patents.google.com/patent/US12380880B2/en
  34. Soft label generation for knowledge distillation (US11410029B2): https://patents.google.com/patent/US11410029B2/en
  35. Deepgram GitHub organization: https://github.com/deepgram
  36. Whisper model card: https://github.com/openai/whisper/blob/main/model-card.md
  37. turbo model release, openai/whisper discussion #2363: https://github.com/openai/whisper/discussions/2363
  38. openai/whisper repository: https://github.com/openai/whisper
  39. Introducing next-generation audio models in the API: https://openai.com/index/introducing-our-next-generation-audio-models/
  40. OpenAI API Pricing: https://openai.com/api/pricing/
  41. Google Cloud launches new AI models, opens Generative AI Studio: https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
  42. Compare transcription models, Cloud Speech-to-Text: https://docs.cloud.google.com/speech-to-text/docs/transcription-model
  43. Chirp 2: Enhanced multilingual accuracy: https://docs.cloud.google.com/speech-to-text/docs/models/chirp-2
  44. Cloud Speech-to-Text On-Prem Pricing: https://cloud.google.com/speech-to-text/priv/pricing
  45. Google Cloud Speech-to-Text V2 API: https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
  46. Language and Voice Support for Azure Speech: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
  47. Speech to Text Overview, Azure Speech Service: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
  48. Real-time diarization quickstart, Azure Speech service: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization
  49. Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
  50. NVIDIA ASR NIM Support Matrix: https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/asr.html
  51. About NVIDIA ASR NIM Microservice: https://docs.nvidia.com/nim/speech/latest/asr/index.html
  52. Speechmatics, AI Speech Technology: https://www.speechmatics.com/
  53. Speechmatics realtime diarization: https://docs.speechmatics.com/speech-to-text/realtime/realtime-diarization
  54. Speechmatics on-device speech-to-text: https://www.speechmatics.com/speech-to-text/on-device
  55. Speechmatics pricing: https://www.speechmatics.com/pricing
  56. AssemblyAI: 99 Languages, Advanced Features, One Price: https://www.assemblyai.com/blog/99-languages
  57. Introducing Universal-3 Pro: https://www.assemblyai.com/blog/introducing-universal-3-pro
  58. AssemblyAI pricing: https://www.assemblyai.com/pricing
  59. facebookresearch/seamless_communication: https://github.com/facebookresearch/seamless_communication
  60. SeamlessM4T README: https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/README.md
  61. Seamless Communication, AI at Meta: https://ai.meta.com/research/seamless-communication/
  62. Prem AI Brings Sovereign Voice AI to Regulated Enterprises: https://deepgram.com/customers/prem-ai
  63. Large Vocabulary Speech Recognition: A Practical Guide: https://deepgram.com/learn/large-vocabulary-speech-recognition