Deepgram Nova-3: the enterprise ASR workhorse you can buy but not inspect
A practitioner's breakdown of Deepgram Nova-3: WER claims, sub-300 ms streaming latency, pricing, languages, deployment options, and where it falls short.

Nova-3 is Deepgram's flagship general-purpose speech-to-text family for batch and streaming transcription, and the company is not shy about the numbers: a 5.26% median batch WER and 6.84% median streaming WER on Deepgram's own benchmark suite, with sub-300 ms transcription latency targets for streaming workloads. It ships with the kind of controls production teams actually ask for, including Keyterm Prompting, redaction, smart formatting, diarization, multilingual code-switching, and self-hosted deployment. The catch, and it is a real one, is that almost everything we know about Nova-3 comes from Deepgram itself. There is no architecture paper, no parameter count, no training-corpus disclosure. You can evaluate Nova-3 thoroughly as a product. As a scientific artifact, you mostly have to take the vendor's word for it.
That asymmetry runs through everything below. The public record consists of docs, launch blogs, changelog posts, patents covering adjacent Deepgram ASR techniques, and partner pages, plus a smaller set of third-party evaluations. Some of those independent results are favorable. Others point at weaknesses on underrepresented languages and atypical speech, which is worth sitting with before you sign an enterprise contract.
Strategically, Nova-3 reads as a product built to satisfy four enterprise demands at once: better accuracy on noisy multi-speaker audio, lower operational latency, multilingual expansion, and self-serve customization without retraining. Across 2025 and 2026 Deepgram kept widening Nova-3's language coverage, launched Nova-3 Medical, improved multilingual code-switching, and pushed private and self-hosted deployment patterns with AWS, NVIDIA, and Fortanix. Meanwhile the company increasingly positioned Flux, not Nova-3, as its preferred model for turn-based voice agents. The likely shape of the lineup from here: Nova-3 stays the high-accuracy general ASR line while Flux absorbs the conversation-native agent features.
What Nova-3 actually is
Deepgram's documentation describes Nova-3 as its highest-performing general-purpose ASR model and draws a clean line between it and Flux. Nova-3 is the recommendation for meetings, event captioning, multi-speaker audio, multilingual and code-switching audio, noisy or far-field input, and both batch and streaming transcription. Flux is the newer model for turn-based voice-agent interaction with model-native turn detection.
At launch, Nova-3 English was available through the API for both pre-recorded and real-time streaming transcription, with multilingual and self-hosted support to follow. The launch changelog claimed a 54.3% reduction in streaming WER against competitors, yielding the 6.84% median figure, and a 47.4% reduction in batch WER for the 5.26% median. The same announcement said Nova-3 kept inference speed comparable to Nova-2 while adding Keyterm Prompting, better handling of background noise and overlapping speech, better numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
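It helps to see what those reduction percentages imply about the comparison baseline. The sketch below back-computes it, assuming each figure is a relative reduction against a single aggregate competitor WER. That is an assumption: the materials summarized here do not spell out the formula, and the function name is illustrative.

```python
def implied_baseline_wer(reported_wer: float, relative_reduction: float) -> float:
    """Solve reported = baseline * (1 - reduction) for the baseline.

    Assumes the reduction is relative (not absolute percentage points),
    which is an interpretation, not something the source confirms.
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return reported_wer / (1.0 - relative_reduction)


# Headline figures quoted in the launch changelog.
streaming_baseline = implied_baseline_wer(6.84, 0.543)
batch_baseline = implied_baseline_wer(5.26, 0.474)

print(f"implied streaming baseline WER: {streaming_baseline:.2f}%")
print(f"implied batch baseline WER: {batch_baseline:.2f}%")
```

Under that reading, the implied competitor baselines are roughly 15% for streaming and 10% for batch. That is a reminder that the size of the claimed gap depends on which competitors and datasets set the baseline.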
The disclosure gap
The most important technical fact about Nova-3 is what Deepgram does not say. The official product docs contain no parameter count, no full architecture description, and no base-model training-corpus size. What is public breaks down like this:
| Aspect | What is public | Analytical reading |
|---|---|---|
| Model class | Proprietary end-to-end speech-to-text system; general ASR, not turn-detection-centric. | High confidence. |
| Architecture | Deepgram's prior Nova-2 writeup says the Nova family uses a Transformer-based architecture with speech-specific optimizations; Deepgram patents cover fused end-to-end ASR with transformers and knowledge-distillation methods. A third-party Together AI model page describes Nova-3 as a "latent space architecture." | Reasonable inference: Nova-3 is a proprietary, transformer-heavy, end-to-end ASR system, but Deepgram has not published a first-party Nova-3 architecture paper. |
| Model size | Not publicly disclosed by Deepgram in the reviewed Nova-3 docs. | Important unknown. |
| Training data | No public corpus-size disclosure for base Nova-3. Official materials emphasize real-world enterprise audio and challenging acoustic conditions; the retrained multilingual model credits improved curriculum and data curation; the medical variant says its evaluation uses public and proprietary customer audio. Together AI says Nova-3 used synthetic plus real-world conversational datasets, but that is not a Deepgram primary source. | Public evidence supports "enterprise conversational audio plus active curation," but not a full dataset accounting. |
There is a useful historical anchor in the predecessor disclosures. Deepgram's original Nova post said Nova was trained across 100+ domains and 47 billion tokens, and the Nova-2 post said Nova-2 used a two-stage curriculum over data curated from nearly 6 million resources, plus a substantial library of human transcriptions. Those numbers show Deepgram's general approach. They are not Nova-3 corpus specs, and nobody should quote them as such.

Features, languages, latency, deployment
The production feature set around the base engine is deep, and it is where Nova-3 earns its keep. Official docs show support for batch and streaming transcription via Deepgram's /v1/listen API and a WebSocket streaming API, with SDKs in JavaScript, Python, Go, .NET, and Java. Speaker diarization works on Nova batch models, with separate concurrency limits when diarization is enabled. Keyterm Prompting accepts up to 100 terms, though Deepgram later recommended roughly 20 to 50 as the practical range. Smart Formatting covers punctuation and paragraphs generally plus richer entity formatting for supported languages; self-hosted Nova-3 needs the separate entity-detector model for the best formatting. Redaction handles 50+ entity types and groups such as PII, PCI, PHI, and numbers. There is language detection for dominant-language identification and a language=multi mode for code-switching, plus profanity filtering on multilingual models and filler-word preservation on general Nova, Nova-2, and Nova-3 models.
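To make the feature list concrete, here is a minimal sketch of how these options might be combined into one /v1/listen request URL. The host and the query-parameter names (`model`, `smart_format`, `diarize`, `redact`, `keyterm`) reflect my understanding of Deepgram's public API and should be treated as assumptions to verify against the current reference. Only `language=multi`, the endpoint path, and the 100-term cap come from the material summarized here, and `build_listen_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

MAX_KEYTERMS = 100  # cap described in the article; 20 to 50 is the recommended range


def build_listen_url(keyterms, *, multilingual=False, diarize=True,
                     redact=("pii",),
                     base="https://api.deepgram.com/v1/listen"):
    """Build a request URL without sending anything over the network.

    Parameter names and host are assumptions; check Deepgram's API reference.
    """
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms are accepted")

    params = [("model", "nova-3"), ("smart_format", "true")]
    if diarize:
        params.append(("diarize", "true"))
    if multilingual:
        params.append(("language", "multi"))  # code-switching mode
    # Repeated keys: one entry per redaction group and per keyterm.
    params += [("redact", group) for group in redact]
    params += [("keyterm", term) for term in keyterms]
    return f"{base}?{urlencode(params)}"


url = build_listen_url(["Nova-3", "Keyterm Prompting"], multilingual=True)
print(url)
```

Enforcing the keyterm cap on the client keeps an oversized list from ever reaching the API. A stricter guard at the recommended 20 to 50 range would be a reasonable variation.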
The language count is one of the few places Deepgram's own messaging wobbles. The current pricing page says Nova models support 45+ languages, while some 2026 marketing pages say "50+". The safe reading is at least 45, still climbing through 2026. The rollout sequence is documented: German, Dutch, Swedish, and Danish first; then Spanish, French, and Portuguese; then Italian, Turkish, Norwegian, and Indonesian; then 12 more languages across Europe and South Asia; then Hebrew, Persian, and Urdu; then Mandarin Chinese; then Gujarati.
The latency and throughput documentation is unusually concrete for this industry. Deepgram says its streaming models are optimized for 300 ms or less transcription latency, and the Nova-3 latency guide characterizes the model as delivering sub-300 ms streaming latency under typical conditions. Rate-limit docs show Nova-3 starting at 50 concurrent pre-recorded requests, with streaming limits of 150 on pay-as-you-go in Europe and 225 in North America on Growth, rising to 200 pre-recorded and 300 streaming starting limits on Enterprise. Enabling diarization cuts those ceilings materially. In a later Deepgram and NVIDIA private-deployment post, Deepgram reported 198 ms P50 first-token latency for self-hosted Nova-3 running on NVIDIA GPUs inside an AWS VPC.
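Claims like "sub-300 ms" and "198 ms P50" are percentile statements, so it is worth checking your own measurements the same way. The sketch below summarizes a list of per-utterance latencies against a budget using a nearest-rank percentile. The sample values are invented for illustration and are not measurements of Nova-3, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_latency(samples_ms, budget_ms=300.0):
    """Report P50 and P95 and whether each fits the latency budget."""
    p50 = percentile(samples_ms, 50)
    p95 = percentile(samples_ms, 95)
    return {
        "p50_ms": p50,
        "p95_ms": p95,
        "p50_within_budget": p50 <= budget_ms,
        "p95_within_budget": p95 <= budget_ms,
    }


# Illustrative latencies in milliseconds (made up, not vendor data).
samples = [180.0, 195.0, 198.0, 210.0, 240.0, 260.0, 275.0, 290.0, 310.0, 420.0]
summary = summarize_latency(samples)
print(summary)
```

A median can sit comfortably under budget while the tail does not, which is why a P50 figure alone says little about worst-case turn latency.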
On deployment, Nova-3 is available as a managed API, as self-hosted, on-prem, or private-VPC software, and through Amazon SageMaker. It also shows up on Together AI as a dedicated-inference offering. Docs and partner pages show integrations with Twilio, LiveKit, Pipecat, Amazon Connect, Amazon Lex, and other ecosystem components.
Why Deepgram built it
The launch framing is blunt about the target: real-world enterprise ASR where legacy or generic systems break down. Contact centers, drive-thrus, healthcare terminology, multilingual organizations, noisy environments, overlapping speakers, low-latency real-time integrations. The launch blog leans on "challenging audio conditions," "real-time multilingual transcription," and "self-serve customization" without retraining.
The market rationale is also on the record. Deepgram's 2025 "State of Voice AI" report, produced with Opus Research, found that 67% of surveyed businesses viewed voice technology as foundational, 84% expected to increase voice-tech budgets, and 80% were already using some form of voice agent or IVR, yet only 21% were "very satisfied" with current systems. Deepgram sponsored that research, so read it as a company making its own case. It is still consistent with the product shift: customers wanted better ASR quality, lower latency, real-time interaction, multilingual reach, and easier deployment.
There was a second motive, and it was competitive displacement. Deepgram published migration guides from AWS Transcribe, Google Speech-to-Text, OpenAI Whisper, and AssemblyAI. Companies do not write migration guides for fun. Nova-3 was a go-to-market weapon aimed at incumbent cloud transcription products and open-source Whisper stacks, and the product story of better WER on noisy production audio, lower latency, deployment flexibility, and customization without retraining is tuned for exactly that migration conversation.
The release cadence
Nova-3 was never a single launch. The dated milestones from Deepgram's launch blog, changelog, and follow-on posts trace an expanding family: the core launch on February 12, 2025; Nova-3 Medical on March 3, 2025; Nova-3 Medical Streaming on June 4, 2025; the Nova-2 vs Nova-3 positioning piece on June 22, 2025; language support rollouts from August 2025 through April 2026; Hebrew, Persian, and Urdu support on February 12, 2026; the retrained multilingual model with major WER improvements on February 13, 2026; and the NVIDIA private-deployment results on May 27, 2026.
Who built it
Deepgram does not publish a Nova-3 author list the way OpenAI, Meta, or NVIDIA often do for research releases, so attribution is organizational. Scott Stephenson is co-founder and CEO. Adam Sypniewski is CTO and, per Deepgram's own bio, leads the research and engineering teams building the company's speech-recognition systems. Andrew Seagraves is VP of Research, and Morris Gevirtz is Head of Language. Deepgram's earlier Nova-2 materials credit its in-house model research team and DataOps team for speech-specific transformer optimization, data curation, and multi-stage training. Those teams are the relevant context for how Nova-3 was likely developed, even without a Nova-3 paper.
The public-facing release materials have named authors: Jose Nicholas Francisco on Nova-3 Medical, Hasan Jilani on Nova-3 Medical Streaming and later Nova and Flux marketing, and Martine Katz on multilingual expansion and multilingual WER improvements. The Deepgram and NVIDIA deployment post was co-authored by Conner Hughes and Michael Wang, which matters as a public signal of engineering partnership around private deployment and latency work.
The patent trail names deeper technical contributors. Deepgram patent US10540959B1 lists Jeff Ward, Adam Sypniewski, and Scott Stephenson as inventors on techniques for domain adaptation and special vocabulary handling. Other Deepgram-adjacent patent material covers fused end-to-end ASR with transformers and knowledge distillation. None of this proves Nova-3 uses each disclosed method verbatim, but it is the clearest public window into the company's technical lineage.
On external collaborators, the record points to deployment and distribution partnerships rather than co-development of the acoustic model. Partner and product pages attach Nova-3 to AWS and Amazon Connect, NVIDIA, Fortanix, OneReach.ai, Vonage, and Together AI, among others. On financing, Deepgram's Series C was led by AVP, and its Series B was led by Madrona with Alkeon and others participating. I found no evidence in the reviewed materials of Deepgram naming an outside academic lab or hyperscaler as a co-author or co-trainer of the base Nova-3 model.

Source materials and related artifacts
The table below puts official Deepgram materials first, then influential external evaluations, then the most relevant patents and code artifacts. For repositories and some patents, the indexed sources did not surface a single clean publication date, so that is noted rather than guessed.
| Type | Item | Date | Why it matters |
|---|---|---|---|
| Launch blog | Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text | 2025-02-12 | Core launch thesis: enterprise-grade noisy-audio accuracy, first real-time multilingual transcription, and self-serve customization. |
| Changelog | Introducing Nova-3: Most Advanced Speech-to-Text Model | 2025-02-12 | Most citation-worthy source for official headline metrics: 5.26% batch WER, 6.84% streaming WER, customization, redaction, and timestamp claims. |
| Docs | Models & Languages Overview | current docs | Canonical current positioning of Nova-3 vs Flux; recommended use cases and model family overview. |
| Docs | Model Options | current docs | Official product-level Nova-3 summary and availability information. |
| Docs | Measuring STT Latency | current docs | Best primary source for latency expectations and how Deepgram wants latency measured. |
| Docs | API Rate Limits | current docs | Primary source for concurrency and throughput ceilings by plan and region. |
| Domain variant | Introducing Nova-3 Medical | 2025-03-03 | Best official source on domain adaptation, medical benchmarks, and keyterm behavior in a specialized Nova-3 descendant. |
| Domain update | Nova-3 Medical Streaming | 2025-06-04 | Shows how Deepgram iterated Nova-3 Medical for real-time clinical workflows. |
| Multilingual update | Nova-3 Multilingual: Major WER Improvements Across Languages | 2026-02-13 | Reveals training-curriculum and data-curation work, not just marketing claims. |
| Language expansion | Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3 | 2026-02-12 | Shows expansion into right-to-left languages and continued use of Keyterm Prompting as a differentiator. |
| Changelog | Language support rollouts | 2025-08 to 2026-04 | Useful for reconstructing the rollout timeline and geographic strategy. |
| Comparative positioning | When to Use Nova-2 vs Nova-3 | 2025-06-22 | Product-strategy piece tying together accuracy, latency, customization, language reach, and cost. |
| Deployment docs | Deploy Deepgram on Amazon SageMaker | current docs | Primary source for the AWS and VPC deployment motion. |
| Ecosystem note | Voice Agents That Prioritize Data Security and Run Where Your Data Lives | 2026-05-27 | Shows self-hosted private deployment performance with NVIDIA, including 198 ms P50 first-token latency. |
| External benchmark | "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most | 2026 | Independent street-name benchmark comparing Nova-3 with Whisper, Chirp, and others on a difficult lexical task. |
| External benchmark | Voice of India | 2026-05-24 | Important counterweight: shows Deepgram Nova-3 struggling on some Indic languages and regions. |
| External benchmark | Zero-Shot Recognition of Dysarthric Speech | 2025-12-19 | Useful for limitations analysis: all systems, including Nova-3, degrade sharply on severe dysarthric speech. |
| Patent | US10540959B1: Augmented generalized deep learning with special vocabulary | issue date not surfaced in retrieved snippet | Relevant to domain vocabulary adaptation, a central Nova-3 product theme. |
| Patent | US12380880B2: End-to-end automatic speech recognition with transformer | issue date not surfaced in retrieved snippet | Relevant to Deepgram's likely transformer-fused end-to-end ASR lineage. |
| Patent | US11410029B2: Soft label generation for knowledge distillation | issue date not surfaced in retrieved snippet | Relevant to training efficiency and model transfer techniques in production ASR. |
| Code/org | Deepgram GitHub organization | ongoing | Public repos are SDKs and specs, not open Nova-3 weights. |
| Tooling repo | Deepgram support-toolkit | ongoing | Includes the latency-measurement tools Deepgram itself recommends. |
| SDKs | JS / Python / Go / .NET / Java SDKs | ongoing | Operationally important for integrating Nova-3 into production systems. |
Compare this against Whisper, SeamlessM4T, MMS, or Parakeet and the pattern jumps out: Nova-3 is richly documented as a product but thinly documented as a scientific model. That is the single biggest source asymmetry in this research set.
The competitive landscape
Treat any direct "winner" claim across speech models with suspicion. Most public numbers were not measured on the same datasets, audio conditions, or output-normalization rules. Nova-3's headline WER comes from Deepgram-authored evaluations; open models like Whisper and SeamlessM4T are documented through papers and model cards; cloud vendors like Google and Azure publish capabilities more readily than apples-to-apples accuracy numbers. The useful comparison is product shape, deployment model, feature completeness, and where the public evidence is strongest.
| System | Public size | Languages | Real-time use | Diarization and production features | Deployment | Public pricing signal | What stands out against Nova-3 |
|---|---|---|---|---|---|---|---|
| Deepgram Nova-3 | Undisclosed | 45+ officially on pricing page; language count still expanding | Yes; sub-300 ms target. | Keyterm Prompting, redaction, smart formatting, multilingual code-switching, diarization. | API, self-hosted, VPC/on-prem, SageMaker, Together AI. | Pay-as-you-go pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming; multilingual higher. | Strongest public case is enterprise noisy-production ASR plus deployment flexibility. |
| OpenAI Whisper large-v3 | 1.55B | Multilingual; tokenizer covers 99 languages. | Not natively productized for streaming in the open-source release; wrappers exist. The turbo variant speeds up inference by using 4 decoder layers. | No native diarization in the core model; usually paired with external tools. | Open source / self-hosted; API access history via OpenAI and others. | Open-source cost is infra-dependent. | Most transparent baseline; strongest for openness and ecosystem, weaker than Nova-3 as a turnkey enterprise stack. |
| OpenAI GPT-Realtime-Whisper / next-gen audio models | Not publicly parameterized | Multilingual STT product line. | Yes, explicitly streaming. | Productized realtime STT; official claim is better WER than Whisper v2/v3. | Managed API. | $0.017/min for GPT-Realtime-Whisper. | More "LLM-audio" oriented; materially pricier than Nova-3's listed STT rates. |
| Google Chirp 2 / Chirp 3 | Chirp foundation model publicly described as 2B; millions of hours of audio. | 100+ languages for Chirp foundation; current STT docs show multilingual auto language detection and diarization. | Yes. | Timestamps, profanity filtering, auto language detection, diarization. | Cloud and on-prem offerings. | Google's v2 launch blog set pricing at $0.016/min, with volume tiers as low as $0.004/min. | Likely the strongest hyperscaler multilingual rival on paper; public benchmark transparency still uneven. |
| Microsoft Azure Speech | Undisclosed | 140+ locales / supported inputs. | Real-time, fast, and batch transcription. | Real-time diarization, language identification, custom speech. | Managed cloud plus containers and enterprise deployment options in the Azure ecosystem. | Region-specific pricing page; billed per second. | Enterprise breadth and customization are strong; public "flagship WER" transparency is weaker than Nova-3's marketing. |
| NVIDIA Parakeet 1.1B / Riva / NIM | 1.1B for Parakeet 1.1b RNNT Multilingual. | 25+ languages for Parakeet RNNT multilingual; more via other NIM variants. | Yes, streaming + offline. | Auto punctuation and capitalization; streaming diarization via Sortformer for Parakeet and Conformer families. | Strong self-hosted, GPU-native path. | Hardware and licensing dependent. | Best open-ish self-hosted rival for GPU-first teams; less turnkey SaaS than Nova-3. |
| Speechmatics Ursa 2 / STT API | Undisclosed | 55+ / 56+ languages. | Yes; Speechmatics advertises sub-second, speaker-aware STT. | Realtime diarization, multilingual support, batch + streaming. | API and on-device offerings. | Pricing page shows Pro from $0.24/hr and 50 concurrent real-time sessions. | Especially credible on multilingual realtime diarization; fewer public benchmark details than one would like. |
| AssemblyAI Universal | Undisclosed | 99 languages; diarization for 95. | Yes. Universal-3 Pro Streaming adds prompting and real-time diarization. | Language detection, formatting, filler words, keyterms, timestamps, diarization. | Managed API. | Universal supports 99 languages at $0.27/hr flat; U3 Pro Streaming is $0.45/hr base. | Very aggressive price and language story; Nova-3 tends to differentiate on self-hosting and enterprise deployment flexibility. |
| Meta SeamlessM4T v2 / SeamlessStreaming / MMS | SeamlessM4T v2 uses UnitY2; MMS covers 1,107 STT languages. | 101 speech-input languages for SeamlessM4T; 96 for streaming ASR; 1,107 for MMS STT. | Research-grade streaming exists via SeamlessStreaming, around 2 seconds latency. | Strong for multilingual research and speech translation; not a turnkey commercial STT stack with built-in diarization and enterprise extras. | Open research code and models. | Infra-dependent. | Meta wins on open multilingual breadth, especially translation and language coverage, but not on turnkey enterprise-product completeness. |
How to read that table
Nova-3's strongest competitive position is not that it is the most transparent model, the largest, or the cheapest open option. It is that Nova-3 combines strong published ASR performance, low-latency streaming, runtime vocabulary control, speaker-aware and compliance features, and deployment flexibility in one commercially supported stack. That bundle is what enterprise teams tend to need, and it is why Deepgram's comparison and migration materials keep targeting AWS, Google, Whisper, and AssemblyAI.
Where competitors clearly beat Nova-3 is scientific openness, and sometimes language breadth. Whisper publishes architecture, sizes, a model card, and training-scale details. Meta's MMS dwarfs everyone on language count. Google publishes a 2B-parameter Chirp foundation model description and broad language support. NVIDIA publishes more concrete architecture detail for Parakeet than Deepgram does for Nova-3. If your primary decision criterion is reproducibility or open weights, Nova-3 was simply not designed for you.
The accuracy story is mixed, which is what anyone who has run ASR evaluations should expect. Deepgram's own enterprise-style benchmark strongly favors Nova-3. Independent evaluations complicate that picture: the Voice of India benchmark places Nova-3 in a weaker tier on several underrepresented Indic languages, and the dysarthria benchmark shows severe degradation for every tested system, Nova-3 included. Nova-3 looks strongest on the production-audio slices Deepgram optimized for. It is not obviously dominant on every multilingual or accessibility-heavy frontier.
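One reason vendor and independent WER figures are hard to reconcile is output normalization, which the landscape discussion flags as differing across evaluations. The sketch below is a plain word-level WER with an optional, deliberately simple normalizer (lowercasing and punctuation stripping). It is an illustrative implementation, not the scoring pipeline used by Deepgram or by any of the cited benchmarks.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation (keeping apostrophes), then tokenize."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis, *, normalized=True):
    """Word error rate: edit distance divided by reference length."""
    ref = normalize(reference) if normalized else reference.split()
    hyp = normalize(hypothesis) if normalized else hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


reference = "Please call Dr. Smith at 5 PM."
hypothesis = "please call dr smith at 5 pm"
raw = wer(reference, hypothesis, normalized=False)
norm = wer(reference, hypothesis, normalized=True)
print(f"raw WER: {raw:.3f}, normalized WER: {norm:.3f}")
```

The same transcript scores 4 errors in 7 words without normalization and zero with it. That is why a WER number is only comparable across vendors when the normalization rules are published alongside it.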

Adoption, pricing, and licensing
Deepgram's customer material shows Nova-3 landing in regulated, latency-sensitive, and multilingual production settings. Prem AI says it selected self-hosted Nova-3 Base as its primary STT engine for sovereign voice workloads, citing strong English and EU-language accuracy, better diarization than a self-hosted Whisper plus pyannote stack, and streaming performance that met sub-300 to 500 ms turn-level latency goals. Gradient Labs reports a noticeable quality improvement after introducing Nova-3. SigmaMind AI says Nova-3 and Flux cut end-to-end agent latency by roughly 300 ms at scale. All three are vendor-published case studies, so weigh them as marketing evidence, but they map the target adoption pattern clearly enough.
On price, Nova-3 combines a premium feature set with low per-minute rates. The current pricing page shows Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go, with lower rates on Growth; Nova-3 Multilingual is listed above that. Add-ons such as redaction and Keyterm Prompting are priced separately, and concurrency ceilings are tied to plan level. Nova-3 is not the absolute cheapest STT in every scenario, but it is competitively priced for a managed model that also offers self-hosting, VPC, and regulated-environment deployment.
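Because competitors quote per-hour prices while Deepgram quotes per-minute, a quick conversion makes the numbers comparable. The sketch below uses only the pay-as-you-go monolingual rates quoted above. The 10,000-hour monthly workload is a hypothetical volume, and add-ons, Growth discounts, and multilingual pricing are deliberately left out.

```python
from decimal import Decimal

# Pay-as-you-go Nova-3 Monolingual rates quoted in the article (USD per minute).
RATES_PER_MIN = {
    "pre_recorded": Decimal("0.0048"),
    "streaming": Decimal("0.0077"),
}


def per_hour(rate_per_min: Decimal) -> Decimal:
    """Convert a per-minute rate to a per-hour rate."""
    return rate_per_min * 60


def monthly_cost(audio_hours: int, rate_per_min: Decimal) -> Decimal:
    """Base transcription cost for a given number of audio hours."""
    return Decimal(audio_hours) * 60 * rate_per_min


hourly = {mode: per_hour(rate) for mode, rate in RATES_PER_MIN.items()}
# Hypothetical workload: 10,000 hours of streaming audio per month.
streaming_bill = monthly_cost(10_000, RATES_PER_MIN["streaming"])

for mode, rate in hourly.items():
    print(f"{mode}: ${rate:.3f}/hr")
print(f"10,000 streaming hours: ${streaming_bill:,.2f}")
```

That works out to $0.288/hr pre-recorded and $0.462/hr streaming, in the same range as the per-hour prices listed for Speechmatics and AssemblyAI in the comparison table.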
Licensing is straightforward: Nova-3 is proprietary commercial software, not an open model. Deepgram exposes it through a paid API, private and self-hosted deployments, AWS Marketplace and SageMaker-style distribution, and partner channels such as Together AI. The public GitHub organization is SDKs and API specifications, not downloadable weights, and I found no public open-source model license or weight release for Nova-3 in the reviewed materials.
Where it falls short
Opacity is the headline limitation. Deepgram publishes enough to buy and deploy Nova-3, but not enough to scrutinize it as a research artifact: no first-party paper, no parameter count, no full model card, no training-corpus accounting in the reviewed materials. Engineering buyers may not care. Researchers, regulated public-sector procurement, and anyone who needs reproducibility should.
Benchmark asymmetry is the second problem. Most of Nova-3's headline quality claims are vendor-generated, which does not make them false, but does mean they are optimized around Deepgram's chosen datasets, normalization rules, and product framing. The independent evidence shows the usual ASR pattern of performance varying sharply by language, accent, domain, and speech pathology. On Voice of India, Nova-3 shows elevated error rates on some languages such as Tamil and Odia. On dysarthric speech, every tested system, Nova-3 included, degrades badly as severity rises.
The third issue is feature bifurcation inside Deepgram's own lineup. Nova-3 remains the accuracy-first general ASR, but Flux is increasingly the conversation-native model with integrated turn detection and lower end-of-turn delay. Sensible product strategy, but it means Nova-3 is no longer the unambiguous answer for every voice AI use case. If the workload is a turn-based agent, Deepgram itself now steers developers toward Flux.
Smaller but practical gotchas: Keyterm Prompting is capped at 100 terms, with 20 to 50 recommended for reliability, because stuffing the list raises the risk of force-fitting terms into transcripts. Language-count messaging is inconsistent between the pricing page and marketing. And enabling diarization lowers concurrency ceilings versus plain STT, which bites real-time high-volume systems.
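Since concurrency ceilings drop when diarization is on, high-volume clients usually need a client-side cap rather than relying on the API to reject excess requests. Here is a minimal sketch using an asyncio semaphore. The ceiling of 5 and the `fake_transcribe` stand-in are hypothetical, because the material summarized here does not state the diarization-enabled limits.

```python
import asyncio


async def run_with_ceiling(jobs, ceiling):
    """Run coroutine factories with at most `ceiling` in flight at once.

    Returns the results plus the peak concurrency actually observed.
    """
    semaphore = asyncio.Semaphore(ceiling)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_transcribe():
    """Stand-in for a real transcription request (hypothetical)."""
    await asyncio.sleep(0.01)
    return "ok"


# Hypothetical ceiling; substitute the limit for your plan and feature set.
results, peak = asyncio.run(run_with_ceiling([fake_transcribe] * 20, ceiling=5))
print(f"completed {len(results)} jobs, peak concurrency {peak}")
```

Keeping the ceiling as a parameter means one client can switch between a diarization-on and diarization-off limit without code changes.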
Where the line goes next
Reading the public release behavior rather than speculating, five threads look likely to continue. Language expansion is the obvious one, since nearly every Nova-3 update across late 2025 and 2026 has added languages. Multilingual and code-switching quality should keep improving too; the retrained multilingual release specifically credits curriculum and data-curation changes in that area.
Vertical specialization is already in motion, with healthcare as the proof point. Nova-3 Medical and its later streaming and batch upgrades are how a company tests a base-model-plus-domain-model strategy, and similar moves for legal, finance, or public sector would fit the pattern. Private deployment and sovereignty options are deepening as well: the SageMaker path, the NVIDIA and Fortanix joint story, and the Prem AI case study all point the same direction, toward customers who want Nova-class ASR inside their own cloud boundary or on-prem footprint. And the product architecture is settling into an explicit split where Nova-3 stays the premium general ASR line and Flux becomes the preferred agent-interaction line, which the current docs and migration guides already imply.
Open questions
A few things remain genuinely unanswerable from public sources. Nova-3's exact parameter count, core architecture, and base training-data scale are undisclosed. There is still no single neutral benchmark that scores Nova-3 against Google, Azure, Speechmatics, and AssemblyAI on multilingual, noisy, code-switching audio with latency, diarization, and formatting under the same rules. And nothing public indicates whether Deepgram will ever publish a true Nova-3 technical paper, or keep documenting the line as a commercial product and nothing more.
Sources
- February 12, 2025 changelog: https://developers.deepgram.com/changelog/2025/2/12
- Models & Languages Overview: https://developers.deepgram.com/docs/models-languages-overview
- Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text: https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
- Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API: https://deepgram.com/learn/nova-2-speech-to-text-api
- Introducing Nova: World's Most Powerful Speech-to-Text API: https://deepgram.com/learn/nova-speech-to-text-whisper-api
- Deepgram API Overview: https://developers.deepgram.com/reference/deepgram-api-overview
- Getting Started, live streaming audio: https://developers.deepgram.com/docs/live-streaming-audio
- Speaker Diarization: https://developers.deepgram.com/docs/diarization
- Smart Formatting: https://developers.deepgram.com/docs/smart-format
- Supported Entity Types: https://developers.deepgram.com/docs/supported-entity-types
- Language Detection: https://developers.deepgram.com/docs/language-detection
- Profanity Filtering: https://developers.deepgram.com/docs/profanity-filter
- Deepgram Pricing: https://deepgram.com/pricing
- Measuring STT Latency: https://developers.deepgram.com/docs/measuring-streaming-latency
- Deploy Deepgram on Amazon SageMaker: https://developers.deepgram.com/docs/deploy-amazon-sagemaker
- Introducing "State of Voice AI 2025": https://deepgram.com/learn/state-of-voice-ai-2025
- Migrating From Google Speech-to-Text (STT) to Deepgram: https://developers.deepgram.com/docs/migrating-from-google-speech-to-text-stt-to-deepgram
- Meet our leadership team: https://deepgram.com/company/leadership
- Introducing Nova-3 Medical: https://deepgram.com/learn/introducing-nova-3-medical-speech-to-text-api
- US10540959B1, Augmented generalized deep learning with special vocabulary: https://patents.google.com/patent/US10540959B1/en
- Deepgram Raises $130M Series C at $1.3B Valuation: https://deepgram.com/learn/press-release-deepgram-raises-series-c
- Model Options: https://developers.deepgram.com/docs/model
- API Rate Limits: https://developers.deepgram.com/reference/api-rate-limits
- Nova-3 Medical Streaming: https://deepgram.com/learn/nova-3-medical-streaming-update
- Nova-3 Multilingual: Major WER Improvements Across Languages: https://deepgram.com/learn/nova-3-multilingual-major-wer-improvements-across-languages
- Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3: https://deepgram.com/learn/speech-to-text-for-hebrew-persian-urdu-on-nova-3
- August 15, 2025 changelog: https://developers.deepgram.com/changelog/2025/8/15
- When to Use Nova-2 vs Nova-3 (for Devs): https://deepgram.com/learn/model-comparison-when-to-use-nova-2-vs-nova-3-for-devs
- Voice Agents That Prioritize Data Security and Run Where Your Data Lives: https://deepgram.com/learn/voice-agents-deepgram-nvidia-nemotron
- "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most: https://arxiv.org/html/2602.12249v2
- Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India: https://arxiv.org/html/2604.19151v2
- Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models: https://arxiv.org/abs/2512.17474
- End-to-end automatic speech recognition with transformer (US12380880B2): https://patents.google.com/patent/US12380880B2/en
- Soft label generation for knowledge distillation (US11410029B2): https://patents.google.com/patent/US11410029B2/en
- Deepgram GitHub organization: https://github.com/deepgram
- Whisper model card: https://github.com/openai/whisper/blob/main/model-card.md
- turbo model release, openai/whisper discussion #2363: https://github.com/openai/whisper/discussions/2363
- openai/whisper repository: https://github.com/openai/whisper
- Introducing next-generation audio models in the API: https://openai.com/index/introducing-our-next-generation-audio-models/
- OpenAI API Pricing: https://openai.com/api/pricing/
- Google Cloud launches new AI models, opens Generative AI Studio: https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
- Compare transcription models, Cloud Speech-to-Text: https://docs.cloud.google.com/speech-to-text/docs/transcription-model
- Chirp 2: Enhanced multilingual accuracy: https://docs.cloud.google.com/speech-to-text/docs/models/chirp-2
- Cloud Speech-to-Text On-Prem Pricing: https://cloud.google.com/speech-to-text/priv/pricing
- Google Cloud Speech-to-Text V2 API: https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
- Language and Voice Support for Azure Speech: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- Speech to Text Overview, Azure Speech Service: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- Real-time diarization quickstart, Azure Speech service: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization
- Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
- NVIDIA ASR NIM Support Matrix: https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/asr.html
- About NVIDIA ASR NIM Microservice: https://docs.nvidia.com/nim/speech/latest/asr/index.html
- Speechmatics, AI Speech Technology: https://www.speechmatics.com/
- Speechmatics realtime diarization: https://docs.speechmatics.com/speech-to-text/realtime/realtime-diarization
- Speechmatics on-device speech-to-text: https://www.speechmatics.com/speech-to-text/on-device
- Speechmatics pricing: https://www.speechmatics.com/pricing
- AssemblyAI: 99 Languages, Advanced Features, One Price: https://www.assemblyai.com/blog/99-languages
- Introducing Universal-3 Pro: https://www.assemblyai.com/blog/introducing-universal-3-pro
- AssemblyAI pricing: https://www.assemblyai.com/pricing
- facebookresearch/seamless_communication: https://github.com/facebookresearch/seamless_communication
- SeamlessM4T README: https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/README.md
- Seamless Communication, AI at Meta: https://ai.meta.com/research/seamless-communication/
- Prem AI Brings Sovereign Voice AI to Regulated Enterprises: https://deepgram.com/customers/prem-ai
- Large Vocabulary Speech Recognition: A Practical Guide: https://deepgram.com/learn/large-vocabulary-speech-recognition