OpenTranscription/ Blog
2026-07-03 · MODEL PROFILE

Deepgram Nova-3: model profile

Reference profile of Deepgram Nova-3, a proprietary speech-to-text model family for batch and streaming transcription, released February 12, 2025.


Nova-3 is Deepgram's proprietary general-purpose speech-to-text model family for batch and streaming transcription, released on February 12, 2025.

Specifications

Developer: Deepgram
Released: February 12, 2025 (Nova-3 English; multilingual and self-hosted support followed)
Model type: Proprietary end-to-end speech-to-text (ASR) model family
Parameters: Not disclosed
Training data: Not publicly disclosed for base Nova-3
Languages: 45+ per Deepgram's pricing page; some 2026 Deepgram marketing pages state 50+
Modes (batch / streaming): Batch and streaming, via the /v1/listen API and a WebSocket streaming API
Latency: Vendor-reported: streaming models optimized for 300 ms or less transcription latency (sub-300 ms under typical conditions); 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC
Throughput / concurrency: Starting limits of 50 concurrent pre-recorded and 150 streaming requests on pay-as-you-go in Europe; 225 streaming in North America on Growth; 200 pre-recorded and 300 streaming on Enterprise; lower with diarization enabled
Deployment: Managed API, self-hosted / on-prem / private VPC, Amazon SageMaker, Together AI dedicated inference
Pricing: Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go; Nova-3 Multilingual priced higher; lower rates on Growth
License: Proprietary commercial software; no public open-source model license or public weight release found in the reviewed materials


Overview

Deepgram's documentation describes Nova-3 as its highest-performing general-purpose ASR model and recommends it for meetings, event captioning, multi-speaker audio, multilingual and code-switching audio, noisy and far-field input, and both batch and streaming transcription. Deepgram distinguishes Nova-3 from Flux, its newer model for turn-based voice-agent interaction with model-native turn detection.

At launch, Nova-3 English was available through Deepgram's API for pre-recorded and real-time streaming transcription, with multilingual and self-hosted support to follow. The launch changelog reported a 54.3% reduction in WER for streaming against competitors (6.84% median WER) and a 47.4% reduction for batch (5.26% median WER), on Deepgram's own benchmark suite. The same announcement stated that Nova-3 maintained inference speed comparable to Nova-2 while adding Keyterm Prompting, improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.

Deepgram has not published a Nova-3 architecture paper, parameter count, or full training-corpus disclosure. Public materials consist of documentation, launch blogs, changelog posts, patents covering adjacent Deepgram ASR techniques, and partner pages.

Deepgram's 2025 "State of Voice AI" report, produced with Opus Research, found that 67% of surveyed businesses viewed voice technology as foundational, 84% expected to increase voice-tech budgets, 80% were already using some form of voice agent or IVR, and 21% were "very satisfied" with their current systems. This is Deepgram-sponsored research.

Disclosure status

  • Model class. Public: proprietary end-to-end speech-to-text system; general ASR, not turn-detection-centric. Assessment: high confidence.
  • Architecture. Public: Deepgram's prior Nova-2 writeup says the Nova family uses a Transformer-based architecture with speech-specific optimizations, and Deepgram patents cover fused end-to-end ASR with transformers and knowledge-distillation methods. A third-party Together AI model page describes Nova-3 as a "latent space architecture." Assessment: by inference, Nova-3 is a proprietary, transformer-based, end-to-end ASR system, but Deepgram has not published a first-party Nova-3 architecture paper.
  • Model size. Public: not disclosed by Deepgram in the reviewed Nova-3 docs. Assessment: undisclosed.
  • Training data. Public: no corpus-size disclosure for base Nova-3. Official materials emphasize real-world enterprise audio and challenging acoustic conditions; the retrained multilingual model credits improved curriculum and data curation; the medical variant says its evaluation uses public and proprietary customer audio. Together AI says Nova-3 used synthetic plus real-world conversational datasets, but that is not a Deepgram primary source. Assessment: public evidence supports enterprise conversational audio plus active curation, not a full dataset accounting.

Historical anchors from predecessor models: Deepgram's original Nova post said Nova was trained across 100+ domains and 47 billion tokens, and the Nova-2 post said Nova-2 used a two-stage curriculum over data curated from nearly 6 million resources, plus a library of human transcriptions. These disclosures are not explicit Nova-3 corpus specifications.

Capabilities and features

Official documentation shows support for:

  • Batch and streaming transcription via Deepgram's /v1/listen API and a WebSocket streaming API.
  • Speaker diarization on Nova batch models, with separate concurrency limits when diarization is enabled.
  • Keyterm Prompting for up to 100 terms; Deepgram later recommended roughly 20 to 50 terms as the practical range.
  • Smart Formatting: punctuation and paragraphing in general, plus entity formatting for supported languages; for the best formatting, self-hosted Nova-3 requires the separate entity-detector model.
  • Redaction for 50+ entity types and groups such as PII, PCI, PHI, and numbers.
  • Language detection to identify the dominant language, and multilingual code-switching support via language=multi.
  • Profanity filtering for multilingual models and filler-word preservation on general Nova, Nova-2, and Nova-3 models.
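
Several of these features are switched on per request through query parameters on /v1/listen. The sketch below builds such a URL for batch (HTTPS) or streaming (WebSocket) use. The parameter names (model=nova-3, smart_format, diarize, keyterm, redact) and the api.deepgram.com host are assumptions based on our reading of Deepgram's public API docs, and base_url is a hypothetical hook for a self-hosted endpoint; only language=multi and the keyterm limits come from the list above. A batch request would then be sent as an HTTP POST with an `Authorization: Token <key>` header, which you should confirm against the current API reference.

```python
import warnings
from urllib.parse import urlencode

# Assumed hosted endpoints; a self-hosted deployment would use its own host.
BATCH_URL = "https://api.deepgram.com/v1/listen"
STREAM_URL = "wss://api.deepgram.com/v1/listen"

MAX_KEYTERMS = 100        # documented per-request ceiling
PRACTICAL_KEYTERMS = 50   # upper end of the suggested 20-50 range


def build_listen_url(*, streaming=False, keyterms=(), multilingual=False,
                     diarize=False, smart_format=True, redact=(),
                     base_url=None):
    """Build a /v1/listen URL for Nova-3 (parameter names are assumptions)."""
    # Drop blanks and duplicates while keeping the caller's order.
    terms = list(dict.fromkeys(t.strip() for t in keyterms if t.strip()))
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"{len(terms)} keyterms exceeds the {MAX_KEYTERMS}-term limit")
    if len(terms) > PRACTICAL_KEYTERMS:
        warnings.warn("more than 50 keyterms; consider trimming to the 20-50 range")

    params = [("model", "nova-3")]
    if multilingual:
        params.append(("language", "multi"))  # code-switching mode
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    # Repeated keys encode lists: keyterm=a&keyterm=b
    params += [("keyterm", t) for t in terms]
    params += [("redact", r) for r in redact]

    base = base_url or (STREAM_URL if streaming else BATCH_URL)
    return f"{base}?{urlencode(params)}"


url = build_listen_url(keyterms=["Nova-3", "Keyterm Prompting", "Nova-3"],
                       diarize=True, redact=["pii"])
print(url)
```

Deduplicating keyterms before sending keeps a request inside the 100-term ceiling and makes it easier to stay near the recommended 20 to 50 terms.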

The launch announcement also listed improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
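
Diarization and word-level timestamps are most useful once per-word output is collapsed into speaker turns. The sketch below does that for a response dictionary. The layout it reads (results.channels[0].alternatives[0].words, with start, end, speaker, and an optional punctuated_word per word) is an assumption about Deepgram's JSON shape, and the sample is hand-written for illustration, not real API output.

```python
def speaker_turns(response):
    """Collapse consecutive same-speaker words into turns.

    Assumes results.channels[0].alternatives[0].words, where each word has
    start/end times in seconds and, with diarization on, a speaker label.
    """
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    for w in words:
        text = w.get("punctuated_word", w["word"])  # formatted form if present
        speaker = w.get("speaker")                 # None when diarization is off
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["end"] = w["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": w["start"],
                          "end": w["end"], "text": text})
    return turns


# Hand-written sample in the assumed shape, not real API output.
sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "hi", "punctuated_word": "Hi,", "start": 0.08, "end": 0.32, "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "start": 0.32, "end": 0.61, "speaker": 0},
    {"word": "hello", "punctuated_word": "Hello.", "start": 0.95, "end": 1.30, "speaker": 1},
]}]}]}}

for turn in speaker_turns(sample):
    print(f"[{turn['start']:.2f}-{turn['end']:.2f}] speaker {turn['speaker']}: {turn['text']}")
```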

A domain variant, Nova-3 Medical, was released on March 3, 2025, followed by a Nova-3 Medical Streaming update on June 4, 2025.

Language support

Deepgram's pricing page states that Nova models support 45+ languages, while some 2026 Deepgram marketing pages say "50+". The safest reading is at least 45 languages, with additions continuing through 2026.

Documented language rollouts, in order: German, Dutch, Swedish, Danish; later Spanish, French, Portuguese; later Italian, Turkish, Norwegian, Indonesian; then 12 more languages across Europe and South Asia; then Hebrew, Persian, and Urdu; then Mandarin Chinese; then Gujarati.

A retrained Nova-3 Multilingual model, announced February 13, 2026, reported WER improvements across languages and credited curriculum and data-curation changes for multilingual and code-switching behavior.

Performance and benchmarks

Vendor-reported: the Nova-3 launch changelog reported a 5.26% median batch WER (a 47.4% reduction against competitors) and a 6.84% median streaming WER (a 54.3% reduction against competitors) on Deepgram's own benchmark suite.
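
A relative reduction r compares a new WER W_n with a baseline W_b as r = (W_b - W_n) / W_b, so the baseline it implies is W_b = W_n / (1 - r). The sketch below applies that inversion to the headline figures. It assumes each percentage is a relative reduction against a single competitor median, which the changelog wording does not fully specify, so the outputs (about 10.0% for batch and about 15.0% for streaming) are arithmetic illustrations, not published numbers.

```python
def relative_reduction(baseline_wer, new_wer):
    """Fractional WER reduction of new_wer relative to baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer, reduction):
    """Invert relative_reduction: the baseline WER a given reduction implies."""
    return new_wer / (1 - reduction)


# Headline launch figures: (median WER in percent, fractional reduction).
headline = {"batch": (5.26, 0.474), "streaming": (6.84, 0.543)}
for mode, (wer, reduction) in headline.items():
    print(f"{mode}: implied comparison WER about {implied_baseline(wer, reduction):.2f}%")
```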

Third-party evaluations:

  • The Voice of India benchmark (2026) places Nova-3 in a weaker tier on several underrepresented Indic languages, with elevated error rates on some languages such as Tamil and Odia.
  • A zero-shot dysarthric speech evaluation (2025) found that all tested systems, including Nova-3, degrade sharply as dysarthria severity rises.
  • A 2026 street-name benchmark ("Sorry, I Didn't Catch That") compared Nova-3 with Whisper, Chirp, and other systems on a difficult lexical task.

Direct cross-model comparisons should account for differing datasets, audio conditions, and output-normalization rules. Nova-3's headline WER numbers also come from Deepgram-authored evaluations, whereas open models such as Whisper and SeamlessM4T are documented in papers and model cards.
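
Word error rate is the word-level edit distance (substitutions plus deletions plus insertions) divided by the number of reference words. The sketch below is a standard dynamic-programming implementation with a deliberately simple normalizer (lowercase, strip punctuation except apostrophes); that normalizer is an assumption for illustration, not the rule used by Deepgram, Whisper, or any benchmark. The example shows how normalization alone moves the same transcript from 0.6 WER to 0.0.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation (keeps apostrophes); illustrative only."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def wer(reference, hypothesis, normalize_text=True):
    """Word error rate via Levenshtein distance over words."""
    ref = normalize(reference) if normalize_text else reference.split()
    hyp = normalize(hypothesis) if normalize_text else hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


ref = "Turn left on Main Street."
hyp = "turn left on main street"
print(wer(ref, hyp, normalize_text=False), wer(ref, hyp))
```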

Comparison snapshot

  • Deepgram Nova-3. Size: undisclosed. Languages: 45+ officially on the pricing page; language count still expanding. Real-time: yes; sub-300 ms target. Features: Keyterm Prompting, redaction, smart formatting, multilingual code-switching, diarization. Deployment: API, self-hosted, VPC/on-prem, SageMaker, Together AI. Pricing: pay-as-you-go pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming; multilingual higher. Positioning: strongest public case is enterprise/noisy production ASR plus deployment flexibility.
  • OpenAI Whisper large-v3. Size: 1.55B parameters. Languages: multilingual; tokenizer covers 99 languages. Real-time: not natively productized for streaming in the open-source release, though wrappers exist; the turbo variant speeds inference by using 4 decoder layers. Features: no native diarization in the core model; usually paired with external tools. Deployment: open source / self-hosted; API access history via OpenAI and others. Pricing: open-source cost is infrastructure-dependent. Positioning: most transparent baseline; strongest for openness and ecosystem, weaker than Nova-3 as a turnkey enterprise stack.
  • OpenAI GPT-Realtime-Whisper / next-gen audio models. Size: not publicly disclosed. Languages: multilingual STT product line. Real-time: yes, explicitly streaming. Features: productized realtime STT; OpenAI's official claim is better WER than Whisper v2/v3. Deployment: managed API. Pricing: $0.017/min for GPT-Realtime-Whisper. Positioning: more "LLM-audio" oriented; priced higher than Nova-3's listed STT rates.
  • Google Chirp 2 / Chirp 3. Size: Chirp foundation model publicly described as 2B parameters, trained on millions of hours of audio. Languages: 100+ for the Chirp foundation model; current STT docs show multilingual auto language detection and diarization. Real-time: yes. Features: timestamps, profanity filtering, auto language detection, diarization. Deployment: cloud and on-prem offerings. Pricing: Google's v2 launch blog set pricing at $0.016/min, with volume tiers as low as $0.004/min. Positioning: strong hyperscaler multilingual rival on paper; public benchmark transparency uneven.
  • Microsoft Azure Speech. Size: undisclosed. Languages: 140+ locales / supported inputs. Real-time: real-time, fast, and batch transcription. Features: real-time diarization, language identification, custom speech. Deployment: managed cloud plus containers and enterprise deployment options in the Azure ecosystem. Pricing: region-specific pricing page; billed per second. Positioning: enterprise breadth and customization are strong; public flagship WER transparency is weaker than Nova-3's marketing.
  • NVIDIA Parakeet 1.1B / Riva / NIM. Size: 1.1B for Parakeet 1.1b RNNT Multilingual. Languages: 25+ for Parakeet RNNT multilingual; more via other NIM variants. Real-time: yes, streaming and offline. Features: automatic punctuation and capitalization; streaming diarization via Sortformer for the Parakeet and Conformer families. Deployment: self-hosted, GPU-native path. Pricing: hardware/licensing dependent. Positioning: self-hosted rival for GPU-first teams; less turnkey as SaaS than Nova-3.
  • Speechmatics Ursa 2 / STT API. Size: undisclosed. Languages: 55+ / 56+. Real-time: yes; Speechmatics advertises sub-second, speaker-aware STT. Features: realtime diarization, multilingual support, batch and streaming. Deployment: API and on-device offerings. Pricing: pricing page shows Pro from $0.24/hr and 50 concurrent real-time sessions. Positioning: credible on multilingual realtime diarization; fewer public benchmark details.
  • AssemblyAI Universal. Size: undisclosed. Languages: 99, with diarization for 95. Real-time: yes; Universal-3 Pro Streaming adds prompting and real-time diarization. Features: language detection, formatting, filler words, keyterms, timestamps, diarization. Deployment: managed API. Pricing: Universal supports 99 languages at $0.27/hr flat; U3 Pro Streaming is $0.45/hr base. Positioning: aggressive price/language story; Nova-3 differentiates on self-hosting and enterprise deployment flexibility.
  • Meta SeamlessM4T v2 / SeamlessStreaming / MMS. Size: no single figure listed; SeamlessM4T v2 uses the UnitY2 architecture. Languages: 101 speech-input languages for SeamlessM4T, 96 for streaming ASR, and 1,107 for MMS STT. Real-time: research-grade streaming via SeamlessStreaming, at around 2 seconds latency. Features: suited to multilingual research and speech translation; not a turnkey commercial STT stack with built-in diarization or enterprise extras. Deployment: open research code and models. Pricing: infrastructure-dependent. Positioning: leads on open multilingual breadth, especially translation and language coverage, but not on turnkey enterprise-product completeness.
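
The pricing signals in this comparison mix per-minute and per-hour units. The sketch below normalizes the listed figures to USD per minute (and back to per hour) so they can be read side by side. It only converts units: the figures are headline or base list prices that exclude add-ons, volume tiers, and regional pricing, so the ordering it prints is not a like-for-like cost comparison.

```python
MINUTES_PER_HOUR = 60

# (label, listed price in USD, unit) taken from the comparison entries;
# headline or base rates only.
LISTED = [
    ("Deepgram Nova-3 Monolingual, pre-recorded", 0.0048, "min"),
    ("Deepgram Nova-3 Monolingual, streaming", 0.0077, "min"),
    ("GPT-Realtime-Whisper", 0.017, "min"),
    ("Google STT v2 launch list price", 0.016, "min"),
    ("Speechmatics Pro (from)", 0.24, "hr"),
    ("AssemblyAI Universal", 0.27, "hr"),
    ("AssemblyAI U3 Pro Streaming (base)", 0.45, "hr"),
]


def per_minute(price, unit):
    """Convert a listed price to USD per minute of audio."""
    if unit == "min":
        return price
    if unit == "hr":
        return price / MINUTES_PER_HOUR
    raise ValueError(f"unknown unit: {unit}")


for label, price, unit in sorted(LISTED, key=lambda row: per_minute(row[1], row[2])):
    rate = per_minute(price, unit)
    print(f"{label:45} ${rate:.4f}/min  ${rate * MINUTES_PER_HOUR:.3f}/hr")
```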

Latency and throughput

Vendor-reported figures:

  • Deepgram states its streaming models are optimized for 300 ms or less transcription latency, and the Nova-3 latency guide characterizes Nova-3 as delivering sub-300 ms streaming latency under typical conditions.
  • In a Deepgram + NVIDIA private-deployment post (May 27, 2026), Deepgram reported 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC.
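
P50 is the median: half of the measured requests were slower, and a single P50 figure says nothing about tail latency. The sketch below computes nearest-rank percentiles from your own latency samples so you can check both the median and the tail against a 300 ms target. It assumes first-token latency is measured as the time from sending the first audio to receiving the first transcript result; Deepgram's exact measurement method is not specified here, and the sample values are hypothetical.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical first-token latencies (ms) from your own streaming test runs.
first_token_ms = [180, 195, 198, 210, 240, 260, 410]
p50 = percentile(first_token_ms, 50)
p95 = percentile(first_token_ms, 95)
print(f"P50={p50} ms, P95={p95} ms, P50 within 300 ms: {p50 <= 300}")
```

With these samples the median is comfortably under 300 ms while the tail is not, which is why it is worth reporting P95 or P99 alongside any P50 claim.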

Rate-limit documentation shows Nova-3 starting at 50 concurrent pre-recorded requests and 150 streaming requests on pay-as-you-go in Europe, 225 streaming in North America on Growth, rising to 200 pre-recorded and 300 streaming starting limits on Enterprise. If diarization is enabled, concurrency is materially lower.
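
Because concurrency ceilings depend on plan, region, and whether diarization is on, a client that fans out many jobs should cap its own in-flight requests. The sketch below does this with an asyncio semaphore. The transcribe coroutine is a hypothetical stand-in for whatever HTTP or SDK call you use, and the limit is a value you choose from Deepgram's current rate-limit documentation for your plan (set it lower when diarization is enabled).

```python
import asyncio


async def transcribe_all(jobs, transcribe, limit):
    """Run transcribe(job) for every job, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(job):
        async with semaphore:
            return await transcribe(job)

    # gather() returns results in the same order as `jobs`.
    return await asyncio.gather(*(run(job) for job in jobs))


# Hypothetical stand-in for a real API call; records how many calls overlap.
stats = {"in_flight": 0, "peak": 0}


async def fake_transcribe(job):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return f"transcript-{job}"


results = asyncio.run(transcribe_all(range(20), fake_transcribe, limit=5))
print(len(results), stats["peak"])
```

A semaphore only bounds your own concurrency; a production client would still need to handle rate-limit errors from the server, for example with retries and backoff.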

Deployment and integrations

Nova-3 is available as a managed API, as self-hosted / on-prem / private-VPC software, and through Amazon SageMaker; it also appears on Together AI as a dedicated-inference offering. Deepgram's docs and partner pages show integrations with Twilio, LiveKit, Pipecat, Amazon Connect, Amazon Lex, and other ecosystem components.

SDKs are available in JavaScript, Python, Go, .NET, and Java.

Deepgram's partner ecosystem and product pages show Nova-3 attached to AWS/Amazon Connect, NVIDIA, Fortanix, OneReach.ai, Vonage, and Together AI, among others.

Deepgram published migration guides from AWS Transcribe, Google Speech-to-Text, OpenAI Whisper, and AssemblyAI.

Vendor-published case studies: Prem AI selected self-hosted Nova-3 Base as its primary STT engine for sovereign voice workloads, citing English and EU-language accuracy, better diarization than a self-hosted Whisper plus pyannote stack, and streaming performance that met its sub-300 to 500 ms turn-level latency goals. Gradient Labs reported a quality improvement after introducing Nova-3, and SigmaMind AI stated that Nova-3 and Flux reduced end-to-end agent latency by roughly 300 ms at scale.

Pricing

Deepgram's pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go, with lower rates on Growth; Nova-3 Multilingual is listed above that. Deepgram prices add-ons such as redaction and Keyterm Prompting separately, and the pricing page ties concurrency ceilings to plan level.
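
For budgeting, base transcription cost is simply minutes of audio times the per-minute rate for each mode. The sketch below estimates a month's base cost at the pay-as-you-go Nova-3 Monolingual rates quoted here. The workload sizes are hypothetical, add-ons such as redaction and Keyterm Prompting are excluded because they are priced separately, and Growth or Multilingual rates would need to be passed in explicitly.

```python
# Nova-3 Monolingual pay-as-you-go list rates, USD per minute of audio.
PAYG_RATES = {"pre-recorded": 0.0048, "streaming": 0.0077}


def monthly_cost(hours_by_mode, rates=PAYG_RATES):
    """Base transcription cost for the given hours of audio per mode, before add-ons."""
    return sum(rates[mode] * hours * 60 for mode, hours in hours_by_mode.items())


# Hypothetical workload: 1,000 hours of batch audio and 200 hours of streaming.
estimate = monthly_cost({"pre-recorded": 1_000, "streaming": 200})
print(f"${estimate:,.2f}")
```

At these rates 1,000 hours of pre-recorded audio costs $288 and 200 hours of streaming costs $92.40, so streaming's higher per-minute rate matters more as its share of the workload grows.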

Development and ownership

Nova-3 was developed by Deepgram. Deepgram does not publish a Nova-3 author list; public attribution is organizational. Scott Stephenson is co-founder and CEO; Adam Sypniewski is CTO and, per Deepgram's bio, leads the research and engineering teams building the company's speech-recognition systems; Andrew Seagraves is VP of Research; Morris Gevirtz is Head of Language. Deepgram's Nova-2 materials credit its in-house model research team and DataOps team for speech-specific transformer optimization, data curation, and multi-stage training.

Public-facing release materials were authored by Jose Nicholas Francisco (Nova-3 Medical), Hasan Jilani (Nova-3 Medical Streaming and later Nova/Flux marketing), and Martine Katz (multilingual expansion and multilingual WER improvements). A Deepgram + NVIDIA deployment post was co-authored by Conner Hughes and Michael Wang.

Deepgram patent US10540959B1 lists Jeff Ward, Adam Sypniewski, and Scott Stephenson as inventors on techniques related to domain adaptation and special vocabulary handling. Other Deepgram-adjacent patents cover fused end-to-end ASR with transformers (US12380880B2) and soft label generation for knowledge distillation (US11410029B2). These patents do not establish that Nova-3 uses each disclosed method verbatim.

Deepgram's Series C was led by AVP; its Series B was led by Madrona, with Alkeon and others participating. No reviewed material names an outside academic lab or hyperscaler as co-author or co-trainer of the base Nova-3 model.

Release history

Date Milestone
2025-02-12 Nova-3 launch: Nova-3 English via API for pre-recorded and real-time streaming transcription; headline metrics of 5.26% batch WER and 6.84% streaming WER.
2025-03-03 Nova-3 Medical released.
2025-06-04 Nova-3 Medical Streaming update.
2025-06-22 "When to Use Nova-2 vs Nova-3" comparative positioning piece.
2025-08 to 2026-04 Language support rollouts via changelog.
2026-02-12 Speech-to-text for Hebrew, Persian, and Urdu on Nova-3.
2026-02-13 Nova-3 Multilingual: WER improvements across languages, with training-curriculum and data-curation changes.
2026-05-27 Deepgram + NVIDIA private-deployment post, including 198 ms P50 first-token latency for self-hosted Nova-3.

Language additions across this period, in order: German, Dutch, Swedish, Danish; Spanish, French, Portuguese; Italian, Turkish, Norwegian, Indonesian; 12 more languages across Europe and South Asia; Hebrew, Persian, and Urdu; Mandarin Chinese; Gujarati.

© 2026 OpenTranscription · Signal is our journal.