Deepgram Nova-3: model profile

Nova-3 is Deepgram's proprietary general-purpose speech-to-text model family for batch and streaming transcription, released on February 12, 2025.

Specifications

Developer	Deepgram
Released	February 12, 2025 (Nova-3 English; multilingual and self-hosted support followed)
Model type	Proprietary end-to-end speech-to-text (ASR) model family
Training data	Not publicly disclosed for base Nova-3.
Languages	45+ per Deepgram's pricing page; some 2026 Deepgram marketing pages state 50+
Modes (batch / streaming)	Batch and streaming, via the /v1/listen API and a WebSocket streaming API
Latency	Vendor-reported: streaming models optimized for 300 ms or less transcription latency; sub-300 ms under typical conditions; 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC
Throughput / concurrency	50 concurrent pre-recorded requests and 150 streaming requests on pay-as-you-go in Europe, 225 streaming in North America on Growth; 200 pre-recorded and 300 streaming starting limits on Enterprise; lower with diarization enabled
Deployment	Managed API, self-hosted / on-prem / private VPC, Amazon SageMaker, Together AI dedicated inference
Pricing	Nova-3 Monolingual: $0.0048/min pre-recorded, $0.0077/min streaming on pay-as-you-go; Nova-3 Multilingual listed above that; lower rates on Growth
License	Proprietary commercial software; no public open-source model license or public weight release found in the reviewed materials

Parameters	Not disclosed

Known limitations

No first-party Nova-3 architecture paper, parameter-count disclosure, full model card, or training-corpus accounting exists in the reviewed materials.
Most headline quality claims are vendor-generated and measured on Deepgram's chosen datasets, normalization rules, and product framing.
Third-party evaluation: the Voice of India benchmark shows elevated error rates on some Indic languages such as Tamil and Odia.
Third-party evaluation: on dysarthric speech, all tested systems, including Nova-3, degrade sharply as severity rises.
Keyterm Prompting is capped at 100 terms, with Deepgram recommending 20 to 50 for reliability; using too many terms raises the risk of force-fitting.
Language-count messaging is inconsistent across Deepgram materials (45+ on the pricing page, 50+ on some 2026 marketing pages).
Enabling diarization lowers concurrency ceilings versus plain STT.
Deepgram positions Flux, not Nova-3, as its preferred model for turn-based voice agents.
Public materials do not include downloadable Nova-3 weights or an open-source license.
No single neutral, multilingual, noisy, code-switching benchmark compares Nova-3 with Google, Azure, Speechmatics, and AssemblyAI under the same scoring rules, per the source.

Full technical breakdown

Overview

Deepgram's documentation describes Nova-3 as its highest-performing general-purpose ASR model and recommends it for meetings, event captioning, multi-speaker audio, multilingual and code-switching audio, noisy and far-field input, and both batch and streaming transcription. Deepgram distinguishes Nova-3 from Flux, its newer model for turn-based voice-agent interaction with model-native turn detection.

At launch, Nova-3 English was available through Deepgram's API for pre-recorded and real-time streaming transcription, with multilingual and self-hosted support to follow. The launch changelog reported a 6.84% median WER for streaming (a 54.3% reduction against competitors) and a 5.26% median WER for batch (a 47.4% reduction against competitors), both measured on Deepgram's own benchmark suite. The same announcement stated that Nova-3 maintained inference speed comparable to Nova-2 while adding Keyterm Prompting, improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.

Deepgram has not published a Nova-3 architecture paper, parameter count, or full training-corpus disclosure. Public materials consist of documentation, launch blogs, changelog posts, patents covering adjacent Deepgram ASR techniques, and partner pages.

Deepgram's 2025 "State of Voice AI" report, produced with Opus Research, found that 67% of surveyed businesses viewed voice technology as foundational, 84% expected to increase voice-tech budgets, 80% were already using some form of voice agent or IVR, and 21% were "very satisfied" with current systems. The source notes this is Deepgram-sponsored research.

Disclosure status

Aspect	What is public	Source assessment
Model class	Proprietary end-to-end speech-to-text system; general ASR, not turn-detection-centric.	High confidence.
Architecture	Deepgram's prior Nova-2 writeup says the Nova family uses a Transformer-based architecture with speech-specific optimizations; Deepgram patents cover fused end-to-end ASR with transformers and knowledge-distillation methods. A third-party Together AI model page describes Nova-3 as a "latent space architecture."	By inference, Nova-3 is a proprietary, transformer-based, end-to-end ASR system, but Deepgram has not published a first-party Nova-3 architecture paper.
Model size	Not publicly disclosed by Deepgram in the reviewed Nova-3 docs.	Undisclosed.
Training data	No public corpus-size disclosure for base Nova-3. Official materials emphasize real-world enterprise audio and challenging acoustic conditions; the retrained multilingual model credits improved curriculum and data curation; the medical variant says its evaluation uses public and proprietary customer audio. Together AI says Nova-3 used synthetic plus real-world conversational datasets, but that is not a Deepgram primary source.	Public evidence supports enterprise conversational audio plus active curation, not a full dataset accounting.

Historical anchors from predecessor models: Deepgram's original Nova post said Nova was trained across 100+ domains and 47 billion tokens, and the Nova-2 post said Nova-2 used a two-stage curriculum over data curated from nearly 6 million resources, plus a library of human transcriptions. These disclosures are not explicit Nova-3 corpus specifications.

Capabilities and features

Official documentation shows support for:

Batch and streaming transcription via Deepgram's /v1/listen API and a WebSocket streaming API.
Speaker diarization on Nova batch models, with separate concurrency limits when diarization is enabled.
Keyterm Prompting for up to 100 terms; Deepgram later recommended roughly 20 to 50 terms as the practical range.
Smart Formatting, including punctuation and paragraphs generally, plus entity formatting for supported languages; self-hosted Nova-3 requires the separate entity-detector model for the best formatting.
Redaction for 50+ entity types and groups such as PII, PCI, PHI, and numbers.
Language detection for dominant-language identification and multilingual code-switching support via language=multi.
Profanity filtering for multilingual models and filler-word preservation on general Nova, Nova-2, and Nova-3 models.

The launch announcement also listed improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
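As a rough illustration of how the documented options might be combined in a single pre-recorded request, the sketch below builds a query string for the /v1/listen endpoint. The host name, the hypothetical build_listen_url helper, and every parameter name other than language=multi are assumptions that do not come from this profile, so check them against Deepgram's API reference before use. The 100-term cap and the 50-term warning threshold follow the Keyterm Prompting guidance above. The code only constructs a URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumed host; this profile only names the /v1/listen path.
BASE_URL = "https://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100          # hard cap stated for Keyterm Prompting
RECOMMENDED_KEYTERMS = 50   # upper end of the recommended 20 to 50 range


def build_listen_url(keyterms=(), multilingual=False, diarize=False,
                     smart_format=True, redact=()):
    """Return (url, warnings) for a hypothetical pre-recorded request."""
    keyterms = list(keyterms)
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"Keyterm Prompting accepts at most {MAX_KEYTERMS} terms")

    warnings = []
    if len(keyterms) > RECOMMENDED_KEYTERMS:
        warnings.append("more than 50 keyterms raises the risk of force-fitting")

    # Parameter names are assumptions, except language=multi.
    params = [("model", "nova-3")]
    if multilingual:
        params.append(("language", "multi"))
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    params += [("redact", group) for group in redact]
    params += [("keyterm", term) for term in keyterms]  # one entry per term
    return f"{BASE_URL}?{urlencode(params)}", warnings


url, notes = build_listen_url(keyterms=["Deepgram", "Nova-3"],
                              multilingual=True, diarize=True, redact=["pii"])
print(url)
print(notes)
```

Keeping the keyterm check in the client gives an early, local failure instead of relying on the service to reject an oversized list.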

A domain variant, Nova-3 Medical, was released on March 3, 2025, followed by a Nova-3 Medical Streaming update on June 4, 2025.

Language support

Deepgram's pricing page states that Nova models support 45+ languages; some 2026 Deepgram marketing pages use "50+" phrasing. The source reads this as at least 45 languages, with ongoing additions through 2026.

Documented language rollouts, in order: German, Dutch, Swedish, Danish; later Spanish, French, Portuguese; later Italian, Turkish, Norwegian, Indonesian; then 12 more languages across Europe and South Asia; then Hebrew, Persian, and Urdu; then Mandarin Chinese; then Gujarati.

A retrained Nova-3 Multilingual model, announced February 13, 2026, reported WER improvements across languages and credited curriculum and data-curation changes for multilingual and code-switching behavior.

Performance and benchmarks

Vendor-reported: the Nova-3 launch changelog reported a 5.26% median batch WER (a 47.4% reduction against competitors) and a 6.84% median streaming WER (a 54.3% reduction against competitors) on Deepgram's own benchmark suite.

Third-party evaluations:

The Voice of India benchmark (2026) places Nova-3 in a weaker tier on several underrepresented Indic languages, with elevated error rates on some languages such as Tamil and Odia.
A zero-shot dysarthric speech evaluation (2025) found that all tested systems, including Nova-3, degrade sharply as dysarthria severity rises.
A 2026 street-name benchmark ("Sorry, I Didn't Catch That") compared Nova-3 with Whisper, Chirp, and other systems on a difficult lexical task.

The source notes that direct cross-model comparisons should account for differing datasets, audio conditions, and output-normalization rules, and that Nova-3's headline WER numbers come from Deepgram-authored evaluations while open models such as Whisper and SeamlessM4T are documented via papers and model cards.
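To make the point about output-normalization rules concrete, the following minimal sketch computes word error rate (word-level edit distance divided by reference length) with and without a simple normalizer. The normalize function is only one possible rule set and is an assumption, not the scoring used by Deepgram or by any benchmark cited here; the sentences are invented for illustration.

```python
import re


def normalize(text):
    """One possible rule set: lowercase, drop punctuation except apostrophes."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def wer(reference, hypothesis, normalizer=str.split):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = normalizer(reference), normalizer(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "Hello, world. It's five o'clock."
hypothesis = "hello world it's five o'clock"

print(wer(reference, hypothesis))                        # raw tokens
print(wer(reference, hypothesis, normalizer=normalize))  # normalized tokens
```

The same transcript scores very differently under the two settings, which is why WER figures produced under different normalization rules are not directly comparable.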

Comparison snapshot

System	Public size	Languages	Real-time use	Diarization and production features	Deployment	Public pricing signal	Positioning relative to Nova-3 (per source)
Deepgram Nova-3	Undisclosed	45+ officially on pricing page; language count still expanding	Yes; sub-300 ms target.	Keyterm Prompting, redaction, smart formatting, multilingual code-switching, diarization.	API, self-hosted, VPC/on-prem, SageMaker, Together AI.	Pay-as-you-go pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming; multilingual higher.	Strongest public case is enterprise/noisy production ASR plus deployment flexibility.
OpenAI Whisper large-v3	1.55B	Multilingual; tokenizer covers 99 languages.	Not natively productized for streaming in the open-source release; wrappers exist. The turbo variant speeds up inference by using 4 decoder layers.	No native diarization in the core model; usually paired with external tools.	Open source / self-hosted; API access history via OpenAI and others.	Open-source cost is infra-dependent.	Most transparent baseline; strongest for openness and ecosystem, weaker than Nova-3 as a turnkey enterprise stack.
OpenAI GPT-Realtime-Whisper / next-gen audio models	Not publicly parameterized	Multilingual STT product line.	Yes, explicitly streaming.	Productized realtime STT; official claim is better WER than Whisper v2/v3.	Managed API.	$0.017/min for GPT-Realtime-Whisper.	More "LLM-audio" oriented; higher priced than Nova-3's listed STT rates.
Google Chirp 2 / Chirp 3	Chirp foundation model publicly described as 2B; millions of hours of audio.	100+ languages for Chirp foundation; current STT docs show multilingual auto language detection and diarization.	Yes.	Timestamps, profanity filtering, auto language detection, diarization.	Cloud and on-prem offerings.	Google's v2 launch blog set pricing at $0.016/min, with volume tiers as low as $0.004/min.	Strong hyperscaler multilingual rival on paper; public benchmark transparency uneven.
Microsoft Azure Speech	Undisclosed	140+ locales / supported inputs.	Real-time, fast, and batch transcription.	Real-time diarization, language identification, custom speech.	Managed cloud plus containers/enterprise deployment options in Azure ecosystem.	Region-specific pricing page; billed per second.	Enterprise breadth and customization are strong; public flagship WER transparency is weaker than Nova-3's marketing.
NVIDIA Parakeet 1.1B / Riva / NIM	1.1B for Parakeet 1.1b RNNT Multilingual.	25+ languages for Parakeet RNNT multilingual; more via other NIM variants.	Yes, streaming + offline.	Auto punctuation/capitalization; streaming diarization via Sortformer for Parakeet and Conformer families.	Self-hosted / GPU-native path.	Hardware/licensing dependent.	Self-hosted rival for GPU-first teams; less turnkey SaaS than Nova-3.
Speechmatics Ursa 2 / STT API	Undisclosed	55+ / 56+ languages.	Yes; Speechmatics advertises sub-second, speaker-aware STT.	Realtime diarization, multilingual support, batch + streaming.	API and on-device offerings.	Pricing page shows Pro from $0.24/hr and 50 concurrent real-time sessions.	Credible on multilingual realtime diarization; fewer public benchmark details.
AssemblyAI Universal	Undisclosed	99 languages; diarization for 95.	Yes. Universal-3 Pro Streaming adds prompting and real-time diarization.	Language detection, formatting, filler words, keyterms, timestamps, diarization.	Managed API.	Universal supports 99 languages at $0.27/hr flat; U3 Pro Streaming is $0.45/hr base.	Aggressive price/language story; Nova-3 differentiates on self-hosting and enterprise deployment flexibility.
Meta SeamlessM4T v2 / SeamlessStreaming / MMS	SeamlessM4T v2 uses UnitY2; MMS covers 1,107 STT languages.	101 speech-input languages for SeamlessM4T; 96 for streaming ASR; 1,107 for MMS STT.	Research-grade streaming exists via SeamlessStreaming, around 2 seconds latency.	Suited to multilingual research and speech translation; not a turnkey commercial STT stack with built-in diarization/enterprise extras.	Open research code/models.	Infra-dependent.	Meta leads on open multilingual breadth, especially translation and language coverage, but not on turnkey enterprise-product completeness.

Latency and throughput

Vendor-reported figures:

Deepgram states its streaming models are optimized for 300 ms or less transcription latency, and the Nova-3 latency guide characterizes Nova-3 as delivering sub-300 ms streaming latency under typical conditions.
In a Deepgram + NVIDIA private-deployment post (May 27, 2026), Deepgram reported 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC.
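For readers who want to compare their own measurements with figures like these, here is a minimal sketch of summarizing latency samples as P50 and P95. The sample values are invented, the 300 ms budget mirrors the vendor target quoted above, and the nearest-rank P95 is just one common percentile convention; none of this reflects how Deepgram computed its numbers.

```python
import math
import statistics

LATENCY_BUDGET_MS = 300  # vendor-stated streaming target, used here as a budget


def summarize_latency(samples_ms):
    """Return P50, nearest-rank P95, and whether P50 meets the budget."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    p50 = statistics.median(ordered)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method, 1-based
    p95 = ordered[rank - 1]
    return {"p50": p50, "p95": p95, "within_budget": p50 <= LATENCY_BUDGET_MS}


# Invented first-token latencies in milliseconds, for illustration only.
samples = [220, 150, 600, 170, 185]
print(summarize_latency(samples))
```

A P50 can sit comfortably under the budget while the tail does not, so reporting a higher percentile alongside the median gives a fuller picture.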

Rate-limit documentation shows Nova-3 starting at 50 concurrent pre-recorded requests and 150 streaming requests on pay-as-you-go in Europe, 225 streaming in North America on Growth, rising to 200 pre-recorded and 300 streaming starting limits on Enterprise. If diarization is enabled, concurrency is materially lower.
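One way a client could stay under a concurrency ceiling like those above is to gate requests behind a semaphore. The sketch below is a generic asyncio pattern and not Deepgram SDK code: run_limited and fake_request are hypothetical names, the request is simulated with a short sleep, and the demo uses a limit of 5 for speed where a real client might use its plan's documented limit (for example 50 pre-recorded requests).

```python
import asyncio


async def run_limited(jobs, limit):
    """Run coroutine factories with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # waits while `limit` jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(n_jobs=20, limit=5):
    """Simulate requests and record the peak number running concurrently."""
    active = 0
    peak = 0

    async def fake_request():  # stand-in for a real transcription call
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return "ok"

    results = await run_limited([fake_request] * n_jobs, limit)
    return peak, len(results)


peak, completed = asyncio.run(demo())
print(f"peak concurrency: {peak}, completed: {completed}")
```

Because diarization lowers the ceiling, the limit is best passed in as configuration per request type instead of being hard-coded.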

Deployment and integrations

Nova-3 is available as a managed API, as self-hosted / on-prem / private-VPC software, and through Amazon SageMaker; it also appears on Together AI as a dedicated-inference offering. Deepgram's docs and partner pages show integrations with Twilio, LiveKit, Pipecat, Amazon Connect, Amazon Lex, and other ecosystem components.

SDKs are available in JavaScript, Python, Go, .NET, and Java.

Deepgram's partner ecosystem and product pages show Nova-3 attached to AWS/Amazon Connect, NVIDIA, Fortanix, OneReach.ai, Vonage, and Together AI, among others.

Deepgram published migration guides from AWS Transcribe, Google Speech-to-Text, OpenAI Whisper, and AssemblyAI.

Vendor-published case studies: Prem AI selected self-hosted Nova-3 Base as its primary STT engine for sovereign voice workloads, citing English and EU-language accuracy, better diarization than a self-hosted Whisper plus pyannote stack, and streaming performance that met sub-300 to 500 ms turn-level latency goals. Gradient Labs reported a quality improvement after introducing Nova-3. SigmaMind AI stated that Nova-3 and Flux reduced end-to-end agent latency by roughly 300 ms at scale. The source identifies these as vendor-published case studies.

Pricing

Deepgram's pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go, with lower rates on Growth; Nova-3 Multilingual is listed above that. Deepgram prices add-ons such as redaction and Keyterm Prompting separately, and the pricing page ties concurrency ceilings to plan level.
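The listed per-minute rates can be turned into a quick budget estimate, as in this minimal sketch. It uses only the two pay-as-you-go Nova-3 Monolingual rates quoted above; Growth rates, multilingual pricing, and add-ons are left out because this profile gives no figures for them, and the function names are illustrative.

```python
from decimal import Decimal

# Pay-as-you-go Nova-3 Monolingual rates in USD per minute, as quoted above.
RATES_PER_MIN = {
    "pre-recorded": Decimal("0.0048"),
    "streaming": Decimal("0.0077"),
}


def estimate_cost(minutes, mode):
    """Estimated USD cost for `minutes` of audio, rounded to cents."""
    rate = RATES_PER_MIN[mode]
    return (Decimal(str(minutes)) * rate).quantize(Decimal("0.01"))


def hourly_rate(mode):
    """Per-hour equivalent, useful next to vendors that quote $/hr."""
    return RATES_PER_MIN[mode] * 60


hours = 1000
for mode in RATES_PER_MIN:
    print(mode, estimate_cost(hours * 60, mode), hourly_rate(mode))
```

Decimal is used instead of float so that small per-minute rates do not accumulate binary rounding error over large volumes.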

Development and ownership

Nova-3 was developed by Deepgram. Deepgram does not publish a Nova-3 author list; public attribution is organizational. Scott Stephenson is co-founder and CEO; Adam Sypniewski is CTO and, per Deepgram's bio, leads the research and engineering teams building the company's speech-recognition systems; Andrew Seagraves is VP of Research; Morris Gevirtz is Head of Language. Deepgram's Nova-2 materials credit its in-house model research team and DataOps team for speech-specific transformer optimization, data curation, and multi-stage training.

Public-facing release materials were authored by Jose Nicholas Francisco (Nova-3 Medical), Hasan Jilani (Nova-3 Medical Streaming and later Nova/Flux marketing), and Martine Katz (multilingual expansion and multilingual WER improvements). A Deepgram + NVIDIA deployment post was co-authored by Conner Hughes and Michael Wang.

Deepgram patent US10540959B1 lists Jeff Ward, Adam Sypniewski, and Scott Stephenson as inventors on techniques related to domain adaptation and special vocabulary handling. Other Deepgram-adjacent patents cover fused end-to-end ASR with transformers (US12380880B2) and soft label generation for knowledge distillation (US11410029B2). The source states these patents do not prove that Nova-3 uses each disclosed method verbatim.

Deepgram's Series C was led by AVP; its Series B was led by Madrona with Alkeon and others participating. The source found no evidence of an outside academic lab or hyperscaler named as co-author or co-trainer of the base Nova-3 model.

Release history

Date	Milestone
2025-02-12	Nova-3 launch: Nova-3 English via API for pre-recorded and real-time streaming transcription; headline metrics of 5.26% batch WER and 6.84% streaming WER.
2025-03-03	Nova-3 Medical released.
2025-06-04	Nova-3 Medical Streaming update.
2025-06-22	"When to Use Nova-2 vs Nova-3" comparative positioning piece.
2025-08 to 2026-04	Language support rollouts via changelog.
2026-02-12	Speech-to-text for Hebrew, Persian, and Urdu on Nova-3.
2026-02-13	Nova-3 Multilingual: WER improvements across languages, with training-curriculum and data-curation changes.
2026-05-27	Deepgram + NVIDIA private-deployment post, including 198 ms P50 first-token latency for self-hosted Nova-3.

Language additions across this period, in order: German, Dutch, Swedish, Danish; Spanish, French, Portuguese; Italian, Turkish, Norwegian, Indonesian; 12 more languages across Europe and South Asia; Hebrew, Persian, and Urdu; Mandarin Chinese; Gujarati.

Sources

February 12, 2025 changelog. https://developers.deepgram.com/changelog/2025/2/12
Models & Languages Overview. https://developers.deepgram.com/docs/models-languages-overview
Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text. https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API. https://deepgram.com/learn/nova-2-speech-to-text-api
Introducing Nova: World's Most Powerful Speech-to-Text API. https://deepgram.com/learn/nova-speech-to-text-whisper-api
Deepgram API Overview. https://developers.deepgram.com/reference/deepgram-api-overview
Getting Started, Deepgram Docs. https://developers.deepgram.com/docs/live-streaming-audio
Speaker Diarization, Deepgram Docs. https://developers.deepgram.com/docs/diarization
Smart Formatting, Deepgram Docs. https://developers.deepgram.com/docs/smart-format
Supported Entity Types. https://developers.deepgram.com/docs/supported-entity-types
Language Detection. https://developers.deepgram.com/docs/language-detection
Profanity Filtering. https://developers.deepgram.com/docs/profanity-filter
Deepgram Pricing. https://deepgram.com/pricing
Measuring STT Latency, Deepgram Docs. https://developers.deepgram.com/docs/measuring-streaming-latency
Deploy Deepgram on Amazon SageMaker. https://developers.deepgram.com/docs/deploy-amazon-sagemaker
Introducing "State of Voice AI 2025". https://deepgram.com/learn/state-of-voice-ai-2025
Migrating From Google Speech-to-Text (STT) to Deepgram. https://developers.deepgram.com/docs/migrating-from-google-speech-to-text-stt-to-deepgram
Meet our leadership team. https://deepgram.com/company/leadership
Introducing Nova-3 Medical. https://deepgram.com/learn/introducing-nova-3-medical-speech-to-text-api
US10540959B1, Augmented generalized deep learning with special vocabulary. https://patents.google.com/patent/US10540959B1/en
Deepgram Raises $130M Series C at $1.3B Valuation. https://deepgram.com/learn/press-release-deepgram-raises-series-c
Model Options, Deepgram Docs. https://developers.deepgram.com/docs/model
API Rate Limits, Deepgram Docs. https://developers.deepgram.com/reference/api-rate-limits
Nova-3 Medical Streaming. https://deepgram.com/learn/nova-3-medical-streaming-update
Nova-3 Multilingual: Major WER Improvements Across Languages. https://deepgram.com/learn/nova-3-multilingual-major-wer-improvements-across-languages
Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3. https://deepgram.com/learn/speech-to-text-for-hebrew-persian-urdu-on-nova-3
August 15, 2025 changelog. https://developers.deepgram.com/changelog/2025/8/15
When to Use Nova-2 vs Nova-3 (for Devs). https://deepgram.com/learn/model-comparison-when-to-use-nova-2-vs-nova-3-for-devs
Voice Agents That Prioritize Data Security and Run Where Your Data Lives. https://deepgram.com/learn/voice-agents-deepgram-nvidia-nemotron
"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most. https://arxiv.org/html/2602.12249v2
Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India. https://arxiv.org/html/2604.19151v2
Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models. https://arxiv.org/abs/2512.17474
US12380880B2, End-to-end automatic speech recognition with transformer. https://patents.google.com/patent/US12380880B2/en
US11410029B2, Soft label generation for knowledge distillation. https://patents.google.com/patent/US11410029B2/en
Deepgram GitHub organization. https://github.com/deepgram
Whisper model card. https://github.com/openai/whisper/blob/main/model-card.md
turbo model release, openai/whisper Discussion #2363. https://github.com/openai/whisper/discussions/2363
openai/whisper. https://github.com/openai/whisper
Introducing next-generation audio models in the API. https://openai.com/index/introducing-our-next-generation-audio-models/
OpenAI API Pricing. https://openai.com/api/pricing/
Google Cloud launches new AI models, opens Generative AI Studio. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
Compare transcription models, Cloud Speech-to-Text. https://docs.cloud.google.com/speech-to-text/docs/transcription-model
Chirp 2: Enhanced multilingual accuracy. https://docs.cloud.google.com/speech-to-text/docs/models/chirp-2
Cloud Speech-to-Text On-Prem Pricing. https://cloud.google.com/speech-to-text/priv/pricing
Google Cloud Speech-to-Text V2 API. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
Language and Voice Support for Azure Speech. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
Speech to Text Overview, Azure Speech Service. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
Real-time diarization quickstart, Azure Speech service. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization
Azure Speech pricing. https://azure.microsoft.com/en-us/pricing/details/speech/
NVIDIA ASR NIM Support Matrix. https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/asr.html
About NVIDIA ASR NIM Microservice. https://docs.nvidia.com/nim/speech/latest/asr/index.html
Speechmatics. https://www.speechmatics.com/
Realtime diarization, Speechmatics Docs. https://docs.speechmatics.com/speech-to-text/realtime/realtime-diarization
On-Device Speech-to-Text for Laptop. https://www.speechmatics.com/speech-to-text/on-device
Speechmatics pricing. https://www.speechmatics.com/pricing
99 Languages, Advanced Features, One Price. https://www.assemblyai.com/blog/99-languages
Introducing Universal-3 Pro. https://www.assemblyai.com/blog/introducing-universal-3-pro
AssemblyAI Pricing. https://www.assemblyai.com/pricing
facebookresearch/seamless_communication. https://github.com/facebookresearch/seamless_communication
seamless_communication/docs/m4t/README.md. https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/README.md
Seamless Communication, AI at Meta. https://ai.meta.com/research/seamless-communication/
Prem AI Brings Sovereign Voice AI to Regulated Enterprises. https://deepgram.com/customers/prem-ai
Large Vocabulary Speech Recognition: A Practical Guide. https://deepgram.com/learn/large-vocabulary-speech-recognition