Deepgram Nova-3: model profile
Nova-3 is Deepgram's proprietary general-purpose speech-to-text model family for batch and streaming transcription, released on February 12, 2025.
Specifications
| Field | Value |
|---|---|
| Developer | Deepgram |
| Released | February 12, 2025 (Nova-3 English; multilingual and self-hosted support followed) |
| Model type | Proprietary end-to-end speech-to-text (ASR) model family |
| Training data | Not publicly disclosed for base Nova-3. |
| Languages | 45+ per Deepgram's pricing page; some 2026 Deepgram marketing pages state 50+ |
| Modes (batch / streaming) | Batch and streaming, via the /v1/listen API and a WebSocket streaming API |
| Latency | Vendor-reported: streaming models optimized for 300 ms or less transcription latency; sub-300 ms under typical conditions; 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC |
| Throughput / concurrency | 50 concurrent pre-recorded requests and 150 streaming requests on pay-as-you-go in Europe, 225 streaming in North America on Growth; 200 pre-recorded and 300 streaming starting limits on Enterprise; lower with diarization enabled |
| Deployment | Managed API, self-hosted / on-prem / private VPC, Amazon SageMaker, Together AI dedicated inference |
| Pricing | Nova-3 Monolingual: $0.0048/min pre-recorded, $0.0077/min streaming on pay-as-you-go; Nova-3 Multilingual listed above that; lower rates on Growth |
| License | Proprietary commercial software; no public open-source model license or public weight release found in the reviewed materials |
| Parameters | Not disclosed |
Overview
Deepgram's documentation describes Nova-3 as its highest-performing general-purpose ASR model and recommends it for meetings, event captioning, multi-speaker audio, multilingual and code-switching audio, noisy and far-field input, and both batch and streaming transcription. Deepgram distinguishes Nova-3 from Flux, its newer model for turn-based voice-agent interaction with model-native turn detection.
At launch, Nova-3 English was available through Deepgram's API for pre-recorded and real-time streaming transcription, with multilingual and self-hosted support to follow. The launch changelog reported a 54.3% reduction in WER for streaming against competitors (6.84% median WER) and a 47.4% reduction for batch (5.26% median WER), on Deepgram's own benchmark suite. The same announcement stated that Nova-3 maintained inference speed comparable to Nova-2 while adding Keyterm Prompting, improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
Deepgram has not published a Nova-3 architecture paper, parameter count, or full training-corpus disclosure. Public materials consist of documentation, launch blogs, changelog posts, patents covering adjacent Deepgram ASR techniques, and partner pages.
Deepgram's 2025 "State of Voice AI" report, produced with Opus Research, found that 67% of surveyed businesses viewed voice technology as foundational, 84% expected to increase voice-tech budgets, 80% were already using some form of voice agent or IVR, and 21% were "very satisfied" with current systems. The source notes this is Deepgram-sponsored research.
Disclosure status
| Aspect | What is public | Source assessment |
|---|---|---|
| Model class | Proprietary end-to-end speech-to-text system; general ASR, not turn-detection-centric. | High confidence. |
| Architecture | Deepgram's prior Nova-2 writeup says the Nova family uses a Transformer-based architecture with speech-specific optimizations; Deepgram patents cover fused end-to-end ASR with transformers and knowledge-distillation methods. A third-party Together AI model page describes Nova-3 as a "latent space architecture." | By inference from these materials, Nova-3 is a proprietary, transformer-based, end-to-end ASR system; Deepgram has not published a first-party Nova-3 architecture paper. |
| Model size | Not publicly disclosed by Deepgram in the reviewed Nova-3 docs. | Undisclosed. |
| Training data | No public corpus-size disclosure for base Nova-3. Official materials emphasize real-world enterprise audio and challenging acoustic conditions; the retrained multilingual model credits improved curriculum and data curation; the medical variant says its evaluation uses public and proprietary customer audio. Together AI says Nova-3 used synthetic plus real-world conversational datasets, but that is not a Deepgram primary source. | Public evidence supports enterprise conversational audio plus active curation, not a full dataset accounting. |
Historical anchors from predecessor models: Deepgram's original Nova post said Nova was trained across 100+ domains and 47 billion tokens, and the Nova-2 post said Nova-2 used a two-stage curriculum over data curated from nearly 6 million resources, plus a library of human transcriptions. These disclosures are not explicit Nova-3 corpus specifications.
Capabilities and features
Official documentation shows support for:
- Batch and streaming transcription via Deepgram's /v1/listen API and a WebSocket streaming API.
- Speaker diarization on Nova batch models, with separate concurrency limits when diarization is enabled.
- Keyterm Prompting for up to 100 terms; Deepgram later recommended roughly 20 to 50 terms as the practical range.
- Smart Formatting, including punctuation and paragraphs generally, plus entity formatting for supported languages; self-hosted Nova-3 requires the separate entity-detector model for the best formatting.
- Redaction for 50+ entity types and groups such as PII, PCI, PHI, and numbers.
- Language detection for dominant-language identification and multilingual code-switching support via language=multi.
- Profanity filtering for multilingual models and filler-word preservation on general Nova, Nova-2, and Nova-3 models.
The launch announcement also listed improved handling of background noise and overlapping speech, improved numeric recognition, word-level timestamp precision, real-time redaction, and improved English formatting and paragraphing.
A domain variant, Nova-3 Medical, was released on March 3, 2025, followed by a Nova-3 Medical Streaming update on June 4, 2025.
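As a rough illustration of how the features listed above might map onto a single request, the sketch below builds a `/v1/listen` URL without sending anything. Only the `/v1/listen` path, `language=multi`, and the 100-term Keyterm Prompting limit come from this profile; the host name and the `model`, `smart_format`, `diarize`, `keyterm`, and `redact` parameter names are assumptions that should be checked against Deepgram's API reference before use.

```python
from urllib.parse import urlencode

# Assumed host; this profile names only the /v1/listen path.
BASE_URL = "https://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100  # upper bound stated in the feature list


def build_listen_url(keyterms=(), multilingual=False, diarize=False, redact=()):
    """Return a /v1/listen URL carrying feature flags as query parameters.

    Parameter names other than language=multi are assumptions.
    """
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} keyterms are supported")
    params = [("model", "nova-3"), ("smart_format", "true")]
    if multilingual:
        params.append(("language", "multi"))
    if diarize:
        params.append(("diarize", "true"))
    # Repeated keys are emitted once per value.
    params += [("keyterm", term) for term in keyterms]
    params += [("redact", group) for group in redact]
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(
    keyterms=["Nova-3", "Deepgram"], multilingual=True, redact=["pii"]
)
print(url)
```

Keeping the keyterm list short (the roughly 20 to 50 terms Deepgram later recommended) is a caller-side choice; the helper enforces only the hard limit.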
Language support
Deepgram's pricing page states that Nova models support 45+ languages; some 2026 Deepgram marketing pages use "50+" phrasing. The source reads this as at least 45 languages, with ongoing additions through 2026.
Documented language rollouts, in order: German, Dutch, Swedish, Danish; later Spanish, French, Portuguese; later Italian, Turkish, Norwegian, Indonesian; then 12 more languages across Europe and South Asia; then Hebrew, Persian, and Urdu; then Mandarin Chinese; then Gujarati.
A retrained Nova-3 Multilingual model, announced February 13, 2026, reported WER improvements across languages and credited curriculum and data-curation changes for multilingual and code-switching behavior.
Performance and benchmarks
Vendor-reported: the Nova-3 launch changelog reported a 5.26% median batch WER (a 47.4% reduction against competitors) and a 6.84% median streaming WER (a 54.3% reduction against competitors) on Deepgram's own benchmark suite.
Third-party evaluations:
- The Voice of India benchmark (2026) places Nova-3 in a weaker tier on several underrepresented Indic languages, with elevated error rates on some languages such as Tamil and Odia.
- A zero-shot dysarthric speech evaluation (2025) found that all tested systems, including Nova-3, degrade sharply as dysarthria severity rises.
- A 2026 street-name benchmark ("Sorry, I Didn't Catch That") compared Nova-3 with Whisper, Chirp, and other systems on a difficult lexical task.
The source notes that direct cross-model comparisons should account for differing datasets, audio conditions, and output-normalization rules, and that Nova-3's headline WER numbers come from Deepgram-authored evaluations while open models such as Whisper and SeamlessM4T are documented via papers and model cards.
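To make the normalization caveat concrete, here is a minimal sketch of word error rate computed as word-level edit distance divided by reference length, scored once on raw text and once after a simple normalization step. The `normalize` rule (lowercasing and stripping punctuation) is an illustrative assumption, not the rule used by Deepgram or by any of the benchmarks cited above.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation; one of many possible rules."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def wer(reference_words, hypothesis_words):
    """Word error rate: word-level edit distance / reference length."""
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hypothesis_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference_words)


reference = "Call Dr. Smith at nine."
hypothesis = "call dr smith at nine"

raw_wer = wer(reference.split(), hypothesis.split())
normalized_wer = wer(normalize(reference), normalize(hypothesis))
print(raw_wer, normalized_wer)
```

The same transcript scores 0.8 on raw tokens and 0.0 after normalization, which is why WER figures published under different normalization rules are not directly comparable.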
Comparison snapshot
| System | Public size | Languages | Real-time use | Diarization and production features | Deployment | Public pricing signal | Positioning relative to Nova-3 (per source) |
|---|---|---|---|---|---|---|---|
| Deepgram Nova-3 | Undisclosed | 45+ officially on pricing page; language count still expanding | Yes; sub-300 ms target. | Keyterm Prompting, redaction, smart formatting, multilingual code-switching, diarization. | API, self-hosted, VPC/on-prem, SageMaker, Together AI. | Pay-as-you-go pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming; multilingual higher. | Strongest public case is enterprise/noisy production ASR plus deployment flexibility. |
| OpenAI Whisper large-v3 | 1.55B | Multilingual; tokenizer covers 99 languages. | Not natively productized for streaming in the open-source release; wrappers exist. The turbo variant speeds up inference by using 4 decoder layers. | No native diarization in the core model; usually paired with external tools. | Open source / self-hosted; API access history via OpenAI and others. | Open-source cost is infra-dependent. | Most transparent baseline; strongest for openness and ecosystem, weaker than Nova-3 as a turnkey enterprise stack. |
| OpenAI GPT-Realtime-Whisper / next-gen audio models | Not publicly parameterized | Multilingual STT product line. | Yes, explicitly streaming. | Productized realtime STT; official claim is better WER than Whisper v2/v3. | Managed API. | $0.017/min for GPT-Realtime-Whisper. | More "LLM-audio" oriented; higher priced than Nova-3's listed STT rates. |
| Google Chirp 2 / Chirp 3 | Chirp foundation model publicly described as 2B; millions of hours of audio. | 100+ languages for Chirp foundation; current STT docs show multilingual auto language detection and diarization. | Yes. | Timestamps, profanity filtering, auto language detection, diarization. | Cloud and on-prem offerings. | Google's v2 launch blog set pricing at $0.016/min, with volume tiers as low as $0.004/min. | Strong hyperscaler multilingual rival on paper; public benchmark transparency uneven. |
| Microsoft Azure Speech | Undisclosed | 140+ locales / supported inputs. | Real-time, fast, and batch transcription. | Real-time diarization, language identification, custom speech. | Managed cloud plus containers/enterprise deployment options in Azure ecosystem. | Region-specific pricing page; billed per second. | Enterprise breadth and customization are strong; public flagship WER transparency is weaker than Nova-3's marketing. |
| NVIDIA Parakeet 1.1B / Riva / NIM | 1.1B for Parakeet 1.1b RNNT Multilingual. | 25+ languages for Parakeet RNNT multilingual; more via other NIM variants. | Yes, streaming + offline. | Auto punctuation/capitalization; streaming diarization via Sortformer for Parakeet and Conformer families. | Self-hosted / GPU-native path. | Hardware/licensing dependent. | Self-hosted rival for GPU-first teams; less turnkey SaaS than Nova-3. |
| Speechmatics Ursa 2 / STT API | Undisclosed | 55+ / 56+ languages. | Yes; Speechmatics advertises sub-second, speaker-aware STT. | Realtime diarization, multilingual support, batch + streaming. | API and on-device offerings. | Pricing page shows Pro from $0.24/hr and 50 concurrent real-time sessions. | Credible on multilingual realtime diarization; fewer public benchmark details. |
| AssemblyAI Universal | Undisclosed | 99 languages; diarization for 95. | Yes. Universal-3 Pro Streaming adds prompting and real-time diarization. | Language detection, formatting, filler words, keyterms, timestamps, diarization. | Managed API. | Universal supports 99 languages at $0.27/hr flat; U3 Pro Streaming is $0.45/hr base. | Aggressive price/language story; Nova-3 differentiates on self-hosting and enterprise deployment flexibility. |
| Meta SeamlessM4T v2 / SeamlessStreaming / MMS | SeamlessM4T v2 uses UnitY2; MMS covers 1,107 STT languages. | 101 speech-input languages for SeamlessM4T; 96 for streaming ASR; 1,107 for MMS STT. | Research-grade streaming exists via SeamlessStreaming, around 2 seconds latency. | Suited to multilingual research and speech translation; not a turnkey commercial STT stack with built-in diarization/enterprise extras. | Open research code/models. | Infra-dependent. | Meta leads on open multilingual breadth, especially translation and language coverage, but not on turnkey enterprise-product completeness. |
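The pricing column above mixes per-minute and per-hour rates. The sketch below converts the listed per-hour figures to per-minute so they can be read side by side. It is unit arithmetic only: the rates are copied from the table, the labels are shorthand, and the result is not a like-for-like comparison because plans, modes, and add-ons differ between vendors.

```python
from decimal import Decimal

MINUTES_PER_HOUR = 60


def to_per_minute(rate_per_hour):
    """Convert a USD-per-hour rate (given as a string) to USD per minute."""
    return Decimal(rate_per_hour) / MINUTES_PER_HOUR


def to_per_hour(rate_per_minute):
    """Convert a USD-per-minute rate (given as a string) to USD per hour."""
    return Decimal(rate_per_minute) * MINUTES_PER_HOUR


# Rates as listed in the comparison table; labels are shorthand.
listed_per_minute = {
    "Deepgram Nova-3 Monolingual, pre-recorded": Decimal("0.0048"),
    "Deepgram Nova-3 Monolingual, streaming": Decimal("0.0077"),
    "Speechmatics Pro (starting rate)": to_per_minute("0.24"),
    "AssemblyAI Universal": to_per_minute("0.27"),
    "AssemblyAI U3 Pro Streaming (base)": to_per_minute("0.45"),
    "OpenAI GPT-Realtime-Whisper": Decimal("0.017"),
}

for name, rate in sorted(listed_per_minute.items(), key=lambda item: item[1]):
    print(f"{name}: ${rate}/min")
```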
Latency and throughput
Vendor-reported figures:
- Deepgram states its streaming models are optimized for 300 ms or less transcription latency, and the Nova-3 latency guide characterizes Nova-3 as delivering sub-300 ms streaming latency under typical conditions.
- In a Deepgram + NVIDIA private-deployment post (May 27, 2026), Deepgram reported 198 ms P50 first-token latency for self-hosted Nova-3 on NVIDIA GPUs inside an AWS VPC.
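A P50 figure such as the one above is a median, so half of the measured requests were slower. The sketch below shows how a team might summarize its own first-token latency measurements against a 300 ms target. The sample values are hypothetical and are not Deepgram data.

```python
import statistics

TARGET_MS = 300  # the vendor-stated streaming latency target


def summarize_latency(samples_ms, target_ms=TARGET_MS):
    """Return the median (P50), the worst case, and the share within target."""
    within = sum(1 for sample in samples_ms if sample <= target_ms)
    return {
        "p50_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "share_within_target": within / len(samples_ms),
    }


# Hypothetical first-token latencies in milliseconds.
samples = [120, 150, 90, 310, 140]
summary = summarize_latency(samples)
print(summary)
```

In this made-up sample the median is well under the target even though one request in five exceeds it, which is why a P50 alone says little about tail latency.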
Rate-limit documentation shows Nova-3 starting at 50 concurrent pre-recorded requests and 150 streaming requests on pay-as-you-go in Europe, 225 streaming in North America on Growth, rising to 200 pre-recorded and 300 streaming starting limits on Enterprise. If diarization is enabled, concurrency is materially lower.
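One common way to stay under a plan's concurrency ceiling is to gate requests behind a semaphore on the client side. The sketch below uses `asyncio.Semaphore` with a stand-in worker; `transcribe_all` and `fake_worker` are hypothetical names, no Deepgram call is made, and the limit is a parameter that would be set from the plan's documented ceiling.

```python
import asyncio


async def transcribe_all(jobs, worker, limit=50):
    """Run worker(job) for every job with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(job):
        async with semaphore:
            return await worker(job)

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(run(job) for job in jobs))


async def _demo():
    active = 0
    peak = 0

    async def fake_worker(job):
        # Stand-in for a real transcription request.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)  # yield control, as network I/O would
        active -= 1
        return job

    results = await transcribe_all(range(10), fake_worker, limit=3)
    return results, peak


results, peak = asyncio.run(_demo())
print(results, peak)
```

Because concurrency is lower with diarization enabled, a caller would likely pass a smaller `limit` for diarized jobs.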
Deployment and integrations
Nova-3 is available as a managed API, as self-hosted / on-prem / private-VPC software, and through Amazon SageMaker; it also appears on Together AI as a dedicated-inference offering. Deepgram's docs and partner pages show integrations with Twilio, LiveKit, Pipecat, Amazon Connect, Amazon Lex, and other ecosystem components.
SDKs are available in JavaScript, Python, Go, .NET, and Java.
Deepgram's partner ecosystem and product pages show Nova-3 attached to AWS/Amazon Connect, NVIDIA, Fortanix, OneReach.ai, Vonage, and Together AI, among others.
Deepgram published migration guides from AWS Transcribe, Google Speech-to-Text, OpenAI Whisper, and AssemblyAI.
Vendor-published case studies: Prem AI selected self-hosted Nova-3 Base as its primary STT engine for sovereign voice workloads, citing English and EU-language accuracy, better diarization than a self-hosted Whisper plus pyannote stack, and streaming performance that met sub-300 to 500 ms turn-level latency goals. Gradient Labs reported a quality improvement after introducing Nova-3. SigmaMind AI stated that Nova-3 and Flux reduced end-to-end agent latency by roughly 300 ms at scale. The source identifies these as vendor-published case studies.
Pricing
Deepgram's pricing page lists Nova-3 Monolingual at $0.0048/min pre-recorded and $0.0077/min streaming on pay-as-you-go, with lower rates on Growth; Nova-3 Multilingual is listed above that. Deepgram prices add-ons such as redaction and Keyterm Prompting separately, and the pricing page ties concurrency ceilings to plan level.
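For budgeting, the listed pay-as-you-go rates turn into a simple minutes-times-rate estimate. The sketch below covers only the two Nova-3 Monolingual rates quoted above; it leaves out add-ons, Growth discounts, and Multilingual pricing, none of which this profile quantifies.

```python
from decimal import Decimal

# Pay-as-you-go Nova-3 Monolingual rates in USD per minute, as listed above.
RATES_PER_MINUTE = {
    "pre-recorded": Decimal("0.0048"),
    "streaming": Decimal("0.0077"),
}


def estimate_cost(minutes, mode):
    """Estimate base transcription cost in USD, excluding add-ons."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return RATES_PER_MINUTE[mode] * Decimal(minutes)


batch_cost = estimate_cost(10_000, "pre-recorded")
streaming_cost = estimate_cost(10_000, "streaming")
print(f"10,000 min pre-recorded: ${batch_cost:.2f}")
print(f"10,000 min streaming:    ${streaming_cost:.2f}")
```

At these rates, 10,000 minutes comes to $48.00 pre-recorded and $77.00 streaming before any add-ons.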
Development and ownership
Nova-3 was developed by Deepgram. Deepgram does not publish a Nova-3 author list; public attribution is organizational. Scott Stephenson is co-founder and CEO; Adam Sypniewski is CTO and, per Deepgram's bio, leads the research and engineering teams building the company's speech-recognition systems; Andrew Seagraves is VP of Research; Morris Gevirtz is Head of Language. Deepgram's Nova-2 materials credit its in-house model research team and DataOps team for speech-specific transformer optimization, data curation, and multi-stage training.
Public-facing release materials were authored by Jose Nicholas Francisco (Nova-3 Medical), Hasan Jilani (Nova-3 Medical Streaming and later Nova/Flux marketing), and Martine Katz (multilingual expansion and multilingual WER improvements). A Deepgram + NVIDIA deployment post was co-authored by Conner Hughes and Michael Wang.
Deepgram patent US10540959B1 lists Jeff Ward, Adam Sypniewski, and Scott Stephenson as inventors on techniques related to domain adaptation and special vocabulary handling. Other Deepgram-adjacent patents cover fused end-to-end ASR with transformers (US12380880B2) and soft label generation for knowledge distillation (US11410029B2). The source states these patents do not prove that Nova-3 uses each disclosed method verbatim.
Deepgram's Series C was led by AVP; its Series B was led by Madrona with Alkeon and others participating. The source found no evidence of an outside academic lab or hyperscaler named as co-author or co-trainer of the base Nova-3 model.
Release history
| Date | Milestone |
|---|---|
| 2025-02-12 | Nova-3 launch: Nova-3 English via API for pre-recorded and real-time streaming transcription; headline metrics of 5.26% batch WER and 6.84% streaming WER. |
| 2025-03-03 | Nova-3 Medical released. |
| 2025-06-04 | Nova-3 Medical Streaming update. |
| 2025-06-22 | "When to Use Nova-2 vs Nova-3" comparative positioning piece. |
| 2025-08 to 2026-04 | Language support rollouts via changelog. |
| 2026-02-12 | Speech-to-text for Hebrew, Persian, and Urdu on Nova-3. |
| 2026-02-13 | Nova-3 Multilingual: WER improvements across languages, with training-curriculum and data-curation changes. |
| 2026-05-27 | Deepgram + NVIDIA private-deployment post, including 198 ms P50 first-token latency for self-hosted Nova-3. |
Language additions across this period, in order: German, Dutch, Swedish, Danish; Spanish, French, Portuguese; Italian, Turkish, Norwegian, Indonesian; 12 more languages across Europe and South Asia; Hebrew, Persian, and Urdu; Mandarin Chinese; Gujarati.
Sources
- February 12, 2025 changelog. https://developers.deepgram.com/changelog/2025/2/12
- Models & Languages Overview. https://developers.deepgram.com/docs/models-languages-overview
- Introducing Nova-3: Setting a New Standard for AI-Driven Speech-to-Text. https://deepgram.com/learn/introducing-nova-3-speech-to-text-api
- Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API. https://deepgram.com/learn/nova-2-speech-to-text-api
- Introducing Nova: World's Most Powerful Speech-to-Text API. https://deepgram.com/learn/nova-speech-to-text-whisper-api
- Deepgram API Overview. https://developers.deepgram.com/reference/deepgram-api-overview
- Getting Started, Deepgram Docs. https://developers.deepgram.com/docs/live-streaming-audio
- Speaker Diarization, Deepgram Docs. https://developers.deepgram.com/docs/diarization
- Smart Formatting, Deepgram Docs. https://developers.deepgram.com/docs/smart-format
- Supported Entity Types. https://developers.deepgram.com/docs/supported-entity-types
- Language Detection. https://developers.deepgram.com/docs/language-detection
- Profanity Filtering. https://developers.deepgram.com/docs/profanity-filter
- Deepgram Pricing. https://deepgram.com/pricing
- Measuring STT Latency, Deepgram Docs. https://developers.deepgram.com/docs/measuring-streaming-latency
- Deploy Deepgram on Amazon SageMaker. https://developers.deepgram.com/docs/deploy-amazon-sagemaker
- Introducing "State of Voice AI 2025". https://deepgram.com/learn/state-of-voice-ai-2025
- Migrating From Google Speech-to-Text (STT) to Deepgram. https://developers.deepgram.com/docs/migrating-from-google-speech-to-text-stt-to-deepgram
- Meet our leadership team. https://deepgram.com/company/leadership
- Introducing Nova-3 Medical. https://deepgram.com/learn/introducing-nova-3-medical-speech-to-text-api
- US10540959B1, Augmented generalized deep learning with special vocabulary. https://patents.google.com/patent/US10540959B1/en
- Deepgram Raises $130M Series C at $1.3B Valuation. https://deepgram.com/learn/press-release-deepgram-raises-series-c
- Model Options, Deepgram Docs. https://developers.deepgram.com/docs/model
- API Rate Limits, Deepgram Docs. https://developers.deepgram.com/reference/api-rate-limits
- Nova-3 Medical Streaming. https://deepgram.com/learn/nova-3-medical-streaming-update
- Nova-3 Multilingual: Major WER Improvements Across Languages. https://deepgram.com/learn/nova-3-multilingual-major-wer-improvements-across-languages
- Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3. https://deepgram.com/learn/speech-to-text-for-hebrew-persian-urdu-on-nova-3
- August 15, 2025 changelog. https://developers.deepgram.com/changelog/2025/8/15
- When to Use Nova-2 vs Nova-3 (for Devs). https://deepgram.com/learn/model-comparison-when-to-use-nova-2-vs-nova-3-for-devs
- Voice Agents That Prioritize Data Security and Run Where Your Data Lives. https://deepgram.com/learn/voice-agents-deepgram-nvidia-nemotron
- "Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most. https://arxiv.org/html/2602.12249v2
- Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India. https://arxiv.org/html/2604.19151v2
- Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models. https://arxiv.org/abs/2512.17474
- US12380880B2, End-to-end automatic speech recognition with transformer. https://patents.google.com/patent/US12380880B2/en
- US11410029B2, Soft label generation for knowledge distillation. https://patents.google.com/patent/US11410029B2/en
- Deepgram GitHub organization. https://github.com/deepgram
- Whisper model card. https://github.com/openai/whisper/blob/main/model-card.md
- turbo model release, openai/whisper Discussion #2363. https://github.com/openai/whisper/discussions/2363
- openai/whisper. https://github.com/openai/whisper
- Introducing next-generation audio models in the API. https://openai.com/index/introducing-our-next-generation-audio-models/
- OpenAI API Pricing. https://openai.com/api/pricing/
- Google Cloud launches new AI models, opens Generative AI Studio. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
- Compare transcription models, Cloud Speech-to-Text. https://docs.cloud.google.com/speech-to-text/docs/transcription-model
- Chirp 2: Enhanced multilingual accuracy. https://docs.cloud.google.com/speech-to-text/docs/models/chirp-2
- Cloud Speech-to-Text On-Prem Pricing. https://cloud.google.com/speech-to-text/priv/pricing
- Google Cloud Speech-to-Text V2 API. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
- Language and Voice Support for Azure Speech. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support
- Speech to Text Overview, Azure Speech Service. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- Real-time diarization quickstart, Azure Speech service. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization
- Azure Speech pricing. https://azure.microsoft.com/en-us/pricing/details/speech/
- NVIDIA ASR NIM Support Matrix. https://docs.nvidia.com/nim/speech/latest/reference/support-matrix/asr.html
- About NVIDIA ASR NIM Microservice. https://docs.nvidia.com/nim/speech/latest/asr/index.html
- Speechmatics. https://www.speechmatics.com/
- Realtime diarization, Speechmatics Docs. https://docs.speechmatics.com/speech-to-text/realtime/realtime-diarization
- On-Device Speech-to-Text for Laptop. https://www.speechmatics.com/speech-to-text/on-device
- Speechmatics pricing. https://www.speechmatics.com/pricing
- 99 Languages, Advanced Features, One Price. https://www.assemblyai.com/blog/99-languages
- Introducing Universal-3 Pro. https://www.assemblyai.com/blog/introducing-universal-3-pro
- AssemblyAI Pricing. https://www.assemblyai.com/pricing
- facebookresearch/seamless_communication. https://github.com/facebookresearch/seamless_communication
- seamless_communication/docs/m4t/README.md. https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/README.md
- Seamless Communication, AI at Meta. https://ai.meta.com/research/seamless-communication/
- Prem AI Brings Sovereign Voice AI to Regulated Enterprises. https://deepgram.com/customers/prem-ai
- Large Vocabulary Speech Recognition: A Practical Guide. https://deepgram.com/learn/large-vocabulary-speech-recognition