Deepgram Base: model profile

Deepgram Base is a legacy speech-to-text model family built on Deepgram's end-to-end architecture, offered through the company's REST and WebSocket transcription APIs.

Specifications

Developer	Deepgram
Model type	End-to-end speech-to-text model family; positioned under Legacy Models in current documentation
Languages	base-general supports 20+ languages plus regional BCP-47 tags; specialty variants are English only
Modes (batch / streaming)	Both: REST pre-recorded transcription and WebSocket live streaming
Latency	Deepgram-wide streaming guidance: 150-300 ms transcription latency, 200-500 ms total transcript latency
Throughput / concurrency	Pay As You Go: 50 concurrent pre-recorded / 150 concurrent streaming; Growth: 50 / 225 in North America; Enterprise starts at 200 / 300
Deployment	Deepgram-hosted cloud API (North America and EU endpoints), self-hosted cloud or on-prem, Amazon SageMaker, Kubernetes and cloud VM patterns
Pricing	Current public self-serve Base price not listed; the current STT price table lists Flux, Nova-3, and Custom

Not disclosed: Released · Parameters · Training data · License

Known limitations

No official Base-specific WER or CER figure appears in the reviewed public sources; Deepgram publishes such figures for Nova-2 and Nova-3 but not for Base.
No current public self-serve Base list price appears on the 2026 public pricing page.
Parameter counts for Base are not disclosed; the same model-options page publishes parameter counts for Whisper Cloud sizes but not for Base.
Deepgram states that Enhanced generally has higher accuracy and better uncommon-word handling than Base.
A 2026 independent audit that included base-general and base-phonecall found an average transcription error rate of 44% on street and business names across 15 ASR models; that figure is an all-model average, not a Base-specific result.
Keyterm prompting is not available for Base (Nova-3 only); custom trained models via custom_id require an Enterprise plan.
Current Deepgram product pages recommend Nova-3, not Base, for background noise, crosstalk, and far-field input; no dedicated current Base noise-robustness benchmark appears in the reviewed materials.
English transcript output is normalized to standardized American spelling regardless of the speaker's dialect.
Pre-recorded requests are limited to 2 GB per file, and processing that exceeds 10 minutes for Nova/Base/Enhanced can return a 504 Gateway Timeout.
Deepgram can close idle streams when no audio arrives; documented mitigations are sending KeepAlive during silence and starting audio within 10 seconds of connection open.
Each new streaming session starts a fresh local timeline, so timestamps after a reconnect require a client-maintained offset.
Streaming diarization supports only latest/v1, while batch also supports v2, so streaming uses the older diarization version.
No separate public on-device Base runtime is documented; private deployment is via self-hosted infrastructure.
A globally applicable default data-retention statement for Base does not appear in the reviewed sources.
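
The reconnect limitation above (each streaming session restarts its timeline at zero) can be handled with a small client-side offset. The following is a minimal sketch, not Deepgram's own method: `shift_words` and `session_offset_s` are hypothetical names, and it assumes the client tracks how many seconds of audio it had already sent before the reconnect.

```python
from typing import Dict, List


def shift_words(words: List[Dict], offset_s: float) -> List[Dict]:
    """Return copies of word objects with start/end moved onto the global timeline."""
    return [
        {**w, "start": w["start"] + offset_s, "end": w["end"] + offset_s}
        for w in words
    ]


# Hypothetical scenario: the first session covered 12.5 s of audio before dropping.
session_offset_s = 12.5
words_after_reconnect = [
    {"word": "hello", "start": 0.25, "end": 0.5, "confidence": 0.98},
]
shifted = shift_words(words_after_reconnect, session_offset_s)
```

The original word objects are left untouched, so the raw per-session timestamps remain available for debugging.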

Full technical breakdown

Overview

Deepgram describes Base as being built on its end-to-end speech-to-text architecture and says it offers a "solid combination of accuracy and cost effectiveness in some cases." Current documentation places Base under Legacy Models, alongside older Nova and Enhanced lines, and states that Enhanced generally has higher accuracy and better uncommon-word handling than Base. Base remains present in Deepgram's 2026 documentation and rate-limit tables, and current API reference pages still default the generic model parameter to base-general.

In current official materials, Deepgram recommends Flux for real-time voice-agent use cases and Nova-3 for highest-accuracy general transcription, especially for noisy, multilingual, far-field, or crosstalk-heavy audio. Base is a family of task-oriented variants rather than one unified model: base-general (default), base-meeting, base-phonecall, base-voicemail, base-finance, base-conversationalai, and base-video.

Capabilities and features

Base variants and their documented positioning:

Base variant	Official positioning	Supported languages and dialect tags
base or base-general	Everyday audio processing; default Base model	Chinese zh, zh-CN, zh-TW; Danish da; Dutch nl; English en, en-US; French fr, fr-CA; German de; Hindi hi, hi-Latn; Indonesian id; Italian it; Japanese ja; Korean ko; Norwegian no; Polish pl; Portuguese pt, pt-BR, pt-PT; Russian ru; Spanish es, es-419, es-LATAM; Swedish sv; Tamasheq taq; Turkish tr; Ukrainian uk
base-meeting	Conference-room audio with multiple speakers and one microphone	English en, en-US
base-phonecall	Low-bandwidth phone calls	English en, en-US
base-voicemail	Low-bandwidth single-speaker audio; derived from phonecall	English en, en-US
base-finance	Earnings-call style, multiple speakers, finance-heavy vocabulary	English en, en-US
base-conversationalai	Human speaking to an automated bot, IVR, assistant, kiosk	English en, en-US
base-video	Audio sourced from video	English en, en-US

Transcript-layer features documented for Base:

Feature	How it works in Deepgram docs	Documented note
Punctuation and casing	punctuate=true adds punctuation and capitalization	Readability improvement for Base pipelines
Smart formatting	smart_format=true adds richer formatting for readability, including dates/currency style transformations	Deepgram's pricing page treats smart formatting as included on current STT pricing tiers
Word timestamps	words array returns start and end per word	Suitable for subtitles, searchable transcripts, and timeline alignment
Confidence scores	Transcript-level and word-level confidence values are returned on a 0-1 scale	Returned in the standard response structure
Speaker diarization	diarize_model enables speaker change detection and labels words by speaker number; batch supports latest/v1/v2, streaming supports latest/v1	Streaming diarization uses older v1 while batch can use v2
Channel separation	multichannel=true transcribes each channel independently	Applies when audio is already channel-separated, such as stereo telephony
Profanity filtering	profanity_filter=true converts recognized profanity to the nearest non-profane word or removes it	Alters literal transcript content
Utterance segmentation	utterances=true segments speech into semantic units	Used for subtitles, agent-assist panes, and UI chunking
Redaction	redact= can redact PII/PHI/PCI classes; Deepgram documents 50+ entity types	Distinct from profanity filtering
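
As an illustration of how the boolean feature flags in the table combine, the sketch below builds a pre-recorded request URL for base-general. The helper name `build_listen_url` is hypothetical, and the lowercase `true`/`false` encoding follows the `punctuate=true` style shown in the table; the block only constructs the URL and does not call the API.

```python
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "base-general", **features) -> str:
    """Build a /v1/listen URL; booleans are rendered as lowercase true/false."""
    params = {"model": model}
    for name, value in features.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return f"{LISTEN_URL}?{urlencode(params)}"


url = build_listen_url(punctuate=True, smart_format=True, utterances=True)
```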

The pre-recorded API returns transcript alternatives that include a transcript-level confidence number plus per-word {word, start, end, confidence} objects. Example responses include speaker, speaker_confidence, and punctuated_word fields when diarization and formatting are enabled.
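
A short sketch of reading that response shape follows. The `results.channels[].alternatives[]` nesting and the sample values are assumptions made for illustration (the text above only names the alternative-level and word-level fields), and the 0.6 threshold is arbitrary.

```python
from typing import Dict, List


def best_alternative(response: Dict) -> Dict:
    """Return the first alternative of the first channel (assumed nesting)."""
    return response["results"]["channels"][0]["alternatives"][0]


def low_confidence_words(response: Dict, threshold: float = 0.6) -> List[Dict]:
    """List word objects whose confidence falls below the threshold."""
    return [w for w in best_alternative(response)["words"] if w["confidence"] < threshold]


# Hand-written sample payload, not real API output.
sample = {
    "results": {"channels": [{"alternatives": [{
        "transcript": "call acme on main street",
        "confidence": 0.91,
        "words": [
            {"word": "call", "start": 0.08, "end": 0.32, "confidence": 0.99},
            {"word": "acme", "start": 0.40, "end": 0.80, "confidence": 0.42},
            {"word": "on", "start": 0.88, "end": 0.96, "confidence": 0.97},
            {"word": "main", "start": 1.04, "end": 1.28, "confidence": 0.93},
            {"word": "street", "start": 1.28, "end": 1.60, "confidence": 0.95},
        ],
    }]}]}
}
flagged = low_confidence_words(sample)
```

Flagging low-confidence words this way is one option for routing proper nouns, where Base is documented as weaker, to human review.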

Customization

Base supports classic keywords boosting/suppression. Keyterm prompting is Nova-3 only, and Deepgram's "instant self-serve customization without model retraining" messaging applies to Nova-3, not Base. Deepgram also supports account-linked custom trained models via custom_id; the model-options page says these custom models are available only to Enterprise customers.
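
A minimal sketch of passing several keywords in one request follows. The `term:intensifier` value syntax, with a negative intensifier for suppression, is stated here from memory of Deepgram's keywords feature and should be checked against the current docs; `keyword_params` and the example terms and weights are hypothetical.

```python
from urllib.parse import urlencode


def keyword_params(boosts: dict) -> list:
    """Turn {term: intensifier} into repeated keywords=term:intensifier pairs."""
    return [("keywords", f"{term}:{weight}") for term, weight in boosts.items()]


# Assumed semantics: positive values boost a term, negative values suppress it.
query = urlencode(
    [("model", "base-general")] + keyword_params({"Deepgram": 2, "um": -10})
)
```

A list of pairs is used instead of a dict because the same parameter name must appear once per keyword.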

Language support

The base-general variant supports the following languages and dialect tags: Chinese (zh, zh-CN, zh-TW), Danish (da), Dutch (nl), English (en, en-US), French (fr, fr-CA), German (de), Hindi (hi, hi-Latn), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Norwegian (no), Polish (pl), Portuguese (pt, pt-BR, pt-PT), Russian (ru), Spanish (es, es-419, es-LATAM), Swedish (sv), Tamasheq (taq), Turkish (tr), and Ukrainian (uk).

All specialty variants (base-meeting, base-phonecall, base-voicemail, base-finance, base-conversationalai, base-video) support English (en, en-US) only.
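
The language matrix above can be encoded as a pre-flight check so that an unsupported model/language pair is caught before a request is sent. This is a sketch built only from the lists in this section; the function name `supports` is hypothetical.

```python
BASE_GENERAL_LANGUAGES = {
    "zh", "zh-CN", "zh-TW", "da", "nl", "en", "en-US", "fr", "fr-CA", "de",
    "hi", "hi-Latn", "id", "it", "ja", "ko", "no", "pl", "pt", "pt-BR", "pt-PT",
    "ru", "es", "es-419", "es-LATAM", "sv", "taq", "tr", "uk",
}
ENGLISH_TAGS = {"en", "en-US"}
ENGLISH_ONLY_VARIANTS = {
    "base-meeting", "base-phonecall", "base-voicemail",
    "base-finance", "base-conversationalai", "base-video",
}


def supports(model: str, language: str) -> bool:
    """Return True if the Base variant is documented for the given language tag."""
    if model in ("base", "base-general"):
        return language in BASE_GENERAL_LANGUAGES
    if model in ENGLISH_ONLY_VARIANTS:
        return language in ENGLISH_TAGS
    raise ValueError(f"unknown Base variant: {model}")
```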

Deepgram's language documentation states that its English models are designed to handle global English accents and dialects, but transcript output is normalized to standardized American spelling.

Performance and benchmarks

Deepgram's current public documentation does not publish an official Base-specific WER or CER figure. It publishes detailed accuracy claims for Nova-2 and Nova-3, but no equivalent single-number benchmark for Base.

Source	Metric	Result	Relevance
Deepgram streaming latency guide	Typical transcription latency	150-300 ms transcription latency; 200-500 ms total transcript latency	Applies to Deepgram streaming workloads generally, including the Base-family streaming endpoint
Deepgram API rate limits	Base concurrency	Pay As You Go: 50 concurrent pre-recorded / 150 concurrent streaming; Growth: 50 / 225 in North America; Enterprise starts at 200 / 300	Base remains a supported production model family in 2026
Deepgram batch autoscaling guidance	Throughput	Deepgram states its self-hosted batch engine can transcribe 1 hour of audio in under 30 seconds	Platform-level throughput guidance, not a Base-only number
Deepgram 2023 Whisper benchmark	WER on 254 real-world phone-call/meeting files	Vendor-reported: Deepgram Enhanced 10.6% WER, Nova-2 8.4% WER; Whisper sizes ranged 13.1-15.3% WER in that study	Official benchmark, not Base-specific
WhisperX paper	Speedup on long-form transcription	WhisperX reports a 12x transcription speedup using VAD segmentation and batched inference	Competitor throughput reference for Whisper-family pipelines
2026 independent named-entity audit of speech providers	High-stakes proper-noun difficulty	Third-party evaluation: study included base-general and base-phonecall; across 15 ASR models the average transcription error rate on street/business names was 44%	Not a vanilla WER benchmark; measures proper nouns and address-like entities in production audio
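
Because no Base-specific WER is published, teams evaluating Base typically measure it on their own audio. The sketch below computes the standard word error rate (word-level edit distance divided by reference length). The lowercasing and whitespace tokenization are simplifying assumptions and are not claimed to match the normalization used in any benchmark cited above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1] / len(ref)


# One substituted street name in a five-word reference.
score = wer("meet me at main street", "meet me at maine street")
```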

Latency and throughput

Deepgram's current guidance states that streaming transcription latency is optimized to 300 ms or less, with a typical breakdown of 150-300 ms transcription latency and 200-500 ms total transcript latency end to end depending on network, buffering, and client-side processing. Deepgram recommends sending audio in 20-100 ms chunks; larger buffers increase built-in delay, while smaller chunks increase overhead.
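
The 20-100 ms chunk guidance translates into a byte count per WebSocket send once the audio format is known. A small worked calculation follows, assuming raw uncompressed audio (1 byte per sample for mu-law, 2 bytes per sample for 16-bit linear PCM); `chunk_bytes` is a hypothetical helper name.

```python
def chunk_bytes(sample_rate_hz: int, bytes_per_sample: int, channels: int, chunk_ms: int) -> int:
    """Number of raw audio bytes that represent chunk_ms of audio."""
    return sample_rate_hz * bytes_per_sample * channels * chunk_ms // 1000


# 8 kHz mono mu-law telephony audio, 20 ms per send.
telephony_chunk = chunk_bytes(8000, 1, 1, 20)
# 16 kHz mono 16-bit linear PCM, 100 ms per send.
wideband_chunk = chunk_bytes(16000, 2, 1, 100)
```

Under these assumptions, the telephony case works out to 160 bytes per send and the wideband case to 3,200 bytes per send.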

For pre-recorded jobs, Deepgram's quickstart documentation notes a 2 GB maximum file size and states that requests whose processing exceeds 10 minutes for Nova/Base/Enhanced can return a 504 Gateway Timeout.

Deepgram states its self-hosted batch engine can transcribe 1 hour of audio in under 30 seconds; this is platform-level guidance, not a Base-only figure.

Concurrency limits documented for Base: Pay As You Go allows 50 concurrent pre-recorded and 150 concurrent streaming requests; Growth allows 50 / 225 in North America; Enterprise starts at 200 / 300.
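
One way to stay under a plan's concurrency ceiling is to gate requests with a client-side semaphore. The sketch below is a generic asyncio pattern, not a Deepgram SDK feature; `run_with_limit` is a hypothetical helper, and each job is assumed to be a zero-argument coroutine function that performs one transcription request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

PAYG_PRERECORDED_LIMIT = 50  # Pay As You Go pre-recorded limit quoted above


async def run_with_limit(jobs: Iterable[Callable[[], Awaitable]], limit: int) -> List:
    """Run all jobs, never allowing more than `limit` to be in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable]):
        async with semaphore:
            return await job()

    # gather preserves the input order of results.
    return await asyncio.gather(*(guarded(job) for job in jobs))


# Usage (transcribe_one is a placeholder for your own request coroutine):
# results = asyncio.run(run_with_limit(jobs, PAYG_PRERECORDED_LIMIT))
```

Setting the client limit somewhat below the documented ceiling leaves headroom for other processes sharing the same project.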

Deployment and integrations

API surface

Base uses the same core STT interfaces as the rest of Deepgram's classic transcription stack.

Use case	Endpoint	Input style	Output style	Notes
Batch transcription	POST https://api.deepgram.com/v1/listen	JSON with url, or direct file upload	JSON response or async callback response	Supports features like punctuation, diarization, redaction, topics, intents, and utterances
Live transcription	wss://api.deepgram.com/v1/listen	Continuous audio over WebSocket	Incremental JSON events	Supports interim results, endpointing, speech_final, UtteranceEnd, keepalive/finalize flow
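
The keepalive flow noted for live transcription can be sketched as a timer check plus a small JSON text message. The `{"type": "KeepAlive"}` payload is given from memory of Deepgram's Audio Keep Alive guide, and the 5-second interval is an assumption chosen to sit inside the 10-second window mentioned earlier; both should be verified against the current docs. The function names are hypothetical.

```python
import json

KEEPALIVE_INTERVAL_S = 5.0  # assumption: comfortably inside the 10-second idle window


def keepalive_message() -> str:
    """JSON text frame to send on the WebSocket while no audio is flowing."""
    return json.dumps({"type": "KeepAlive"})


def keepalive_due(now_s: float, last_sent_s: float,
                  interval_s: float = KEEPALIVE_INTERVAL_S) -> bool:
    """True when at least interval_s has passed since audio or a keepalive was last sent."""
    return now_s - last_sent_s >= interval_s
```

In a streaming loop, `last_sent_s` would be refreshed whenever audio or a keepalive goes out, so keepalives are only emitted during silence.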

Authentication accepts either the Authorization: Token or the Authorization: Bearer header scheme. Deepgram also documents temporary API tokens with a default TTL of 30 seconds.

For pre-recorded audio, a request can send a JSON body with a remote URI or upload binary audio/video directly. Responses return transcript alternatives, overall confidence, and per-word timing/confidence data. Streaming returns a sequence of WebSocket messages including transcript updates plus SpeechStarted, UtteranceEnd, and metadata events.
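
A minimal sketch of the remote-URL form of a pre-recorded request, using only the Python standard library, is shown below. It builds the request object without sending it; `build_prerecorded_request`, the placeholder key, and the example.com audio URL are illustrative, and the Token scheme is one of the two header schemes mentioned above.

```python
import json
import urllib.request


def build_prerecorded_request(api_key: str, audio_url: str,
                              model: str = "base-general") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request with a JSON url body."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.deepgram.com/v1/listen?model={model}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_prerecorded_request("YOUR_API_KEY", "https://example.com/audio.wav")
# To send: urllib.request.urlopen(req) returns the JSON transcript response.
```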

Deepgram's official SDK surface includes at least JavaScript/TypeScript and Python, and the self-hosted and STT guides add examples for .NET, Go, and Java alongside Python and JavaScript.

Deployment paths

Deployment path	What the sources support
Deepgram-hosted cloud API	Primary/default path; North America endpoint plus EU endpoint for residency considerations
Self-hosted in customer cloud or on-prem	Deepgram documents self-hosted deployments running on customer infrastructure, cloud or on-prem
Amazon SageMaker	Deepgram documents deployment via SageMaker and provides autoscaling/batch guidance
Kubernetes / cloud VM patterns	Deepgram documents deployment guidance for GCP/Kubernetes-oriented environments and self-hosted patterns generally
Edge / on-device	No separate public on-device Base runtime found in the reviewed sources; the closest documented private-deployment option is self-hosted infrastructure

Deepgram's self-hosted hardware guidance for STT lists a recommended baseline of 1 NVIDIA GPU with compute capability 7.0+, 16 GB VRAM, 4 CPU cores, 32 GB RAM, and 50 GB storage. The self-hosted docs also note that authentication is not built in for self-hosted deployments, so teams place Deepgram behind their own API gateway, reverse proxy, or network controls.

Cloud and pipeline integrations

Documented cloud integration patterns include S3-backed batch pipelines using presigned URLs, SageMaker for managed deployment on AWS, self-hosted cloud/on-prem for private inference, and regional endpoints for residency. Deepgram publishes migration guides from AWS Transcribe, Google Speech-to-Text, and OpenAI Whisper to Deepgram.

For real-time media pipelines, an official Twilio/Deepgram guide identifies 8 kHz raw mu-law as the telephony audio shape Twilio sends, which maps to base-phonecall. Deepgram also documents a LiveKit integration path for agent use cases.
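
For the Twilio case, raw audio has no container header, so the stream's format is declared in the connection URL. The sketch below assumes the `encoding`, `sample_rate`, and `channels` query parameters that Deepgram documents for raw live audio; the helper name `phonecall_stream_url` is hypothetical, and mono audio is an assumption about the call leg being forwarded.

```python
from urllib.parse import urlencode

STREAM_URL = "wss://api.deepgram.com/v1/listen"


def phonecall_stream_url(model: str = "base-phonecall") -> str:
    """WebSocket URL declaring 8 kHz mono mu-law audio for a telephony stream."""
    params = {
        "model": model,
        "encoding": "mulaw",   # raw mu-law, as sent by Twilio media streams
        "sample_rate": 8000,   # 8 kHz telephony audio
        "channels": 1,         # assumption: a single mono leg
    }
    return f"{STREAM_URL}?{urlencode(params)}"


stream_url = phonecall_stream_url()
```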

For Azure, the reviewed materials point toward self-hosted deployment on Azure infrastructure; Deepgram's self-hosted hardware guidance includes Azure GPU-instance examples. The reviewed docs do not expose a separate Azure-managed Deepgram product surface.

Privacy and compliance

Deepgram's public pricing/security materials state that the platform is SOC 2 Type 1 and Type 2 certified, HIPAA compliant with BAAs for Enterprise customers handling ePHI, GDPR ready with an EU endpoint (api.eu.deepgram.com), and CCPA compliant.

On retention and privacy mechanics, the reviewed public docs describe data-residency options, a model-improvement opt-out (mip_opt_out), and flexible retention language in trust/security materials, but they do not present one globally applicable default-retention statement for Base.

Pricing

Pricing or size question	What the official sources show
Current public self-serve Base price	Not listed on the 2026 public pricing page; the current STT price table lists Flux, Nova-3 Monolingual, Nova-3 Multilingual, and Custom
Current public Base parameter count	Not disclosed in the model docs reviewed
Current public plan tiers	Pay As You Go, Growth, and Enterprise exist at the platform level
Historical official Base pricing evidence	Deepgram's 2023 benchmark whitepaper included historical Base enterprise price bands for batch and streaming; these are historical, not current list prices

Platform-level plan details on the current public pricing page: Pay As You Go includes a $200 free credit, Growth starts at $4K+/year with pre-paid credits, and Enterprise is custom. Historically, Deepgram's 2023 benchmark whitepaper showed Base price bands materially below one dollar per hour of audio, varying by annual volume; those figures are not the current 2026 public price card.

Development and ownership

Deepgram develops and operates the Base model family as part of its speech-to-text platform. Base is built on Deepgram's end-to-end speech-to-text architecture. Release date, training data, and parameter counts are not publicly disclosed in the reviewed sources.

Release history

Not publicly disclosed. The reviewed sources establish that Base is listed under Legacy Models in Deepgram's 2026 documentation, that it remains operational in the public API surface with base-general as the default model parameter, and that Deepgram's 2023 benchmark whitepaper contained historical Base pricing; the sources do not state original release dates or version history for the Base family.

Sources

Models & Languages Overview: https://developers.deepgram.com/docs/models-languages-overview
Deepgram Pricing | Scalable Speech-to-Text, Text-to-Speech & Voice Agent APIs: https://deepgram.com/pricing
Model Options | Deepgram's Docs: https://developers.deepgram.com/docs/model
Pre-Recorded Audio | Deepgram's Docs: https://developers.deepgram.com/reference/speech-to-text/listen-pre-recorded
Official JavaScript SDK for Deepgram: https://developers.deepgram.com/docs/js-sdk-v2-to-v3-migration-guide
Live Audio | Deepgram's Docs: https://developers.deepgram.com/reference/speech-to-text/listen-streaming
Getting Started with Speech to Text: https://developers.deepgram.com/docs/stt/getting-started
Measuring STT Latency | Deepgram's Docs: https://developers.deepgram.com/docs/measuring-streaming-latency
API Rate Limits | Deepgram's Docs: https://developers.deepgram.com/reference/api-rate-limits
Auto-Scaling: https://developers.deepgram.com/docs/autoscaling-best-practices
Deepgram vs Whisper Benchmark whitepaper: https://offers.deepgram.com/hubfs/Whitepaper%20Deepgram%20vs%20Whisper%20Benchmark.pdf
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio: https://arxiv.org/abs/2303.00747
"Sorry, I Didn't Catch That": How Speech Models Miss What Matters Most: https://arxiv.org/html/2602.12249v2
Getting Started | Deepgram's Docs: https://developers.deepgram.com/docs/pre-recorded-audio
Supported Entity Types: https://developers.deepgram.com/docs/supported-entity-types
Determining Your Audio Format for Live Streaming Audio: https://developers.deepgram.com/docs/determining-your-audio-format-for-live-streaming-audio
Audio Keep Alive: https://developers.deepgram.com/docs/audio-keep-alive
Recovering From Connection Errors & Timeouts When Live Streaming | Deepgram's Docs: https://developers.deepgram.com/docs/recovering-from-connection-errors-and-timeouts-when-live-streaming-audio
Endpointing | Deepgram's Docs: https://developers.deepgram.com/docs/endpointing
Introducing Whisper: https://openai.com/index/whisper/
Ingress Authentication: https://developers.deepgram.com/docs/self-hosted-ingress-auth
Amazon Web Services | Deepgram's Docs: https://developers.deepgram.com/docs/aws-docker-podman
Google Cloud Platform | Deepgram's Docs: https://developers.deepgram.com/docs/gcp-k8s
AWS S3 Presigned URLs and Deepgram: https://developers.deepgram.com/docs/using-aws-s3-presigned-urls-with-the-deepgram-api
Twilio and Deepgram Voice Agent: https://developers.deepgram.com/docs/twilio-and-deepgram-voice-agent
Robust Speech Recognition via Large-Scale Weak Supervision: https://arxiv.org/abs/2212.04356
Chirp 3 Transcription: Enhanced multilingual accuracy: https://docs.cloud.google.com/speech-to-text/docs/models/chirp-3
Amazon Transcribe - Speech to Text - AWS: https://aws.amazon.com/transcribe/