2026-07-03 · MODEL PROFILE

Deepgram Base: model profile

Reference profile of Deepgram Base, a legacy speech-to-text model family with task-specific variants, batch and streaming APIs, and self-hosted deployment.


Deepgram Base is a legacy speech-to-text model family built on Deepgram's end-to-end architecture, offered through the company's REST and WebSocket transcription APIs.

Specifications

Developer: Deepgram
Model type: End-to-end speech-to-text model family; positioned under Legacy Models in current documentation
Languages: base-general supports 20+ languages plus regional BCP-47 tags; specialty variants are English only
Modes (batch / streaming): Both; REST pre-recorded transcription and WebSocket live streaming
Latency: Deepgram-wide streaming guidance of 150-300 ms transcription latency and 200-500 ms total transcript latency
Throughput / concurrency: Pay As You Go: 50 concurrent pre-recorded / 150 concurrent streaming; Growth: 50 / 225 in North America; Enterprise starts at 200 / 300
Deployment: Deepgram-hosted cloud API (North America and EU endpoints), self-hosted cloud or on-prem, Amazon SageMaker, Kubernetes and cloud VM patterns
Pricing: Current public self-serve Base price not listed; the current STT price table lists Flux, Nova-3, and Custom

Not disclosed: Released · Parameters · Training data · License


Overview

Deepgram describes Base as being built on its end-to-end speech-to-text architecture and says it offers a "solid combination of accuracy and cost effectiveness in some cases." Current documentation places Base under Legacy Models, alongside older Nova and Enhanced lines, and states that Enhanced generally has higher accuracy and better uncommon-word handling than Base. Base remains present in Deepgram's 2026 documentation and rate-limit tables, and current API reference pages still default the generic model parameter to base-general.

In current official materials, Deepgram recommends Flux for real-time voice-agent use cases and Nova-3 for highest-accuracy general transcription, especially for noisy, multilingual, far-field, or crosstalk-heavy audio. Base is a family of task-oriented variants rather than one unified model: base-general (default), base-meeting, base-phonecall, base-voicemail, base-finance, base-conversationalai, and base-video.

Capabilities and features

Base variants and their documented positioning:

Base variant | Official positioning | Supported languages and dialect tags
base or base-general | Everyday audio processing; default Base model | Chinese zh, zh-CN, zh-TW; Danish da; Dutch nl; English en, en-US; French fr, fr-CA; German de; Hindi hi, hi-Latn; Indonesian id; Italian it; Japanese ja; Korean ko; Norwegian no; Polish pl; Portuguese pt, pt-BR, pt-PT; Russian ru; Spanish es, es-419, es-LATAM; Swedish sv; Tamasheq taq; Turkish tr; Ukrainian uk
base-meeting | Conference-room audio with multiple speakers and one microphone | English en, en-US
base-phonecall | Low-bandwidth phone calls | English en, en-US
base-voicemail | Low-bandwidth single-speaker audio; derived from phonecall | English en, en-US
base-finance | Earnings-call style, multiple speakers, finance-heavy vocabulary | English en, en-US
base-conversationalai | Human speaking to an automated bot, IVR, assistant, kiosk | English en, en-US
base-video | Audio sourced from video | English en, en-US

Transcript-layer features documented for Base:

Feature | How it works in Deepgram docs | Documented note
Punctuation and casing | punctuate=true adds punctuation and capitalization | Readability improvement for Base pipelines
Smart formatting | smart_format=true adds richer formatting for readability, including dates/currency style transformations | Deepgram's pricing page treats smart formatting as included on current STT pricing tiers
Word timestamps | words array returns start and end per word | Suitable for subtitles, searchable transcripts, and timeline alignment
Confidence scores | Transcript-level and word-level confidence values are returned on a 0-1 scale | Returned in the standard response structure
Speaker diarization | diarize_model enables speaker change detection and labels words by speaker number; batch supports latest/v1/v2, streaming supports latest/v1 | Streaming diarization uses older v1 while batch can use v2
Channel separation | multichannel=true transcribes each channel independently | Applies when audio is already channel-separated, such as stereo telephony
Profanity filtering | profanity_filter=true converts recognized profanity to the nearest non-profane word or removes it | Alters literal transcript content
Utterance segmentation | utterances=true segments speech into semantic units | Used for subtitles, agent-assist panes, and UI chunking
Redaction | redact= can redact PII/PHI/PCI classes; Deepgram documents 50+ entity types | Distinct from profanity filtering
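
The boolean feature flags in the table above are passed as query parameters on the listen endpoint. A minimal sketch of assembling such a query string follows; build_listen_query is a hypothetical helper, not part of any Deepgram SDK, and the choice of flags is purely illustrative.

```python
from urllib.parse import urlencode

# Hypothetical helper (not a Deepgram SDK function): builds a query string
# for the listen endpoint from keyword-style feature flags.
def build_listen_query(model="base-general", **features):
    params = [("model", model)]
    for name, value in sorted(features.items()):
        # Booleans are serialized as lowercase "true"/"false" strings.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append((name, value))
    return urlencode(params)

query = build_listen_query(
    punctuate=True,
    smart_format=True,
    utterances=True,
    multichannel=True,
    profanity_filter=False,
)
print("https://api.deepgram.com/v1/listen?" + query)
```

Sorting the flags keeps the resulting URL deterministic, which helps when requests are logged or cached.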

The pre-recorded API returns transcript alternatives that include a transcript-level confidence number plus per-word {word, start, end, confidence} objects. Example responses include speaker, speaker_confidence, and punctuated_word fields when diarization and formatting are enabled.
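
As an illustration of how the per-word confidence values might be used, the sketch below flags words under a threshold for human review. The sample payload is invented, and the results/channels/alternatives envelope around the documented word objects is an assumption about the response layout rather than something this profile confirms.

```python
# Invented sample payload; the nesting under "results" and "channels" is an
# assumption, and all values are illustrative only.
sample_response = {
    "results": {
        "channels": [
            {
                "alternatives": [
                    {
                        "transcript": "hello world",
                        "confidence": 0.97,
                        "words": [
                            {"word": "hello", "start": 0.08, "end": 0.40, "confidence": 0.99},
                            {"word": "world", "start": 0.48, "end": 0.88, "confidence": 0.62},
                        ],
                    }
                ]
            }
        ]
    }
}

def low_confidence_words(response, threshold=0.8):
    """Return (word, start, end) for top-alternative words below threshold."""
    flagged = []
    for channel in response["results"]["channels"]:
        best = channel["alternatives"][0]
        for w in best["words"]:
            if w["confidence"] < threshold:
                flagged.append((w["word"], w["start"], w["end"]))
    return flagged

flagged = low_confidence_words(sample_response)
print(flagged)
```

The 0.8 threshold is arbitrary; a real pipeline would tune it against reviewed transcripts.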

Customization

Base supports classic keywords boosting/suppression. Keyterm prompting is Nova-3 only, and Deepgram's "instant self-serve customization without model retraining" messaging applies to Nova-3, not Base. Deepgram also supports account-linked custom trained models via custom_id; the model-options page says these custom models are available only to Enterprise customers.
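
A hedged sketch of keyword boosting and suppression on a Base request: it assumes the keywords parameter takes a term:intensifier value, with one repeated parameter per term and negative intensifiers used for suppression. That syntax is not spelled out in this profile, and keywords_query is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical helper; assumes a "term:intensifier" value format with one
# repeated `keywords` parameter per term.
def keywords_query(boosts, model="base-general"):
    params = [("model", model)]
    for term, intensifier in boosts:
        params.append(("keywords", f"{term}:{intensifier}"))
    return urlencode(params)

# Boost a brand name, suppress a filler word (values are illustrative).
query = keywords_query([("Deepgram", 2), ("um", -10)])
print(query)
```

Note that urlencode percent-encodes the colon separator as %3A, which is the expected form inside a URL.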

Language support

The base-general variant supports the following languages and dialect tags: Chinese (zh, zh-CN, zh-TW), Danish (da), Dutch (nl), English (en, en-US), French (fr, fr-CA), German (de), Hindi (hi, hi-Latn), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Norwegian (no), Polish (pl), Portuguese (pt, pt-BR, pt-PT), Russian (ru), Spanish (es, es-419, es-LATAM), Swedish (sv), Tamasheq (taq), Turkish (tr), and Ukrainian (uk).

All specialty variants (base-meeting, base-phonecall, base-voicemail, base-finance, base-conversationalai, base-video) support English (en, en-US) only.
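
Because language coverage differs between base-general and the specialty variants, a client may want to validate a model/language pair before sending audio. The sketch below encodes the lists above; is_supported is a hypothetical helper, and tag matching is assumed to be exact and case-sensitive.

```python
# Language tags copied from the lists above.
BASE_GENERAL_LANGUAGES = {
    "zh", "zh-CN", "zh-TW", "da", "nl", "en", "en-US", "fr", "fr-CA", "de",
    "hi", "hi-Latn", "id", "it", "ja", "ko", "no", "pl", "pt", "pt-BR",
    "pt-PT", "ru", "es", "es-419", "es-LATAM", "sv", "taq", "tr", "uk",
}
ENGLISH_ONLY = {"en", "en-US"}
SPECIALTY_VARIANTS = {
    "base-meeting", "base-phonecall", "base-voicemail",
    "base-finance", "base-conversationalai", "base-video",
}

def is_supported(model, language):
    """Hypothetical pre-flight check for a Base model/language pair."""
    if model in ("base", "base-general"):
        return language in BASE_GENERAL_LANGUAGES
    if model in SPECIALTY_VARIANTS:
        return language in ENGLISH_ONLY
    return False  # not a Base-family model name

print(is_supported("base-general", "pt-BR"))
print(is_supported("base-phonecall", "es"))
```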

Deepgram's language documentation states that its English models are designed to handle global English accents and dialects, but transcript output is normalized to standardized American spelling.

Performance and benchmarks

Deepgram's current public documentation does not publish a Base-specific WER or CER figure. It gives detailed accuracy claims for Nova-2 and Nova-3, but no equivalent single-number benchmark for Base.

Source | Metric | Result | Relevance
Deepgram streaming latency guide | Typical transcription latency | 150-300 ms transcription latency; 200-500 ms total transcript latency | Applies to Deepgram streaming workloads generally, including the Base-family streaming endpoint
Deepgram API rate limits | Base concurrency | Pay As You Go: 50 concurrent pre-recorded / 150 concurrent streaming; Growth: 50 / 225 in North America; Enterprise starts at 200 / 300 | Base remains a supported production model family in 2026
Deepgram batch autoscaling guidance | Throughput | Deepgram states its self-hosted batch engine can transcribe 1 hour of audio in under 30 seconds | Platform-level throughput guidance, not a Base-only number
Deepgram 2023 Whisper benchmark | WER on 254 real-world phone-call/meeting files | Vendor-reported: Deepgram Enhanced 10.6% WER, Nova-2 8.4% WER; Whisper sizes ranged 13.1-15.3% WER in that study | Official benchmark, not Base-specific
WhisperX paper | Speedup on long-form transcription | WhisperX reports a 12x transcription speedup using VAD segmentation and batched inference | Competitor throughput reference for Whisper-family pipelines
2026 independent named-entity audit of speech providers | High-stakes proper-noun difficulty | Third-party evaluation: study included base-general and base-phonecall; across 15 ASR models the average transcription error rate on street/business names was 44% | Not a vanilla WER benchmark; measures proper nouns and address-like entities in production audio
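
For readers comparing the WER figures above, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a generic implementation of that definition, not the scoring code used in any cited study; real evaluations usually also normalize casing and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substitution in four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```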

Latency and throughput

Deepgram's current guidance states that streaming transcription latency is optimized to 300 ms or less, with a typical breakdown of 150-300 ms transcription latency and 200-500 ms total transcript latency end to end depending on network, buffering, and client-side processing. Deepgram recommends sending audio in 20-100 ms chunks; larger buffers increase built-in delay, while smaller chunks increase overhead.
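
The 20-100 ms chunk guidance translates into byte counts that depend on the audio format. A worked sketch, assuming uncompressed audio with a fixed number of bytes per sample (1 for 8-bit mu-law, 2 for 16-bit linear PCM); both helper functions are hypothetical.

```python
def chunk_size_bytes(sample_rate_hz, bytes_per_sample, channels, chunk_ms):
    """Bytes of raw audio covering chunk_ms milliseconds."""
    return sample_rate_hz * bytes_per_sample * channels * chunk_ms // 1000

def split_audio(raw, size):
    """Split a raw audio buffer into fixed-size chunks (last may be short)."""
    return [raw[i:i + size] for i in range(0, len(raw), size)]

# 8 kHz mono mu-law: 1 byte per sample.
mulaw_20ms = chunk_size_bytes(8000, 1, 1, 20)
# 16 kHz mono 16-bit linear PCM: 2 bytes per sample.
pcm_100ms = chunk_size_bytes(16000, 2, 1, 100)
print(mulaw_20ms, pcm_100ms)  # 160 3200

# One second of silent 8 kHz mu-law-sized data splits into 50 chunks of 20 ms.
chunks = split_audio(bytes(8000), mulaw_20ms)
print(len(chunks))  # 50
```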

For pre-recorded jobs, Deepgram's quickstart documentation notes a 2 GB maximum file size and states that requests whose processing exceeds 10 minutes for Nova/Base/Enhanced can return a 504 Gateway Timeout.

Deepgram states its self-hosted batch engine can transcribe 1 hour of audio in under 30 seconds; this is platform-level guidance, not a Base-only figure.

Concurrency limits documented for Base: Pay As You Go allows 50 concurrent pre-recorded and 150 concurrent streaming requests; Growth allows 50 / 225 in North America; Enterprise starts at 200 / 300.
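
One way a client might stay under these limits is to gate outgoing requests with a semaphore. The sketch below caps in-flight jobs at the Pay As You Go pre-recorded limit of 50; the transcribe coroutine is a stand-in that only sleeps, and how a real client should react when the limit is exceeded is not covered here.

```python
import asyncio

PAYG_PRERECORDED_LIMIT = 50  # from the documented Pay As You Go limit

async def transcribe(job_id, gate, stats):
    """Stand-in for a real request; only tracks concurrency and sleeps."""
    async with gate:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # placeholder for the HTTP call
        stats["active"] -= 1
    return job_id

async def run_all(n_jobs, limit):
    gate = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(transcribe(i, gate, stats) for i in range(n_jobs))
    )
    return results, stats["peak"]

results, peak = asyncio.run(run_all(200, PAYG_PRERECORDED_LIMIT))
print(len(results), "jobs completed; peak concurrency", peak)
```

The semaphore guarantees that peak concurrency never exceeds the configured limit, regardless of how many jobs are queued.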

Deployment and integrations

API surface

Base uses the same core STT interfaces as the rest of Deepgram's classic transcription stack.

Use case | Endpoint | Input style | Output style | Notes
Batch transcription | POST https://api.deepgram.com/v1/listen | JSON with url, or direct file upload | JSON response or async callback response | Supports features like punctuation, diarization, redaction, topics, intents, and utterances
Live transcription | wss://api.deepgram.com/v1/listen | Continuous audio over WebSocket | Incremental JSON events | Supports interim results, endpointing, speech_final, UtteranceEnd, keepalive/finalize flow

Authentication uses the Authorization header with either the Token scheme or the Bearer scheme. Deepgram also documents temporary API tokens with a default TTL of 30 seconds.
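
Putting the batch endpoint and Token authentication together, a pre-recorded request for a remote file might be assembled as in the sketch below. It uses only the Python standard library and builds the request without sending it; the API key and audio URL are placeholders, and build_prerecorded_request is a hypothetical helper.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical helper: builds (but does not send) a pre-recorded request.
def build_prerecorded_request(api_key, audio_url, model="base-general"):
    query = urlencode({"model": model, "punctuate": "true"})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return Request(
        f"https://api.deepgram.com/v1/listen?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_key}",  # Token scheme from the text
            "Content-Type": "application/json",
        },
    )

# Placeholder credentials and URL, for illustration only.
req = build_prerecorded_request("YOUR_API_KEY", "https://example.com/audio.wav")
print(req.get_method(), req.full_url)
# To send it, pass `req` to urllib.request.urlopen (not executed here).
```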

For pre-recorded audio, a request can send a JSON body with a remote URI or upload binary audio/video directly. Responses return transcript alternatives, overall confidence, and per-word timing/confidence data. Streaming returns a sequence of WebSocket messages including transcript updates plus SpeechStarted, UtteranceEnd, and metadata events.
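
On the streaming side, a client typically dispatches on each message's kind. The sketch below assumes every WebSocket message is JSON with a type field (Results, SpeechStarted, UtteranceEnd, Metadata) and that transcript updates carry is_final plus channel.alternatives; those field names and the sample messages are assumptions for illustration, not taken from this profile.

```python
import json

def handle_message(raw, state):
    """Update client state from one (assumed) streaming JSON message."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "Results":
        text = msg["channel"]["alternatives"][0]["transcript"]
        if msg.get("is_final"):
            if text:
                state["finals"].append(text)
        else:
            state["interim"] = text  # overwritten by later updates
    elif kind == "SpeechStarted":
        state["speaking"] = True
    elif kind == "UtteranceEnd":
        state["speaking"] = False
        if state["finals"]:
            state["utterances"].append(" ".join(state["finals"]))
            state["finals"] = []
    elif kind == "Metadata":
        state["metadata"] = msg
    return state

state = {"finals": [], "interim": "", "speaking": False,
         "utterances": [], "metadata": None}
# Invented message sequence for illustration.
messages = [
    {"type": "SpeechStarted"},
    {"type": "Results", "is_final": False,
     "channel": {"alternatives": [{"transcript": "hello"}]}},
    {"type": "Results", "is_final": True,
     "channel": {"alternatives": [{"transcript": "hello world"}]}},
    {"type": "UtteranceEnd"},
]
for m in messages:
    handle_message(json.dumps(m), state)
print(state["utterances"])
```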

Deepgram's official SDKs include at least JavaScript/TypeScript and Python. Its self-hosted and STT guides also carry examples in .NET, Go, Java, Python, and JavaScript.

Deployment paths

Deployment path | What the sources support
Deepgram-hosted cloud API | Primary/default path; North America endpoint plus EU endpoint for residency considerations
Self-hosted in customer cloud or on-prem | Deepgram documents self-hosted deployments running on customer infrastructure, cloud or on-prem
Amazon SageMaker | Deepgram documents deployment via SageMaker and provides autoscaling/batch guidance
Kubernetes / cloud VM patterns | Deepgram documents deployment guidance for GCP/Kubernetes-oriented environments and self-hosted patterns generally
Edge / on-device | No separate public on-device Base runtime found in the reviewed sources; the closest documented private-deployment option is self-hosted infrastructure

Deepgram's self-hosted hardware guidance for STT lists a recommended baseline of 1 NVIDIA GPU with compute capability 7.0+, 16 GB VRAM, 4 CPU cores, 32 GB RAM, and 50 GB storage. The self-hosted docs also note that authentication is not built in for self-hosted deployments, so teams place Deepgram behind their own API gateway, reverse proxy, or network controls.

Cloud and pipeline integrations

Documented cloud integration patterns include S3-backed batch pipelines using presigned URLs, SageMaker for managed deployment on AWS, self-hosted cloud/on-prem for private inference, and regional endpoints for residency. Deepgram publishes migration guides from AWS Transcribe, Google Speech-to-Text, and OpenAI Whisper to Deepgram.

For real-time media pipelines, an official Twilio/Deepgram guide identifies 8 kHz raw mu-law as the telephony audio shape Twilio sends, which maps to base-phonecall. Deepgram also documents a LiveKit integration path for agent use cases.
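
For the Twilio case, the streaming connection needs to declare the raw audio format, since headerless mu-law carries no metadata of its own. A sketch of the WebSocket URL follows, assuming the encoding, sample_rate, and channels query parameters; those names are not listed in this profile, and twilio_listen_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical helper; parameter names are assumptions about the streaming API.
def twilio_listen_url(model="base-phonecall"):
    params = {
        "model": model,
        "encoding": "mulaw",   # raw mu-law, as Twilio sends it
        "sample_rate": 8000,   # 8 kHz telephony audio
        "channels": 1,         # single channel assumed for this sketch
    }
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)

url = twilio_listen_url()
print(url)
```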

For Azure, the reviewed materials point toward self-hosted deployment on Azure infrastructure; Deepgram's self-hosted hardware guidance includes Azure GPU-instance examples. The reviewed docs do not expose a separate Azure-managed Deepgram product surface.

Privacy and compliance

Deepgram's public pricing/security materials state that the platform is SOC 2 Type 1 and Type 2 certified, HIPAA compliant with BAAs for Enterprise customers handling ePHI, GDPR ready with an EU endpoint (api.eu.deepgram.com), and CCPA compliant.

On retention and privacy mechanics, the reviewed public docs describe data-residency options, a model-improvement opt-out (mip_opt_out), and flexible retention language in trust/security materials, but they do not present one globally applicable default-retention statement for Base.

Pricing

Pricing or size question | What the official sources show
Current public self-serve Base price | Not listed on the 2026 public pricing page; the current STT price table lists Flux, Nova-3 Monolingual, Nova-3 Multilingual, and Custom
Current public Base parameter count | Not disclosed in the model docs reviewed
Current public plan tiers | Pay As You Go, Growth, and Enterprise exist at the platform level
Historical official Base pricing evidence | Deepgram's 2023 benchmark whitepaper included historical Base enterprise price bands for batch and streaming; these are historical, not current list prices

Platform-level plan details on the current public pricing page: Pay As You Go includes a $200 free credit, Growth starts at $4K+/year with pre-paid credits, and Enterprise is custom. Historically, Deepgram's 2023 benchmark whitepaper showed Base price bands materially below one dollar per hour of audio, varying by annual volume; those figures are not the current 2026 public price card.

Development and ownership

Deepgram develops and operates the Base model family as part of its speech-to-text platform. Base is built on Deepgram's end-to-end speech-to-text architecture. Release date, training data, and parameter counts are not publicly disclosed in the reviewed sources.

Release history

Not publicly disclosed. The reviewed sources establish that Base is listed under Legacy Models in Deepgram's 2026 documentation, that it remains operational in the public API surface with base-general as the default model parameter, and that Deepgram's 2023 benchmark whitepaper contained historical Base pricing; the sources do not state original release dates or version history for the Base family.
