Deepgram Flux: model profile

Deepgram Flux is a real-time conversational speech recognition model for voice agents that combines streaming speech-to-text with model-native turn detection, interruption handling, and a structured turn-state machine over a WebSocket API.

Specifications

Developer	Deepgram
Released	October 1, 2025 (launch article); October 2, 2025 (developer changelog); Flux Multilingual GA April 29, 2026
Model type	Conversational speech recognition (CSR): streaming speech-to-text with integrated turn detection
Languages	English (flux-general-en); 10 languages via Flux Multilingual (flux-general-multi)
Modes (batch / streaming)	Streaming, over the /v2/listen WebSocket endpoint
Latency	Vendor-reported: sub-300 ms turn detection; Coval benchmark reported 50% lower latency to first token than Nova-3
Throughput / concurrency	Not publicly disclosed. (A launch promotion allowed free use up to 50 concurrent connections during October 2025.)
Deployment	Deepgram cloud API; self-hosted deployments; Cloudflare Workers AI
Pricing	Flux English $0.0065/min PAYG; Flux Multilingual $0.0078/min PAYG; lower Growth-tier rates

Not disclosed	Parameters · Training data · License

Known limitations

Flux's internal turn detection cannot be disabled. Daily/Pipecat's February 2026 benchmark excluded Flux for this reason, since that benchmark tested STT under external turn-detection control.
Developer reports in Deepgram's GitHub discussions: missed short utterances, initial clipping after reconnects, misses on the acronym "PPF" in an automotive ordering flow, short or soft utterance drops in some multilingual settings, and SDK/protocol friction around the /v2/listen requirement.
Quantitative performance claims (50% lower first-token latency than Nova-3, equivalent WER) come from Deepgram's summary of Coval benchmarking; the full supporting data is not exposed in the public extracts the source reviewed.
Not publicly disclosed: parameter count, training data, license, full Flux team structure, exact original conception date, Flux-specific adoption metrics (active customers, call volume, revenue, enterprise customer roster), and the first self-hosted release date.


Overview

Deepgram describes Flux as its "first conversational speech recognition model built specifically for voice agents," a streaming model that knows "when to listen, when to think, and when to speak," combining transcription with integrated end-of-turn detection and barge-in awareness. Deepgram positions Flux as a category beyond conventional ASR, in contrast to traditional transcription models paired with external VAD and endpointing, which the company argues produce brittle timing, awkward pauses, and higher integration complexity in production voice agents. Deepgram uses the term CSR (conversational speech recognition) for this category.

In Deepgram's model overview, Flux is recommended for real-time agents, customer support bots, and interactive turn-based experiences. Nova-3 remains the company's recommendation for general transcription, meetings, multilingual noisy audio, and far-field use cases that do not require turn-aware behavior.

Flux launched on October 1, 2025. Flux Multilingual reached general availability on April 29, 2026 with support for 10 languages in a single streaming model.

Capabilities and features

Flux's core architecture is a turn-state machine rather than a plain partial/final transcript stream. Deepgram documents five event types:

Update: arrives about every 0.25 seconds of transcribed audio.
StartOfTurn: recommended by Deepgram for barge-in because it is more reliable than external VAD and always contains a non-empty transcript.
EagerEndOfTurn: optional; only appears if configured.
TurnResumed: only follows an EagerEndOfTurn.
EndOfTurn: Deepgram guarantees that the final EndOfTurn transcript matches the immediately preceding EagerEndOfTurn transcript unless a TurnResumed occurs first.
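The ordering rules above lend themselves to a small client-side checker, which can be useful when debugging a custom client. The sketch below is hypothetical. It assumes each server message has already been parsed into a dict with `event` and `transcript` keys, and the `TurnValidator` and `ProtocolViolation` names are illustrative, not part of any Deepgram SDK. Verify the real message schema against Deepgram's Flux state-machine docs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Event names come from Deepgram's Flux docs; the message shape used here
# (a dict with "event" and "transcript" keys) is an assumption.
FLUX_EVENTS = {"Update", "StartOfTurn", "EagerEndOfTurn", "TurnResumed", "EndOfTurn"}


class ProtocolViolation(Exception):
    """Raised when the event stream breaks one of the documented rules."""


@dataclass
class TurnValidator:
    eager_transcript: Optional[str] = None
    completed: List[str] = field(default_factory=list)

    def handle(self, msg: dict) -> None:
        event = msg["event"]
        transcript = msg.get("transcript", "")
        if event not in FLUX_EVENTS:
            raise ProtocolViolation(f"unknown event {event!r}")
        if event == "StartOfTurn":
            # Documented to always carry a non-empty transcript.
            if not transcript:
                raise ProtocolViolation("StartOfTurn with empty transcript")
            self.eager_transcript = None
        elif event == "EagerEndOfTurn":
            # Only emitted if eager end-of-turn is configured.
            self.eager_transcript = transcript
        elif event == "TurnResumed":
            # Only valid after an EagerEndOfTurn; the speculative draft is void.
            if self.eager_transcript is None:
                raise ProtocolViolation("TurnResumed without EagerEndOfTurn")
            self.eager_transcript = None
        elif event == "EndOfTurn":
            # With no TurnResumed since the eager event, transcripts must match.
            if self.eager_transcript is not None and transcript != self.eager_transcript:
                raise ProtocolViolation("EndOfTurn differs from EagerEndOfTurn")
            self.completed.append(transcript)
            self.eager_transcript = None
        # Update events carry interim transcripts and need no state change here.


v = TurnValidator()
for m in [
    {"event": "StartOfTurn", "transcript": "book a"},
    {"event": "EagerEndOfTurn", "transcript": "book a table"},
    {"event": "TurnResumed", "transcript": "book a table"},
    {"event": "EagerEndOfTurn", "transcript": "book a table for two"},
    {"event": "EndOfTurn", "transcript": "book a table for two"},
]:
    v.handle(m)
print(v.completed)  # ['book a table for two']
```

The checker deliberately tracks only the pending eager transcript. That single piece of state is enough to enforce both the TurnResumed ordering rule and the EagerEndOfTurn/EndOfTurn transcript guarantee.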

Configuration controls include eot_threshold, eager_eot_threshold, and eot_timeout_ms. A Configure control message allows keyterms and thresholds to be updated mid-stream without disconnecting; Deepgram's docs describe this as "context injection for speech recognition," applicable to dynamic task phases such as OTP collection, medical terminology, or product-specific vocabulary. A CloseStream control message forces final processing before shutdown.
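To illustrate the mid-stream control flow, the sketch below builds Configure and CloseStream messages as JSON text frames. The `type` values come from the text above. The `thresholds` and `keyterms` key names inside Configure are assumptions, not a confirmed schema, so check them against the AsyncAPI spec in deepgram-api-specs before use. The helper function names are hypothetical.

```python
import json
from typing import Optional, Sequence


def configure_message(keyterms: Optional[Sequence[str]] = None,
                      eot_threshold: Optional[float] = None,
                      eager_eot_threshold: Optional[float] = None) -> str:
    """Build a mid-stream Configure message (field layout is an assumption)."""
    msg: dict = {"type": "Configure"}
    thresholds = {}
    if eot_threshold is not None:
        thresholds["eot_threshold"] = eot_threshold
    if eager_eot_threshold is not None:
        thresholds["eager_eot_threshold"] = eager_eot_threshold
    if thresholds:
        msg["thresholds"] = thresholds  # hypothetical key name
    if keyterms:
        msg["keyterms"] = list(keyterms)  # hypothetical key name
    return json.dumps(msg)


def close_stream_message() -> str:
    """Ask the server to finish processing buffered audio before shutdown."""
    return json.dumps({"type": "CloseStream"})


# Example: entering an ordering phase where an acronym keeps getting missed.
print(configure_message(keyterms=["PPF"], eot_threshold=0.8))
```

Because the connection stays open, a client could send a message like this whenever the dialogue moves into a new task phase rather than reconnecting with new query parameters.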

Flux uses the /v2/listen WebSocket endpoint rather than the older /v1/listen, with model names flux-general-en and flux-general-multi. Deepgram recommends 80 ms audio chunks and mono audio, and supports the encodings linear16, linear32, mulaw, alaw, opus, and ogg-opus, plus containerized WAV, Ogg/Opus, and, since January 2026, WebM/Opus.
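The endpoint and the 80 ms chunk recommendation translate directly into a connection URL and a byte count per chunk. The sketch below assumes the host `api.deepgram.com` and assumes that raw encodings need `encoding` and `sample_rate` query parameters. The threshold values shown are illustrative only, not documented defaults. Compressed formats such as opus have no fixed bytes-per-sample, so the chunk calculation only covers the raw PCM encodings.

```python
from urllib.parse import urlencode

# Assumed host; the /v2/listen path and model names come from the text above.
FLUX_BASE = "wss://api.deepgram.com/v2/listen"

# Bytes per sample for raw encodings; compressed formats are not covered.
BYTES_PER_SAMPLE = {"linear16": 2, "linear32": 4, "mulaw": 1, "alaw": 1}


def flux_url(model: str = "flux-general-en", encoding: str = "linear16",
             sample_rate: int = 16000, **tuning: float) -> str:
    """Build a /v2/listen URL; tuning kwargs become extra query parameters."""
    params = {"model": model, "encoding": encoding, "sample_rate": sample_rate}
    params.update(tuning)  # e.g. eot_threshold=0.7 (illustrative value)
    return f"{FLUX_BASE}?{urlencode(params)}"


def chunk_bytes(encoding: str, sample_rate: int, chunk_ms: int = 80,
                channels: int = 1) -> int:
    """Size of one audio chunk in bytes (mono by default, as recommended)."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE[encoding] * channels


print(flux_url(eot_threshold=0.7))
print(chunk_bytes("linear16", 16000))  # 1280 samples * 2 bytes = 2560
```

For 16 kHz linear16 mono audio, an 80 ms chunk is 1,280 samples, or 2,560 bytes. For 8 kHz mulaw, typical of telephony, it is 640 bytes.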

Deepgram's stated design goals for Flux: better end-of-turn accuracy than external detectors, lower conversational latency, fewer false interruptions, simpler developer integration, and a path toward richer speech-native systems. Jack Kearney's technical post describes Flux as a step toward a more integrated speech-to-speech future, and Deepgram's Coval post places Flux within a broader "Neuroplex" architecture intended to connect STT, LLMs, and TTS with shared context signals; the source characterizes Neuroplex as roadmap and research framing rather than a documented public Flux implementation spec.

Language support

Flux English uses the flux-general-en model. Flux Multilingual (flux-general-multi) supports 10 languages through a single model and connection: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.

Flux Multilingual adds language_hint biasing and per-turn language reporting in TurnInfo.languages and TurnInfo.languages_hinted. Deepgram's public claim is that this removes the need for separate language-detection services and model-routing logic.
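Per-turn language reporting lets an agent pick a reply language without a separate detection step. The sketch below is a hypothetical illustration. It assumes `TurnInfo.languages` arrives as a list of language codes ordered by prominence, and it assumes `language_hint` can be passed as a repeated query parameter. Both assumptions should be checked against Deepgram's Flux Multilingual docs. The voice names are placeholders for whatever TTS voices you use.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical mapping from detected language code to a downstream TTS voice.
VOICE_BY_LANGUAGE: Dict[str, str] = {"en": "voice-en", "es": "voice-es", "fr": "voice-fr"}


def multi_params(hints: List[str]) -> str:
    # language_hint is named in the text; repeating it per hint is an assumption.
    return urlencode({"model": "flux-general-multi", "language_hint": hints}, doseq=True)


def pick_voice(turn_info: dict, default: str = "voice-en") -> str:
    """Choose a reply voice from the languages reported for one turn."""
    languages: List[str] = turn_info.get("languages") or []
    for code in languages:
        if code in VOICE_BY_LANGUAGE:
            return VOICE_BY_LANGUAGE[code]
    return default


print(multi_params(["es", "en"]))  # model=flux-general-multi&language_hint=es&language_hint=en
print(pick_voice({"languages": ["es", "en"]}))  # voice-es
```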

Performance and benchmarks

Vendor-reported (via Deepgram's summary of Coval benchmarking): Flux achieved 50% lower latency to first token than Nova-3, faster and more reliable turn detection, and equivalent WER to Nova-3. Deepgram describes this as independent validation from Coval, which operates a public benchmark site; the source notes that the quantitative comparisons are mediated through Deepgram's own write-up and are not fully reproducible from the public web extracts reviewed.

Vendor-reported (Lindy Gaia case study): sub-300 ms turn detection and Nova-3-level accuracy.

Third-party evaluation: Daily/Pipecat's February 2026 STT benchmark excluded Flux because Flux's internal turn detection cannot be disabled, while the benchmark tested STT under external turn-detection control. Daily called Flux Deepgram's "flagship model" and said it should be included when evaluation does not require application-level turn control.

Developer feedback in Deepgram's GitHub discussions reported: missed short utterances and clipping after reconnects, trouble with the acronym "PPF" in an automotive ordering flow, short or soft utterance drops in some multilingual settings, and earlier SDK/protocol friction around Flux's requirement for the /v2/listen protocol.

Latency and throughput

Update events arrive about every 0.25 seconds of transcribed audio.
Vendor-reported turn detection: sub-300 ms.
Vendor-reported (Coval, via Deepgram): 50% lower latency to first token than Nova-3.
Deepgram recommends 80 ms audio chunks and mono audio for best performance.
Concurrency limits: not publicly disclosed. The "OktoberFLUX" launch promotion made Flux free during October 2025 up to 50 concurrent connections.
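When testing with pre-recorded audio, it is usually worth pacing chunks at roughly real time so that turn timing behaves as it would on a live call. The sketch below is a minimal illustration. `send` is a placeholder for whatever WebSocket client's send coroutine you use, not a specific library call. The Update estimate simply divides audio duration by the approximate 0.25 s cadence, so treat it as a rough expectation rather than a guarantee.

```python
import asyncio
from typing import Awaitable, Callable, Iterator

CHUNK_MS = 80             # Deepgram's recommended chunk duration
UPDATE_INTERVAL_S = 0.25  # approximate Update cadence from the text


def iter_chunks(pcm: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split raw audio into fixed-size chunks; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def approx_update_count(audio_seconds: float) -> int:
    """Rough number of Update events expected for a given audio duration."""
    return int(audio_seconds / UPDATE_INTERVAL_S)


async def stream_realtime(pcm: bytes, chunk_size: int,
                          send: Callable[[bytes], Awaitable[None]]) -> int:
    """Send audio at about real-time pace; returns the number of chunks sent."""
    sent = 0
    for chunk in iter_chunks(pcm, chunk_size):
        await send(chunk)
        sent += 1
        await asyncio.sleep(CHUNK_MS / 1000)  # one chunk's worth of audio time
    return sent
```

With 80 ms chunks, roughly three chunks go out between consecutive Update events. Ten seconds of audio corresponds to about 125 chunks and about 40 Updates.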

Deployment and integrations

Deployment options: Deepgram cloud API (/v2/listen WebSocket), self-hosted deployments, and Cloudflare Workers AI, where Cloudflare announced same-day availability at launch. The source notes that Deepgram's docs state Flux is available for self-hosted deployments, but the first self-hosted release date is not centralized in one official launch post in the reviewed corpus; later changelog entries document self-hosted metrics and fixes.

Deepgram says Flux Multilingual is supported through Twilio, Vapi, LiveKit, Pipecat, and Jambonz.

Tooling published by Deepgram: main docs, an OpenAPI/AsyncAPI mirror in deepgram-api-specs, official JavaScript/TypeScript, Python, .NET, and Go SDKs, the deepgram/recipes repo, deepgram/skills, deepgram/starter-contracts, and Flux demo repositories including deepgram-demos-flux-streaming-transcription, deepgram-demos-flux-streaming, deepgram-demos-flux-agent, and deepgram-demos-composite-flux-agent. Deepgram's docs also provide lower-level WebSocket guidance for developers building their own clients.

Within Deepgram's catalog, Flux sits below the Voice Agent API: a composable stack can use Flux with a separate LLM and TTS (Deepgram's build guides show Flux with OpenAI and Deepgram TTS), while the Voice Agent API bundles STT, LLM orchestration, and TTS with unified pricing and built-in interruption handling. Deepgram's migration guide states that Nova-3 streams transcript fragments and requires custom turn logic, while Flux emits conversation events and has a built-in turn state machine.
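In a composable stack, the application maps Flux's turn events to LLM and TTS actions itself. The sketch below shows one hypothetical mapping. StartOfTurn triggers barge-in, and EagerEndOfTurn starts a speculative reply. TurnResumed cancels that reply, while EndOfTurn commits it. The four callbacks are placeholders you supply. They are not Deepgram, OpenAI, or TTS SDK APIs.

```python
from typing import Callable, Optional


class FluxAgentLoop:
    """Hypothetical glue between Flux turn events and your own LLM/TTS calls."""

    def __init__(self,
                 start_reply: Callable[[str, bool], None],
                 cancel_reply: Callable[[], None],
                 commit_reply: Callable[[], None],
                 stop_speaking: Callable[[], None]) -> None:
        self.start_reply = start_reply
        self.cancel_reply = cancel_reply
        self.commit_reply = commit_reply
        self.stop_speaking = stop_speaking
        self.draft: Optional[str] = None

    def on_event(self, event: str, transcript: str) -> None:
        if event == "StartOfTurn":
            # Barge-in: the user started talking, so cut off agent audio.
            self.stop_speaking()
        elif event == "EagerEndOfTurn":
            # Start generating early; the reply may still be thrown away.
            self.draft = transcript
            self.start_reply(transcript, True)
        elif event == "TurnResumed":
            # The user kept talking; discard the speculative reply.
            self.cancel_reply()
            self.draft = None
        elif event == "EndOfTurn":
            if self.draft is None:
                # No eager draft in flight: generate the reply now.
                self.start_reply(transcript, False)
            # Otherwise rely on the documented guarantee that the final
            # transcript matches the eager one, and keep the draft reply.
            self.commit_reply()
            self.draft = None
```

The speculative path trades some wasted LLM calls (each TurnResumed discards one) for a reply that is already in progress when EndOfTurn arrives. This trade-off is the reason EagerEndOfTurn is optional.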

Comparison with other streaming STT products

The source includes the following comparison table.

Product	Core type	Native turn detection	Turn-state events	Language coverage	Self-hosted option	Public pricing signal
Deepgram Flux	Conversational STT for voice agents	Yes; configurable eot_threshold, eager_eot_threshold, eot_timeout_ms	Yes: StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn, Update	English or 10-language multilingual model	Yes	Flux English $0.0065/min PAYG; Flux Multilingual $0.0078/min PAYG
AssemblyAI Universal-3 Pro Streaming	Streaming STT for voice agents	Yes; low-latency turn detection and voice-agent focus	Yes, though with a simpler message model (e.g., SpeechStarted and turn finalization)	6 real-time languages, with more listed as coming soon	Yes; enterprise/self-hosted streaming docs	Pricing page lists Universal-3 Pro at $0.21/hr; streaming billed per open session
Speechmatics Realtime STT	Realtime STT	Yes; configurable EndOfUtterance silence trigger	Yes, but primarily end-of-utterance signaling rather than Flux-like turn lifecycle	55+ languages	Yes; cloud and on-prem/Kubernetes	Pro pricing from $0.24/hr
OpenAI Realtime API	Broader realtime speech-to-speech / transcription API	Yes; server VAD and semantic VAD, depending on session and model	Yes: speech started/stopped and automatic buffer commit in server VAD mode	Model-dependent; broader realtime voice stack rather than dedicated CSR STT	No self-hosted option documented	gpt-realtime-whisper $0.017/min; gpt-realtime-2 audio priced per 1M tokens
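The price column mixes per-minute and per-hour units. A quick normalization to dollars per hour makes the list prices easier to compare. The sketch below uses only the rates quoted in the table. It leaves out token-priced models and does not account for billing differences: AssemblyAI bills streaming per open session, and Speechmatics' figure is a "from" price. Treat the result as a rough signal, not a like-for-like cost comparison.

```python
from decimal import Decimal

# List prices copied from the table above.
PER_MINUTE = {
    "Deepgram Flux English": Decimal("0.0065"),
    "Deepgram Flux Multilingual": Decimal("0.0078"),
    "OpenAI gpt-realtime-whisper": Decimal("0.017"),
}
PER_HOUR = {
    "AssemblyAI Universal-3 Pro": Decimal("0.21"),  # billed per open session
    "Speechmatics Pro (from)": Decimal("0.24"),
}


def per_hour_table() -> dict:
    """All list prices in $/hour, sorted from cheapest to most expensive."""
    table = {name: rate * 60 for name, rate in PER_MINUTE.items()}
    table.update(PER_HOUR)
    return dict(sorted(table.items(), key=lambda kv: kv[1]))


for name, rate in per_hour_table().items():
    print(f"{name}: ${rate}/hr")
```

Flux English works out to $0.39/hr and Flux Multilingual to $0.468/hr at PAYG rates.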

Pricing

Item	Price
Flux English	$0.0065/min PAYG
Flux Multilingual	$0.0078/min PAYG
Growth tier	Lower rates than PAYG; specific rates not stated in the source.
Launch promotion	"OktoberFLUX": free during October 2025 up to 50 concurrent connections
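To turn the PAYG rates into a budget, multiply expected streamed minutes by the per-minute price. The sketch below assumes billing is proportional to streamed audio minutes at the PAYG rates listed above. It ignores Growth-tier discounts, promotions, and any rounding or minimums Deepgram may apply, and `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

FLUX_PAYG_PER_MIN = {
    "flux-general-en": Decimal("0.0065"),
    "flux-general-multi": Decimal("0.0078"),
}


def monthly_cost(model: str, calls_per_day: int, avg_call_minutes: Decimal,
                 days: int = 30) -> Decimal:
    """Estimated PAYG spend in dollars, rounded to cents."""
    minutes = calls_per_day * avg_call_minutes * days
    cost = minutes * FLUX_PAYG_PER_MIN[model]
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 1,000 three-minute calls a day for 30 days = 90,000 minutes.
print(monthly_cost("flux-general-en", 1000, Decimal("3")))     # 585.00
print(monthly_cost("flux-general-multi", 1000, Decimal("3")))  # 702.00
```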

Cloudflare Workers AI docs and changelog confirm launch-partner availability and pricing on Workers AI; specific Workers AI rates are not stated in the source.

Development and ownership

Publicly identifiable Flux contributors:

Name	Role
Nick Kaimakis	Senior Product Manager; credited on the launch article. Twilio SIGNAL's speaker page describes him as Senior Product Manager, STT at Deepgram, leading Speech-to-Text including Flux.
Jack Kearney	Staff Research Scientist; author of the "Flux Chronicles" architecture post and the reinforcement-learning update post; coauthor of writeups on turn-detection evaluation and keyterm boosting.
Chau Luu	Senior Research Scientist; public coauthor on Flux technical work.
Federico Landini	Research Scientist; public coauthor on Flux technical work.
Julia Strout	Deep Learning Engineer; public coauthor on Flux technical work.

Deepgram does not publish a full Flux org chart or a canonical "invented by" page in the reviewed sources.

Deepgram states it spent "the last two years rethinking how transcription should work for real-time voice agents" before launch, and that the Flux state machine was designed from experience built via Deepgram's Voice Agent API, which launched in 2024 with "end-of-thought detection."

Company-level metrics from Deepgram's public materials (not Flux-specific): trusted by 200,000+ developers and 1,300+ organizations, 50,000+ years of audio processed, over 1 trillion words transcribed.

Release history

Date	Event
2024	Deepgram Voice Agent API launch with "end-of-thought detection"; cited as historical context for Flux's development.
October 1, 2025	Flux product launch article published.
October 2, 2025	Developer changelog and documentation release; Cloudflare announces same-day availability on Workers AI.
October 2025	"OktoberFLUX" promotion: Flux free during October 2025 up to 50 concurrent connections.
January 2026	WebM/Opus containerized audio support added.
February 2026	Daily/Pipecat STT benchmark published; excluded Flux because its turn detection cannot be disabled.
April 29, 2026	Flux Multilingual general availability with 10 languages in a single streaming model.

The source states that Deepgram's public materials do not expose a formal preview or beta page for Flux before October 2025, so any earlier internal or private preview dates are unspecified.

Sources

Getting Started with Flux | Deepgram's Docs. https://developers.deepgram.com/docs/flux/quickstart
Introducing Flux: Conversational Speech Recognition. https://deepgram.com/learn/introducing-flux-conversational-speech-recognition
Nick Kaimakis. https://deepgram.com/authors/nick-kaimakis
Flux Feature Overview | Deepgram's Docs. https://developers.deepgram.com/docs/flux/feature-overview
From ASR to CSR: Why Conversation Changes Everything. https://deepgram.com/learn/from-asr-to-csr-why-conversation-changes-everything
Coval validates Flux: no tradeoff between latency and interruption. https://deepgram.com/learn/coval-validates-flux-no-tradeoff-between-latency-and-interruption
Fluxing Conversational State and Speech-to-Text | Deepgram. https://deepgram.com/learn/fluxing-conversational-state-and-speech-to-text
Evaluating End-of-Turn (Turn Detection) Models. https://deepgram.com/learn/evaluating-end-of-turn-detection-models
Understanding the Flux State Machine | Deepgram's Docs. https://developers.deepgram.com/docs/flux/state
Introducing Flux Multilingual: One Conversational Speech Model. https://deepgram.com/learn/introducing-flux-multilingual
Deepgram's API Specs. https://github.com/deepgram/deepgram-api-specs
New Deepgram Flux model available on Workers AI. https://developers.cloudflare.com/changelog/post/2025-10-02-deepgram-flux/
Models & Languages Overview. https://developers.deepgram.com/docs/models-languages-overview
Introducing Deepgram's Voice Agent API. https://deepgram.com/learn/introducing-ai-voice-agent-api
Universal-3 Pro Streaming | AssemblyAI | Documentation. https://assemblyai.com/docs/streaming/universal-3-pro
Turn detection | Speechmatics Docs. https://docs.speechmatics.com/speech-to-text/realtime/turn-detection
Voice activity detection (VAD) | OpenAI API. https://developers.openai.com/api/docs/guides/realtime-vad
Smarter, Faster Calls for Every Business: Lindy Gaia Launches with Deepgram Flux. https://deepgram.com/learn/lindy-gaia-launches-with-deepgram-flux
Benchmarking STT for Voice Agents. https://www.daily.co/blog/benchmarking-stt-for-voice-agents/
Issues with Deepgram Flux Model - Missed Speech Events and Initial Clipping, Deepgram GitHub Discussion #1463. https://github.com/orgs/deepgram/discussions/1463
Deepgram Pricing. https://deepgram.com/pricing
October 2, 2025 | Deepgram's Docs. https://developers.deepgram.com/changelog/2025/10/2
Model selection | AssemblyAI | Documentation. https://assemblyai.com/docs/streaming/select-the-speech-model
Speaker Details: SIGNAL San Francisco 2026. https://signal.twilio.com/2026/speaker/2309636/nick-kaimakis