OpenTranscription/ Blog
2026-07-03 · MODEL PROFILE

Deepgram Flux: model profile

Reference spec for Deepgram Flux, a streaming conversational speech recognition model with built-in turn detection for voice agents.

Deepgram Flux is a real-time conversational speech recognition model for voice agents that combines streaming speech-to-text with model-native turn detection, interruption handling, and a structured turn-state machine over a WebSocket API.

Specifications

Developer: Deepgram
Released: October 1, 2025 (launch article); October 2, 2025 (developer changelog); Flux Multilingual GA April 29, 2026
Model type: Conversational speech recognition (CSR): streaming speech-to-text with integrated turn detection
Languages: English (flux-general-en); 10 languages via Flux Multilingual (flux-general-multi)
Modes (batch / streaming): Streaming, over the /v2/listen WebSocket endpoint
Latency: Vendor-reported sub-300 ms turn detection; Coval benchmark reported 50% lower latency to first token than Nova-3
Throughput / concurrency: Not publicly disclosed. (A launch promotion allowed free use up to 50 concurrent connections during October 2025.)
Deployment: Deepgram cloud API; self-hosted deployments; Cloudflare Workers AI
Pricing: Flux English $0.0065/min PAYG; Flux Multilingual $0.0078/min PAYG; lower Growth-tier rates

Not disclosed: Parameters · Training data · License

Overview

Deepgram describes Flux as its "first conversational speech recognition model built specifically for voice agents," a streaming model that knows "when to listen, when to think, and when to speak," combining transcription with integrated end-of-turn detection and barge-in awareness. Deepgram positions Flux as a category beyond conventional ASR, which it calls conversational speech recognition (CSR). The contrast is with traditional transcription models paired with external VAD and endpointing, which the company argues produce brittle timing, awkward pauses, and higher integration complexity in production voice agents.

In Deepgram's model overview, Flux is recommended for real-time agents, customer support bots, and interactive turn-based experiences. Nova-3 remains the company's recommendation for general transcription, meetings, multilingual noisy audio, and far-field use cases that do not require turn-aware behavior.

Flux launched on October 1, 2025. Flux Multilingual reached general availability on April 29, 2026 with support for 10 languages in a single streaming model.

Capabilities and features

Flux's core architecture is a turn-state machine rather than a plain partial/final transcript stream. Deepgram documents five event types:

  • Update: arrives about every 0.25 seconds of transcribed audio.
  • StartOfTurn: recommended by Deepgram for barge-in because it is more reliable than external VAD and always contains a non-empty transcript.
  • EagerEndOfTurn: optional; only appears if configured.
  • TurnResumed: only follows an EagerEndOfTurn.
  • EndOfTurn: Deepgram guarantees that the final EndOfTurn transcript matches the immediately preceding EagerEndOfTurn transcript unless a TurnResumed occurs first.
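The event lifecycle above can be sketched as a small state tracker. This is a minimal illustration rather than Deepgram's SDK: the event names come from the list above, while the TurnTracker class and its action strings are hypothetical placeholders for whatever an agent would actually do (stop TTS playback, call an LLM, and so on). No network connection is involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnTracker:
    """Hypothetical helper that maps Flux-style turn events to agent actions."""

    draft: Optional[str] = None  # transcript from the latest EagerEndOfTurn
    actions: list = field(default_factory=list)

    def handle(self, event: str, transcript: str = "") -> None:
        if event == "StartOfTurn":
            # The user started speaking: interrupt any agent audio (barge-in).
            self.actions.append("interrupt_agent")
        elif event == "EagerEndOfTurn":
            # Probable end of turn: begin preparing a reply speculatively.
            self.draft = transcript
            self.actions.append(f"prepare_reply:{transcript}")
        elif event == "TurnResumed":
            # The user kept talking: throw away the speculative reply.
            self.draft = None
            self.actions.append("discard_reply")
        elif event == "EndOfTurn":
            if self.draft == transcript:
                # The draft matches the final transcript, so reuse the reply.
                self.actions.append(f"send_prepared_reply:{transcript}")
            else:
                # No usable draft (eager mode off or draft discarded).
                self.actions.append(f"generate_reply:{transcript}")
            self.draft = None
        # Update events carry interim transcripts; this sketch ignores them.
```

Feeding it the sequence StartOfTurn, EagerEndOfTurn, TurnResumed, EagerEndOfTurn, EndOfTurn produces one discarded draft followed by a prepared reply that is sent unchanged, which is the saving the eager path is meant to offer.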

Configuration controls include eot_threshold, eager_eot_threshold, and eot_timeout_ms. A Configure control message allows keyterms and thresholds to be updated mid-stream without disconnecting; Deepgram's docs describe this as "context injection for speech recognition," applicable to dynamic task phases such as OTP collection, medical terminology, or product-specific vocabulary. A CloseStream control message forces final processing before shutdown.
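As an illustration of how the threshold controls might be set when a stream is opened, the sketch below builds a connection URL. Several things here are assumptions not stated in this profile: the host api.deepgram.com, the query parameter names model, encoding, and sample_rate, the idea that the thresholds are passed as query parameters, and the example values themselves. The build_flux_url helper is hypothetical; check Deepgram's reference before relying on any of it.

```python
from urllib.parse import urlencode


def build_flux_url(
    model="flux-general-en",
    encoding="linear16",
    sample_rate=16000,
    eot_threshold=None,
    eager_eot_threshold=None,
    eot_timeout_ms=None,
    host="api.deepgram.com",  # assumed host, not given in this profile
):
    """Build a /v2/listen WebSocket URL, including only the options that are set."""
    params = {"model": model, "encoding": encoding, "sample_rate": sample_rate}
    optional = {
        "eot_threshold": eot_threshold,
        "eager_eot_threshold": eager_eot_threshold,
        "eot_timeout_ms": eot_timeout_ms,
    }
    # Leave unset options out so the server-side defaults apply.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"wss://{host}/v2/listen?{urlencode(params)}"


# Illustrative values only; these are not recommended settings.
url = build_flux_url(eot_threshold=0.7, eot_timeout_ms=5000)
```

Omitting eager_eot_threshold in this sketch matches the note above that EagerEndOfTurn only appears if configured.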

Flux uses the /v2/listen WebSocket endpoint rather than the older /v1/listen, with model names flux-general-en and flux-general-multi. Deepgram recommends 80 ms audio chunks and mono audio, and supports the encodings linear16, linear32, mulaw, alaw, opus, and ogg-opus, plus containerized WAV, Ogg/Opus, and, since January 2026, WebM/Opus.
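To make the 80 ms recommendation concrete, the following sketch computes how many bytes an 80 ms chunk holds and splits a raw audio buffer accordingly. The 16 kHz sample rate and the helper names are assumptions for illustration; linear16 is taken to mean 2 bytes per sample and mulaw 1 byte per sample.

```python
from typing import Iterator


def chunk_size_bytes(
    sample_rate: int,
    chunk_ms: int = 80,
    bytes_per_sample: int = 2,  # linear16; use 1 for mulaw or alaw
    channels: int = 1,          # mono, as recommended
) -> int:
    """Return the byte length of one chunk of raw audio."""
    samples = sample_rate * chunk_ms // 1000
    return samples * bytes_per_sample * channels


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


size = chunk_size_bytes(16000)     # 1280 samples * 2 bytes = 2560 bytes
one_second = bytes(16000 * 2)      # one second of 16 kHz linear16 silence
chunks = list(iter_chunks(one_second, size))
```

One second of audio at these settings splits into twelve full chunks plus one half-length remainder, so a real client would typically buffer the tail until the next 80 ms is available.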

Deepgram's stated design goals for Flux: better end-of-turn accuracy than external detectors, lower conversational latency, fewer false interruptions, simpler developer integration, and a path toward richer speech-native systems. Jack Kearney's technical post describes Flux as a step toward a more integrated speech-to-speech future, and Deepgram's Coval post places Flux within a broader "Neuroplex" architecture intended to connect STT, LLMs, and TTS with shared context signals; the source characterizes Neuroplex as roadmap and research framing rather than a documented public Flux implementation spec.

Language support

Flux English uses the flux-general-en model. Flux Multilingual (flux-general-multi) supports 10 languages through a single model and connection: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch.

Flux Multilingual adds language_hint biasing and per-turn language reporting in TurnInfo.languages and TurnInfo.languages_hinted. Deepgram's public claim is that this removes the need for separate language-detection services and model-routing logic.
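A hedged sketch of how per-turn language reporting might be consumed follows. The field names languages and languages_hinted come from the description above, but their exact shape is an assumption: here each is treated as a list of language-code strings inside a TurnInfo message that has already been parsed into a dict. Both helper functions are hypothetical.

```python
from collections import Counter


def tally_languages(turn_infos):
    """Count how often each language is reported across a list of turns."""
    counts = Counter()
    for msg in turn_infos:
        # Assumed shape: "languages" is a list of language-code strings.
        for lang in msg.get("languages", []):
            counts[lang] += 1
    return counts


def unhinted_languages(msg):
    """Return the reported languages that were not among the hinted ones."""
    reported = set(msg.get("languages", []))
    hinted = set(msg.get("languages_hinted", []))
    return sorted(reported - hinted)
```

A tally like this could replace the separate language-detection step that Deepgram says the multilingual model makes unnecessary, for example to pick the TTS voice for the reply.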

Performance and benchmarks

Vendor-reported (via Deepgram's summary of Coval benchmarking): Flux achieved 50% lower latency to first token than Nova-3, faster and more reliable turn detection, and equivalent WER to Nova-3. Deepgram describes this as independent validation from Coval, which operates a public benchmark site; the source notes that the quantitative comparisons are mediated through Deepgram's own write-up and are not fully reproducible from the public web extracts reviewed.

Vendor-reported (Lindy Gaia case study): sub-300 ms turn detection and Nova-3-level accuracy.

Third-party evaluation: Daily/Pipecat's February 2026 STT benchmark excluded Flux because Flux's internal turn detection cannot be disabled, while the benchmark tested STT under external turn-detection control. Daily called Flux Deepgram's "flagship model" and said it should be included when evaluation does not require application-level turn control.

Developer feedback in Deepgram's GitHub discussions reported: missed short utterances and clipping after reconnects, trouble with the acronym "PPF" in an automotive ordering flow, short or soft utterance drops in some multilingual settings, and earlier SDK/protocol friction around Flux's requirement for the /v2/listen protocol.

Latency and throughput

  • Update events arrive about every 0.25 seconds of transcribed audio.
  • Vendor-reported turn detection: sub-300 ms.
  • Vendor-reported (Coval, via Deepgram): 50% lower latency to first token than Nova-3.
  • Deepgram recommends 80 ms audio chunks and mono audio for best performance.
  • Concurrency limits: not publicly disclosed. The "OktoberFLUX" launch promotion made Flux free during October 2025 up to 50 concurrent connections.

Deployment and integrations

Deployment options: Deepgram cloud API (/v2/listen WebSocket), self-hosted deployments, and Cloudflare Workers AI, where Cloudflare announced same-day availability at launch. The source notes that Deepgram's docs state Flux is available for self-hosted deployments, but the reviewed sources contain no single official launch post that gives the first self-hosted release date; later changelog entries document self-hosted metrics and fixes.

Deepgram says Flux Multilingual is supported through Twilio, Vapi, LiveKit, Pipecat, and Jambonz.

Tooling published by Deepgram: main docs, an OpenAPI/AsyncAPI mirror in deepgram-api-specs, official JavaScript/TypeScript, Python, .NET, and Go SDKs, the deepgram/recipes repo, deepgram/skills, deepgram/starter-contracts, and Flux demo repositories including deepgram-demos-flux-streaming-transcription, deepgram-demos-flux-streaming, deepgram-demos-flux-agent, and deepgram-demos-composite-flux-agent. Deepgram's docs also provide lower-level WebSocket guidance for developers building their own clients.

Within Deepgram's catalog, Flux sits below the Voice Agent API: a composable stack can use Flux with a separate LLM and TTS (Deepgram's build guides show Flux with OpenAI and Deepgram TTS), while the Voice Agent API bundles STT, LLM orchestration, and TTS with unified pricing and built-in interruption handling. Deepgram's migration guide states that Nova-3 streams transcript fragments and requires custom turn logic, while Flux emits conversation events and has a built-in turn state machine.

Comparison with other streaming STT products

The source includes the following comparison table.

| Product | Core type | Native turn detection | Turn-state events | Language coverage | Self-hosted option | Public pricing signal |
| --- | --- | --- | --- | --- | --- | --- |
| Deepgram Flux | Conversational STT for voice agents | Yes; configurable eot_threshold, eager_eot_threshold, eot_timeout_ms | Yes: StartOfTurn, EagerEndOfTurn, TurnResumed, EndOfTurn, Update | English or 10-language multilingual model | Yes | Flux English $0.0065/min PAYG; Flux Multilingual $0.0078/min PAYG |
| AssemblyAI Universal-3 Pro Streaming | Streaming STT for voice agents | Yes; low-latency turn detection and voice-agent focus | Yes, though with a simpler message model such as SpeechStarted and turn finalization | 6 real-time languages now, more listed as coming soon | Yes, enterprise/self-hosted streaming docs | Pricing page lists Universal-3 Pro at $0.21/hr; streaming billed per open session |
| Speechmatics Realtime STT | Realtime STT | Yes; configurable EndOfUtterance silence trigger | Yes, but primarily end-of-utterance signaling rather than Flux-like turn lifecycle | 55+ languages | Yes; cloud and on-prem/Kubernetes | Pro pricing from $0.24/hr |
| OpenAI Realtime API | Broader realtime speech-to-speech / transcription API | Yes; server VAD and semantic VAD depending on session/model | Yes: speech started/stopped and automatic buffer commit in server VAD mode | Model-dependent; broader realtime voice stack rather than dedicated CSR STT | No self-hosted option documented | gpt-realtime-whisper $0.017/min; gpt-realtime-2 audio priced per 1M tokens |
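Because the table mixes per-minute and per-hour list prices, a quick normalization helps when reading it. The sketch below converts the per-minute figures to hourly rates using only numbers from the table. It is a rough comparison under stated assumptions: billing bases differ (for example, streaming billed per open session, or a "from" price), so the result is not a like-for-like cost ranking.

```python
from decimal import Decimal

# List prices quoted per minute in the comparison table.
PER_MINUTE = {
    "Deepgram Flux English": Decimal("0.0065"),
    "Deepgram Flux Multilingual": Decimal("0.0078"),
    "OpenAI gpt-realtime-whisper": Decimal("0.017"),
}

# List prices already quoted per hour in the comparison table.
PER_HOUR = {
    "AssemblyAI Universal-3 Pro": Decimal("0.21"),
    "Speechmatics Pro (from)": Decimal("0.24"),
}


def hourly_rates():
    """Return every list price as USD per hour, cheapest first."""
    rates = {name: rate * 60 for name, rate in PER_MINUTE.items()}
    rates.update(PER_HOUR)
    return dict(sorted(rates.items(), key=lambda item: item[1]))


for name, rate in hourly_rates().items():
    print(f"{name}: ${rate:.3f}/hr")
```

On these list prices, Flux English works out to $0.39 per hour and Flux Multilingual to $0.468 per hour.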

Pricing

| Item | Price |
| --- | --- |
| Flux English | $0.0065/min PAYG |
| Flux Multilingual | $0.0078/min PAYG |
| Growth tier | Lower rates than PAYG; specific rates not stated in the source. |
| Launch promotion | "OktoberFLUX": free during October 2025 up to 50 concurrent connections |
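For budgeting, the PAYG rates above translate into a simple estimate. This is a sketch only: it assumes billing is strictly proportional to audio minutes at the listed PAYG rate, ignores Growth-tier discounts (whose rates are not stated), and uses a hypothetical estimate_cost helper and an illustrative monthly volume.

```python
from decimal import Decimal

# PAYG list prices in USD per minute, keyed by model name.
PAYG_RATES = {
    "flux-general-en": Decimal("0.0065"),
    "flux-general-multi": Decimal("0.0078"),
}


def estimate_cost(model: str, minutes) -> Decimal:
    """Estimate PAYG cost in USD for a number of audio minutes."""
    return PAYG_RATES[model] * Decimal(str(minutes))


# Illustrative volume: 10,000 minutes of audio in a month.
english = estimate_cost("flux-general-en", 10_000)       # 65 USD
multilingual = estimate_cost("flux-general-multi", 10_000)  # 78 USD

# Ratio of the multilingual rate to the English rate (1.2, i.e. 20% higher).
premium = PAYG_RATES["flux-general-multi"] / PAYG_RATES["flux-general-en"]
```

Decimal is used instead of float so that small per-minute rates multiply out without binary rounding noise.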

Cloudflare Workers AI docs and changelog confirm launch-partner availability and pricing on Workers AI; specific Workers AI rates are not stated in the source.

Development and ownership

Publicly identifiable Flux contributors:

| Name | Role |
| --- | --- |
| Nick Kaimakis | Senior Product Manager; credited on the launch article. Twilio SIGNAL's speaker page describes him as Senior Product Manager, STT at Deepgram, leading Speech-to-Text including Flux. |
| Jack Kearney | Staff Research Scientist; author of the "Flux Chronicles" architecture post and the reinforcement-learning update post; coauthor of writeups on turn-detection evaluation and keyterm boosting. |
| Chau Luu | Senior Research Scientist; public coauthor on Flux technical work. |
| Federico Landini | Research Scientist; public coauthor on Flux technical work. |
| Julia Strout | Deep Learning Engineer; public coauthor on Flux technical work. |

Deepgram does not publish a full Flux org chart or a canonical "invented by" page in the reviewed sources.

Deepgram states it spent "the last two years rethinking how transcription should work for real-time voice agents" before launch, and that the design of the Flux state machine drew on experience gained from Deepgram's Voice Agent API, which launched in 2024 with "end-of-thought detection."

Company-level metrics from Deepgram's public materials (not Flux-specific): trusted by 200,000+ developers and 1,300+ organizations, 50,000+ years of audio processed, over 1 trillion words transcribed.

Release history

| Date | Event |
| --- | --- |
| 2024 | Deepgram Voice Agent API launch with "end-of-thought detection"; cited as historical context for Flux's development. |
| October 1, 2025 | Flux product launch article published. |
| October 2, 2025 | Developer changelog and documentation release; Cloudflare announces same-day availability on Workers AI. |
| October 2025 | "OktoberFLUX" promotion: Flux free during October 2025 up to 50 concurrent connections. |
| January 2026 | WebM/Opus containerized audio support added. |
| February 2026 | Daily/Pipecat STT benchmark published; excluded Flux because its turn detection cannot be disabled. |
| April 29, 2026 | Flux Multilingual general availability with 10 languages in a single streaming model. |

The source states that Deepgram's public materials do not expose a formal preview or beta page for Flux before October 2025, so any earlier internal or private preview dates are unspecified.
