Scribe v2 Realtime: model profile

Scribe v2 Realtime is ElevenLabs' streaming speech-to-text model for live transcription, released on November 11, 2025 and delivered through the ElevenLabs API, SDKs, and Agents platform.

Specifications

Developer	ElevenLabs
Released	November 11, 2025
Model type	Streaming speech-to-text with predictive transcription; architecture not publicly described
Languages	90+ for the realtime model; broader Scribe family marketed at 99
Modes (batch / streaming)	Streaming; batch transcription is handled by the separate Scribe v2 model
Latency	Vendor-reported: under 150 ms; one realtime product page states under 100 ms
Throughput / concurrency	30+ concurrent sessions for enterprise clients; general self-serve concurrency policy not published
Deployment	Cloud: WebSocket API, SDKs, JavaScript and React clients, ElevenAgents; on-prem early-access materials do not explicitly name this model
Pricing	$0.39/hour pay-as-you-go; $0.28/hour and lower on annual Business plans; keyterm prompting adds 20%

Not disclosed: Parameters · Training data · License

Known limitations

No published parameter count, training-hours figure, decoder type, backbone family, or architecture diagram for Scribe v2 Realtime. By contrast, OpenAI's Whisper page describes an encoder-decoder Transformer operating on 30-second chunks and log-Mel spectrograms; ElevenLabs has not released a comparable architectural description in the reviewed sources.
The vendor benchmark against Gemini Flash 2.5, GPT-4o Mini, and Deepgram Nova 3 ("500 hard samples") does not publish enough methodology detail to be independently reproducible.
The strongest vendor-neutral accuracy data (Artificial Analysis, 2.2% AA-WER) applies to the batch, non-streaming Scribe v2, not specifically to Scribe v2 Realtime.
Realtime diarization is described in an ElevenLabs FAQ as not a priority at the moment; dual-channel support is not planned.
Enterprise concurrency (30+ sessions) is public, but a general self-serve concurrency policy is not.
No dedicated Scribe v2 Realtime contributor roster has been published at the level of detail given for the original Scribe launch.
No single public apples-to-apples benchmark covers Scribe v2 Realtime, Google Chirp 3, Azure Speech, Deepgram Nova-3/Flux, AssemblyAI U3 Pro Streaming, OpenAI gpt-realtime-whisper, and Rev AI streaming on the same realtime dataset with the same methodology.
Some ElevenLabs public pages are inconsistent: realtime language claims are usually 90+ while broader Scribe pages sometimes say 99; pricing is $0.39/hour on the pricing page but rounded to $0.40/hour on a marketing page; one realtime product page mentions under 100 ms while the launch and docs narrative standardizes on under 150 ms.
On-prem / on-device deployment materials do not explicitly name Scribe v2 Realtime; public evidence supports cloud deployment for this model.

Full technical breakdown (9 sections)

Overview

Scribe v2 Realtime is the realtime member of the Scribe model family. ElevenLabs' model catalog distinguishes Scribe v2 for batch transcription and Scribe v2 Realtime for live use, and describes the latter as its "fastest and most accurate live speech recognition model," built for conversational settings such as live meeting transcription, AI agents, and multilingual recognition. The realtime WebSocket API streams partial transcripts first and committed transcripts when a segment is finalized.

ElevenLabs positions the model for voice agents, meeting assistants, live captioning, and other low-latency speech interfaces. Its central public claims are latency under 150 ms, 93.5% accuracy across 30 common European and Asian languages, and support for 90+ languages.

ElevenLabs discloses system-level behaviors rather than a full neural architecture. Public materials describe a streaming-first architecture, predictive transcription, text conditioning, manual or voice activity detection (VAD) commit strategies, and word-level timestamps, but they do not disclose the backbone, parameter count, training corpus size, or decoder class for Scribe v2 Realtime.

Capabilities and features

Scribe v2 Realtime is a cloud streaming speech-to-text service exposed primarily as a WebSocket API. Audio chunks are sent as input_audio_chunk messages, and the service returns partial and committed transcripts, including timestamped variants. Authentication uses an API key or a single-use token; the documented client-side path recommends generating the token server-side so browser clients do not expose permanent credentials. ElevenLabs provides first-party JavaScript and React support, including Scribe.connect() in @elevenlabs/client and the useScribe hook in @elevenlabs/react.
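
As a rough illustration of the message flow described above, the sketch below splits an audio buffer into chunks and serializes each one as a JSON text frame. Only the input_audio_chunk message name comes from the text; the field names (message_type, audio_base_64, sample_rate, commit) and the 16 kHz default are assumptions made for illustration and should be checked against the Realtime API reference before use.

```python
import base64
import json


def build_audio_chunk_message(pcm_bytes: bytes, sample_rate: int = 16000,
                              commit: bool = False) -> str:
    """Serialize one chunk of raw audio as a JSON text frame.

    Every field name here is an assumption; only the value
    "input_audio_chunk" is taken from the profile text.
    """
    if not pcm_bytes:
        raise ValueError("audio chunk must not be empty")
    payload = {
        "message_type": "input_audio_chunk",  # message name from the text
        "audio_base_64": base64.b64encode(pcm_bytes).decode("ascii"),  # assumed field
        "sample_rate": sample_rate,  # assumed field
        "commit": commit,  # assumed field for manual commit
    }
    return json.dumps(payload)


def chunk_audio(pcm_bytes: bytes, chunk_size: int) -> list[bytes]:
    """Split a buffer into fixed-size chunks; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [pcm_bytes[i:i + chunk_size]
            for i in range(0, len(pcm_bytes), chunk_size)]
```

A caller would typically loop over chunk_audio(...) and send each build_audio_chunk_message(...) result over an open WebSocket, setting commit=True on the last chunk of a segment if manual commit is in use.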

Documented system behaviors:

Streaming-first architecture and predictive transcription that anticipates likely next words and punctuation, which is how ElevenLabs explains the latency claim.
Text conditioning, allowing the model to continue transcription from previous context after a reconnect.
Two transcript finalization modes: manual commit and Voice Activity Detection. This separates fast partial text from committed text.
Word-level timestamps.
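
The split between partial and committed text, and the idea of carrying context across a reconnect, can be sketched on the client side as follows. This is a minimal sketch rather than ElevenLabs' client logic: the event names partial_transcript and committed_transcript, the text key, and the reconnect_context helper are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptState:
    """Tracks committed segments plus the current, still-changing partial."""
    committed: list[str] = field(default_factory=list)
    partial: str = ""

    def handle(self, event: dict) -> None:
        # Event type names and the "text" key are hypothetical placeholders.
        kind = event.get("message_type")
        text = event.get("text", "").strip()
        if kind == "partial_transcript":
            self.partial = text  # each partial replaces the previous one
        elif kind == "committed_transcript":
            if text:
                self.committed.append(text)
            self.partial = ""  # the segment is final, so drop the partial

    def display(self) -> str:
        """Committed text followed by the tentative partial, if any."""
        tail = [self.partial] if self.partial else []
        return " ".join(self.committed + tail)

    def reconnect_context(self, max_chars: int = 200) -> str:
        """Tail of committed text that could be passed as conditioning."""
        if max_chars <= 0:
            return ""
        return " ".join(self.committed)[-max_chars:]
```

Keeping partials out of the committed list means a user interface can render tentative words immediately while only finalized segments are stored or reused as context.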

Later client and API additions include keyterms and no_verbatim support, context deduplication, microphone device options, and native mute/unmute support in the client packages.

Use cases named in ElevenLabs materials: voice agents, meeting assistants, real-time captioning, multilingual live transcription, meeting note-taking, and live language translation. The March 2026 technical explainer includes a realtime translator demo built with Scribe v2 Realtime plus the Chrome Translator API.

An ElevenLabs FAQ states that realtime diarization is not a priority at the moment and that dual-channel support is not planned.

Language support

ElevenLabs' realtime pages state 90+ languages for Scribe v2 Realtime. Broader Scribe product pages market 99 languages for the Scribe family; the 99 figure refers to the wider Scribe brand or batch model rather than the realtime model specifically.

Performance and benchmarks

Vendor-reported: the launch post claims 93.5% accuracy across 30 commonly used European and Asian languages.

Vendor benchmark: realtime marketing pages depict Scribe v2 Realtime outperforming Gemini Flash 2.5, GPT-4o Mini, and Deepgram Nova 3 on a benchmark involving "500 hard samples." The published material does not include enough methodology detail to make that chart independently reproducible.

Third-party evaluation: Artificial Analysis' non-streaming benchmark places Scribe v2 at 2.2% AA-WER, ahead of GPT-4o Transcribe (4.0%), GPT-4o Mini Transcribe (4.5%), Deepgram Nova-3 (5.2%), and Rev AI (5.9%). This result applies to the batch, non-streaming Scribe v2 model, not specifically to Scribe v2 Realtime.
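
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below shows that textbook definition only; it assumes simple lowercasing and whitespace tokenization, and it does not reproduce Artificial Analysis' datasets or methodology behind the AA-WER figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, one wrong word in a six-word reference gives a WER of 1/6, or roughly 16.7%.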

The source's cross-vendor comparison, separating public latency claims, accuracy signals, and pricing:

Provider / model	Public latency	Public accuracy signal	Language support	Real-time capability	Public pricing	Notable strengths	Notable weaknesses
ElevenLabs Scribe v2 Realtime	<150 ms	93.5% accuracy across 30 common European and Asian languages	90+ languages	Yes, WebSocket streaming, partial + committed transcripts	$0.39/hr PAYG; lower on annual Business; keyterms +20%	Low latency claim; multilingual; ElevenAgents/TTS integration; documented privacy controls	No public full architecture; publicly limited realtime diarization/dual-channel story; enterprise concurrency only partially disclosed
Google Cloud Speech-to-Text Chirp 3	Streaming supported; no single ms figure in reviewed docs	Google says Chirp 3 improves accuracy and speed; no headline public WER in reviewed docs	Official Chirp 3 page lists 111 transcription locales / language codes across GA + Preview	Yes, StreamingRecognize supported in STT v2	$0.016/min starting tier ($0.96/hr)	Broad locale coverage; GCP-native; diarization, auto language detection, speech adaptation	Public docs reviewed do not provide a simple apples-to-apples WER or latency figure
OpenAI gpt-realtime-whisper / whisper-1	Low-latency realtime path with tunable delay; no fixed ms figure published in reviewed docs	No single public WER on reviewed OpenAI realtime docs; Whisper trained on 680k hours; standard transcription docs list 57 supported languages and note Whisper was trained on 98	57 listed in standard transcription docs; Whisper trained on 98 languages	Yes for gpt-realtime-whisper; whisper-1 is not natively streaming in the same way	$0.017/min realtime ($1.02/hr); standard gpt-4o-mini-transcribe is $0.003/min but not the realtime path	OpenAI ecosystem fit; tunable latency/accuracy tradeoff	No public fixed ms headline; realtime prompt steering limitations; public accuracy evidence less standardized in official docs
Microsoft Azure Speech	"Instant transcription with intermediate results"; no reviewed public ms figure	No headline public WER; Azure emphasizes customization and custom-speech optimization	140+ languages and dialects	Yes, real-time, batch, and fast transcription	Search snippet shows $1/hr standard realtime, $0.18/hr batch, $1.20/hr custom realtime	Broad language coverage; enterprise stack; fine-tuning/custom speech	Public pricing page can be opaque by region/UI; no simple public ms/WER headline in reviewed sources
Deepgram Nova-3	Sub-300 ms streaming	Deepgram says 54.2% WER reduction for streaming vs competitors; Artificial Analysis shows 5.2% AA-WER for Nova-3 (non-streaming benchmark)	45+ languages on Nova models	Yes, streaming	$0.0077/min monolingual streaming ($0.462/hr); $0.0092/min multilingual streaming ($0.552/hr)	Mature streaming stack; multilingual and noisy-audio positioning; keyword prompting and diarization ecosystem	Language breadth lower than ElevenLabs/Google/Azure; flagship multilingual streaming is pricier than monolingual
AssemblyAI Universal-3 Pro Streaming	~300 ms P50 / sub-300 ms	Vendor says best-in-class / most accurate streaming model; no single official WER figure in reviewed sources	6 languages on flagship U3 Pro Streaming; 99 on Universal-2 async	Yes, secure WebSocket streaming	Official AssemblyAI materials put U3 Pro Streaming at $0.45/hr; lower-cost universal streaming at $0.15/hr	Streaming ergonomics; no hard caps on concurrent streams; voice-agent fit	Flagship streaming language set is much narrower than ElevenLabs' 90+ claim
Rev AI	Real-time streaming with low latency; no reviewed public ms figure	Rev markets high accuracy in noisy/far-field/telephony and cites "up to 77.4% gains" in challenging conditions; Artificial Analysis shows 5.9% AA-WER	58+ async languages; 9+ streaming languages	Yes, realtime streaming + async	$0.20/hr English Reverb, $0.10/hr Reverb Turbo, $0.30/hr foreign language	Simple pricing; inexpensive; broad async availability	Streaming language breadth is much narrower; public latency disclosure is light

Latency and throughput

Vendor-reported latency is under 150 ms; one realtime product page mentions under 100 ms, while the core launch and documentation narrative standardizes on under 150 ms.

For comparison, official public figures cited in the source are sub-300 ms for Deepgram streaming and about 300 ms P50 for AssemblyAI streaming. Google, Azure, OpenAI, and Rev support live or low-latency transcription but do not publish a single comparably explicit millisecond headline for their core STT offerings in the reviewed sources.

The only explicit realtime concurrency figure in the reviewed materials is an FAQ stating 30+ concurrent sessions for enterprise clients. A general self-serve concurrency policy is not published.

Deployment and integrations

The generally available offering is cloud based: the ElevenLabs API, SDKs, JavaScript and React clients, and the Agents platform. ElevenLabs' broader speech-to-text marketing states that Scribe supports cloud and on-premise configurations, and the company has an early-access on-prem / on-device deployment program for selected models, but the on-prem materials do not explicitly name Scribe v2 Realtime.

By June 2026, ElevenAgents had changed its default ASR provider from elevenlabs to scribe_realtime.

Privacy and security disclosures: data is encrypted in transit and at rest; ElevenLabs supports SOC 2, GDPR, and HIPAA BAA for qualifying enterprises, and offers EU, India, and Singapore data residency. Zero Retention Mode is exposed for Speech-to-Text by setting enable_logging=false on /v1/speech-to-text/* endpoints, which prevents request history from appearing and limits logging for sensitive workloads.
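
The Zero Retention Mode setting described above amounts to one query parameter. A minimal sketch of adding it to a request URL is shown below; the enable_logging=false parameter and the /v1/speech-to-text path prefix come from the text, while the helper name, the host, and the model_id parameter in the usage note are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_zero_retention(url: str) -> str:
    """Return the URL with enable_logging=false set, keeping other parameters."""
    parts = urlsplit(url)
    if not parts.path.startswith("/v1/speech-to-text"):
        raise ValueError("flag is described only for /v1/speech-to-text/* endpoints")
    # Drop any existing enable_logging value so the flag is not duplicated.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "enable_logging"]
    query.append(("enable_logging", "false"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_zero_retention("https://api.elevenlabs.io/v1/speech-to-text?model_id=example") appends enable_logging=false after the existing model_id parameter. Whether a given account is allowed to use the mode is a plan question that this helper does not check.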

Pricing

Pay-as-you-go: $0.39/hour for Speech to Text, with 2.5 hours included on the free/pay-as-you-go tier.
Annual Business plans: $0.28/hour and lower, per the realtime product page.
Keyterm prompting carries a 20% premium.
One speech-to-text marketing page rounds the base price to $0.40/hour; the pricing page lists $0.39/hour.

Per the source's comparison of public rates, this pricing is below Google Cloud STT v2 entry pricing ($0.96/hr), Azure standard realtime transcription ($1/hr), OpenAI's realtime whisper model ($1.02/hr), AssemblyAI U3 Pro Streaming ($0.45/hr), and Deepgram's Nova-3 multilingual streaming rate ($0.552/hr), and above Rev AI's Reverb headline pricing ($0.20/hr English).
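
The arithmetic behind these comparisons is simple enough to sketch. The example below converts per-minute list prices to hourly rates and estimates a Scribe v2 Realtime bill from the figures quoted above. It assumes linear billing, applies the 20% keyterm premium to the whole usage, and ignores the included free hours and any volume discounts; the function names are illustrative.

```python
from decimal import Decimal


def per_minute_to_per_hour(rate_per_minute: str) -> Decimal:
    """Convert a per-minute list price (as a string) to a per-hour price."""
    return Decimal(rate_per_minute) * 60


def scribe_realtime_cost(hours: str, base_rate: str = "0.39",
                         keyterms: bool = False) -> Decimal:
    """Estimate cost in USD at the pay-as-you-go rate quoted in the text.

    Assumes linear billing and a flat 20% premium when keyterm prompting
    is enabled; free-tier hours and discounts are not modeled.
    """
    cost = Decimal(hours) * Decimal(base_rate)
    if keyterms:
        cost *= Decimal("1.20")
    return cost.quantize(Decimal("0.01"))  # round to whole cents
```

For instance, per_minute_to_per_hour("0.016") reproduces the $0.96/hr Google figure, and 100 hours at the pay-as-you-go rate comes to $39.00, or $46.80 with keyterm prompting under these assumptions.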

Development and ownership

Scribe v2 Realtime is developed by ElevenLabs. The company has not published a Scribe v2 Realtime-specific contributor roster comparable to the original Scribe announcement, so public attribution comes in layers.

The original Scribe launch names the core contributors to the underlying speech-to-text program: Flavio Schneider (research lead, training and architecture), Tim von Känel (project lead, pre-training and fine-tuning data), Maximiliano Levi (inference and optimizations), Johan Nordberg and Piotr Dabkowski (research contributors), Austin Malerba (frontend), Hristo Stoychev (backend), and Alex George (data acquisition). ElevenLabs author pages identify Flavio Schneider and Tim von Känel as members of the research team focused on ASR and music.

For Scribe v2 Realtime specifically, Tadas Petra authored the official technical deep-dive "How Scribe v2 Realtime Works" in March 2026. ElevenLabs does not publish a separate role label for him on its author page.

In SDK work, ElevenLabs' Python SDK release v2.46.0 credits @kraenhansen for adding keyterms and no_verbatim support to the Scribe realtime API; Kræn Hansen's GitHub profile describes his work as "Building Developer Experiences @elevenlabs."

Public attribution layer	Named people / group	Publicly stated or inferred role
Core Scribe research foundation	Flavio Schneider	Research lead; training and architecture
Core Scribe research foundation	Tim von Känel	Project lead; pre-training and fine-tuning data
Core Scribe research foundation	Maximiliano Levi	Inference and optimizations
Core Scribe research foundation	Johan Nordberg, Piotr Dabkowski	Research contributors
Core Scribe engineering	Austin Malerba, Hristo Stoychev, Alex George	Frontend, backend, data acquisition
Realtime technical rollout	Tadas Petra	Author of official Scribe v2 Realtime technical guide
SDK/productization	Kræn Hansen	Realtime SDK contributor; developer experience
Publicly visible teams	Research, ElevenAPI/developer platform, ElevenAgents	Inference from docs/blog/changelog ownership and integration

The source describes the team decomposition (Research, ElevenAPI/developer platform, ElevenAgents) as an inference from public materials rather than a published org chart.

Release history

The original Scribe launched in February 2025 with multilingual batch transcription, word-level timestamps, diarization, and audio-event tagging, and explicitly previewed a future low-latency version. In April 2025, ElevenLabs shipped scribe_v1_experimental. Scribe v2 Realtime was released in November 2025, followed by the batch Scribe v2 in January 2026. In June 2026, ElevenLabs formally deprecated scribe_v1 with a July 9, 2026 removal date, and ElevenAgents made scribe_realtime the default ASR provider.

Date	Milestone	Why it matters
Feb 26, 2025	Original Scribe launched	First STT model; realtime version promised
Apr 7, 2025	scribe_v1_experimental preview	Improved multilingual files, silence handling, audio tags
Nov 11, 2025	Scribe v2 Realtime released	Official release date for the live model
Jan 9, 2026	Scribe v2 released	Batch/long-form v2 arrives after realtime v2
Jan 19, 2026	SDK improvements around useScribe	First visible post-launch package hardening
Mar 4, 2026	"How Scribe v2 Realtime Works" published	Public technical explanation
Apr-May 2026	keyterms, no_verbatim, context, mute/unmute added	Realtime usability and control improved
Jun 8, 2026	scribe_v1 deprecated; scribe_realtime default in ElevenAgents	Realtime becomes the default ASR direction inside agents

Sources

Introducing Scribe v2 Realtime - https://elevenlabs.io/blog/introducing-scribe-v2-realtime
ElevenLabs - https://elevenlabs.io/realtime-speech-to-text
Speech to Text - Most Accurate Speech to Text Model - https://elevenlabs.io/speech-to-text
Models | ElevenLabs Documentation - https://elevenlabs.io/docs/overview/models
ElevenLabs - Meet Scribe the world's most accurate ASR model - https://elevenlabs.io/blog/meet-scribe
ElevenAPI Pricing for creators and businesses of all sizes - https://elevenlabs.io/pricing/api
How Scribe v2 Realtime Works - https://elevenlabs.io/blog/how-scribe-v2-realtime-works
April 7, 2025 | ElevenLabs Documentation - https://elevenlabs.io/docs/changelog/2025/4/7
Introducing Scribe v2 - https://elevenlabs.io/blog/introducing-scribe-v2
January 19, 2026 | ElevenLabs Documentation - https://elevenlabs.io/docs/changelog/2026/1/19
Changelog | ElevenLabs Documentation - https://elevenlabs.io/docs/changelog
June 8, 2026 | ElevenLabs Documentation - https://elevenlabs.io/docs/changelog/2026/6/8
Releases · elevenlabs/elevenlabs-python · GitHub - https://github.com/elevenlabs/elevenlabs-python/releases
Realtime | ElevenLabs Documentation - https://elevenlabs.io/docs/api-reference/speech-to-text/v-1-speech-to-text-realtime
Introducing Whisper | OpenAI - https://openai.com/index/whisper/
ElevenAPI - ElevenLabs AI audio APIs - https://elevenlabs.io/api
Chirp 3 Transcription: Enhanced multilingual accuracy | Cloud Speech-to-Text | Google Cloud Documentation - https://docs.cloud.google.com/speech-to-text/docs/models/chirp-3
Realtime transcription | OpenAI API - https://developers.openai.com/api/docs/guides/realtime-transcription
Speech to Text Overview - Speech Service - Foundry Tools - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
Measuring STT Latency | Deepgram's Docs - https://developers.deepgram.com/docs/measuring-streaming-latency
Realtime Speech-to-Text API | AssemblyAI - https://www.assemblyai.com/products/streaming-speech-to-text
Speech-to-Text API At Scale - https://www.rev.ai/speech-to-text
ElevenLabs: API Provider Benchmarking & Analysis - https://artificialanalysis.ai/speech-to-text/models/elevenlabs
Models & Languages Overview | Deepgram's Docs - https://developers.deepgram.com/docs/models-languages-overview