OpenTranscription/ Blog
2026-07-03 · MODEL PROFILE

AssemblyAI Universal-3 Pro: model profile

Reference profile of AssemblyAI Universal-3 Pro: release date, prompting model, language support, pricing, deployment, benchmarks, and disclosed limits.


Universal-3 Pro is AssemblyAI's promptable speech-to-text model for pre-recorded audio, released on February 3, 2026. AssemblyAI describes it as a "SpeechLLM" and positions it as the company's most capable model for entity-rich and domain-specific transcription.

Specifications

Developer: AssemblyAI
Released: February 3, 2026
Model type: Promptable speech-to-text model, described by AssemblyAI as a SpeechLLM
Parameters: Not disclosed
Training data: Not publicly disclosed for Universal-3 Pro
Native languages: English, Spanish, Portuguese, French, German, Italian
Extended language coverage: 99-language workflow through fallback to Universal-2
Modes: Pre-recorded cloud API; related Universal-3 Pro Streaming model for real-time workflows
Deployment: AssemblyAI cloud API, EU endpoint, and self-hosted streaming for the streaming model
Pricing: Universal-3 Pro async: $0.21/hour; Universal-3 Pro Streaming: $0.45/hour base rate
Key add-ons: Prompting, keyterms, speaker diarization, Medical Mode, PII redaction
License: Proprietary API service


Overview

Universal-3 Pro is built around a control surface rather than a disclosed new architecture. The public docs emphasize prompts, keyterms, audio tags, disfluency handling, speaker cues, and code-switching hints. AssemblyAI markets this as a way to improve transcription before post-processing, especially for voice agents, healthcare, customer conversations, and other workflows where rare terms and identifiers matter.

AssemblyAI has not published a Universal-3 Pro model card, architecture note, parameter count, tokenizer description, training-corpus size, data-source mix, optimizer setup, or fine-tuning recipe. The company has published detailed architecture information for Universal-1, but the public sources do not show whether Universal-3 Pro uses the same architecture.

Universal-3 Pro has six native languages in the pre-recorded API: English, Spanish, Portuguese, French, German, and Italian. For 99-language pre-recorded coverage, AssemblyAI recommends routing with speech_models: ["universal-3-pro", "universal-2"], which tries Universal-3 Pro first where supported and falls back to Universal-2 elsewhere.
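As a rough illustration of the routing configuration described above, the sketch below builds a request body with the speech_models list quoted from AssemblyAI's docs. The helper name build_transcript_request and the audio_url field are assumptions made for this example, and nothing is sent over the network; check the current API reference before relying on the exact field names.

```python
import json

# Model identifiers as quoted in the profile above.
NATIVE_MODEL = "universal-3-pro"
FALLBACK_MODEL = "universal-2"


def build_transcript_request(audio_url: str, use_fallback: bool = True) -> dict:
    """Build a JSON-serializable body for a pre-recorded transcription job.

    `audio_url` is an assumed field name for this sketch; only the
    `speech_models` list is taken from the profile text.
    """
    models = [NATIVE_MODEL, FALLBACK_MODEL] if use_fallback else [NATIVE_MODEL]
    return {"audio_url": audio_url, "speech_models": models}


if __name__ == "__main__":
    body = build_transcript_request("https://example.com/call.mp3")
    print(json.dumps(body, indent=2))
```

Keeping the model list in one helper makes it easier to run the same audio with and without fallback when comparing results.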

Capabilities and features

  • Natural-language prompt guidance for pre-recorded transcription. In async mode, the request can use either prompt or keyterms_prompt, but not both in the same request.
  • Keyterms prompting for up to 1,000 words or phrases, with a maximum of 6 words per phrase. AssemblyAI says effective prompting can improve domain-specific term accuracy by up to 45%.
  • Audio event tagging across more than 50 tags, with prompting support for domain-specific tags.
  • Code-switching support in the native language set, with fallback routing to Universal-2 for broader language coverage.
  • Disfluency and verbatim controls, including guidance for filler words and non-speech tags.
  • Medical Mode through domain="medical-v1", documented for medications, procedures, conditions, dosages, and other clinical vocabulary. Medical Mode supports English, Spanish, German, and French. Unsupported languages skip the add-on and are not charged.
  • Streaming counterpart: Universal-3 Pro Streaming uses the u3-rt-pro speech model, supports prompt plus keyterms together, and allows mid-stream configuration updates.
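The async prompting limits listed above (prompt or keyterms_prompt but not both, up to 1,000 keyterms, at most 6 words per phrase) can be checked client-side before a job is submitted. This is a minimal sketch, assuming keyterms_prompt is passed as a list of strings and that words are separated by whitespace; the function name validate_async_prompting is hypothetical and is not part of any AssemblyAI SDK.

```python
from typing import List, Optional

MAX_KEYTERMS = 1000          # documented upper bound on words or phrases
MAX_WORDS_PER_KEYTERM = 6    # documented upper bound on words per phrase


def validate_async_prompting(
    prompt: Optional[str] = None,
    keyterms_prompt: Optional[List[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if prompt is not None and keyterms_prompt is not None:
        problems.append("async requests accept prompt or keyterms_prompt, not both")
    if keyterms_prompt is not None:
        if len(keyterms_prompt) > MAX_KEYTERMS:
            problems.append(f"too many keyterms: {len(keyterms_prompt)} > {MAX_KEYTERMS}")
        for term in keyterms_prompt:
            # Assumption: a "word" is a whitespace-separated token.
            if len(term.split()) > MAX_WORDS_PER_KEYTERM:
                problems.append(f"keyterm exceeds {MAX_WORDS_PER_KEYTERM} words: {term!r}")
    return problems


if __name__ == "__main__":
    print(validate_async_prompting(keyterms_prompt=["metoprolol succinate", "HbA1c"]))
```

The streaming model is described as accepting prompt and keyterms together, so the mutual-exclusion check here applies only to the async case.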

Language support

Universal-3 Pro's native pre-recorded coverage is six languages: English, Spanish, Portuguese, French, German, and Italian. AssemblyAI's 99-language coverage for pre-recorded transcription relies on model routing, not on native Universal-3 Pro support for all 99 languages.

The distinction matters for evaluation. A benchmark or production result on an unsupported language may exercise Universal-2 fallback rather than Universal-3 Pro itself. AssemblyAI's docs make the fallback strategy explicit, so buyers should test the actual speech_models configuration they plan to deploy.
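One way to keep evaluation results honest is to label each test file with the model that the routing list would be expected to reach. The sketch below is a simplified model of the fallback behavior described above, not AssemblyAI's routing logic: the two-letter language codes, the prefix normalization, and the assumption that Universal-2 covers every non-native language in the test set are all illustrative.

```python
from typing import List, Optional

# Assumed ISO 639-1 codes for the six native languages named in the profile.
NATIVE_LANGUAGES = {"en", "es", "pt", "fr", "de", "it"}


def expected_model(language_code: str, speech_models: List[str]) -> Optional[str]:
    """Return the first model in the list expected to handle the language."""
    base = language_code.lower().split("-")[0]  # "en-US" -> "en"
    for model in speech_models:
        if model == "universal-3-pro" and base in NATIVE_LANGUAGES:
            return model
        if model == "universal-2":
            # Simplification: treat Universal-2 as covering everything else.
            return model
    return None


if __name__ == "__main__":
    routing = ["universal-3-pro", "universal-2"]
    for code in ["en-US", "pt", "hi"]:
        print(code, "->", expected_model(code, routing))
```

Grouping benchmark results by this label separates Universal-3 Pro accuracy from Universal-2 fallback accuracy in the same report.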

Performance and benchmarks

AssemblyAI's pre-recorded benchmark docs report Universal-3 Pro at a mean English WER of 5.6% and a median English WER of 4.9%, compared with Universal-2 at 6.1% mean and 6.5% median. On FLEURS multilingual benchmarks, AssemblyAI reports an average WER of 4.58% for Universal-3 Pro and 7.42% for Universal-2.
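For readers who want to reproduce this kind of comparison on their own audio, the sketch below computes word error rate as word-level edit distance divided by reference length, plus the relative reduction between two WER figures. It is a generic implementation, not AssemblyAI's scoring pipeline, and it skips the text normalization (casing, punctuation, number formatting) that published benchmarks usually apply.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` lowers the error rate of `baseline`."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    print(wer("the cat sat", "the bat sat"))
    print(round(relative_reduction(7.42, 4.58), 3))  # FLEURS averages above
    print(round(relative_reduction(6.1, 5.6), 3))    # English means above
```

Applied to the reported figures, the FLEURS averages work out to roughly a 38% relative reduction and the English means to roughly 8%, which shows how much the size of the gap depends on the test suite.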

AssemblyAI's benchmark site reports Universal-3 Pro at 8.23% global multilingual WER, close to Speechmatics Enhanced at 8.22%, and ahead of OpenAI GPT-4o Transcribe at 9.52%, OpenAI Whisper-1 at 14.39%, and Deepgram Nova-3 at 15.71% on that suite. The same benchmark site shows Universal-3 Pro leading the displayed code-switching and diarization comparisons.

For streaming, AssemblyAI reports Universal-3 Pro Streaming with 5.53% average WER and a 10.46% streaming medical missed-entity rate in the displayed comparisons. The Pipecat-linked benchmark view shows a median TTCT of 335 ms, which is faster than some competitors but slower than Deepgram Nova-3's 247 ms result in that benchmark.

Independent evidence is thinner. The open Pipecat STT Benchmark is the main public framework cited in the reviewed sources. A separate Voice of India preprint reports severe failures for "AssemblyAI Universal" on some Indian-language cases, but this profile treats that as evidence about the broader Universal stack and fallback behavior rather than a clean Universal-3 Pro result on its native six languages.

Latency and throughput

The reviewed sources give several latency figures with different measurement boundaries. AssemblyAI docs describe Universal-3 Pro Streaming as sub-300 ms for time-to-complete transcript latency. Other tutorial materials cite figures around 150 ms or 307 ms, and the Pipecat benchmark page shows 335 ms median TTCT and 534 ms P95. These numbers should not be treated as interchangeable. Teams should test first partial latency, turn completion latency, and total response latency separately.
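To keep the measurement boundaries separate in practice, each latency type can be recorded under its own name and summarized with the same statistics. The sketch below uses a nearest-rank percentile; the metric names and sample values are hypothetical, and other benchmarks may compute P95 with a different interpolation method.

```python
import math
import statistics
from typing import Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Report median and P95 per metric, never pooled across metrics."""
    return {
        name: {
            "median_ms": statistics.median(values),
            "p95_ms": percentile(values, 95),
        }
        for name, values in latencies_ms.items()
    }


if __name__ == "__main__":
    # Hypothetical samples; replace with measurements from your own pipeline.
    report = summarize({
        "first_partial": [120, 140, 150, 160, 210],
        "turn_completion": [290, 310, 335, 360, 540],
        "total_response": [700, 820, 900, 950, 1400],
    })
    for metric, stats in report.items():
        print(metric, stats)
```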

For pre-recorded jobs, free accounts get 5 parallel transcriptions and paid accounts start at 200+ parallel transcriptions, with higher limits available. For streaming, paid accounts start at 100+ new sessions per minute, and AssemblyAI documents automatic scale-up behavior. Each self-hosted streaming instance supports up to 48 concurrent streams without runtime degradation, according to AssemblyAI.
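The 48-streams-per-instance figure translates into a simple capacity estimate for self-hosted streaming. The sketch below is back-of-the-envelope arithmetic only: the headroom parameter is an assumption added for illustration, and real sizing should follow AssemblyAI's self-hosted documentation and load testing.

```python
import math

STREAMS_PER_INSTANCE = 48  # per-instance figure cited from AssemblyAI above


def instances_needed(peak_concurrent_streams: int, headroom: float = 0.0) -> int:
    """Estimate instance count for a peak load, with optional spare capacity.

    `headroom` is a fraction of extra capacity (0.2 means 20% spare); it is
    an illustrative planning knob, not an AssemblyAI setting.
    """
    if peak_concurrent_streams < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    target = peak_concurrent_streams * (1 + headroom)
    return math.ceil(target / STREAMS_PER_INSTANCE)


if __name__ == "__main__":
    for peak in (48, 49, 500):
        print(peak, "streams ->", instances_needed(peak, headroom=0.2), "instances")
```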

Deployment and integrations

Universal-3 Pro is available through AssemblyAI's cloud API for pre-recorded audio. AssemblyAI also offers an EU endpoint for data residency.

Universal-3 Pro Streaming is available through the streaming API and self-hosted streaming. The self-hosted docs describe containerized deployment inside customer infrastructure, with audio, transcripts, and PII remaining inside the customer environment. The reviewed sources do not document a self-hosted async Universal-3 Pro product.

A Cloudflare AI catalog entry for assemblyai/universal-3-pro suggests partner-hosted availability, but the reviewed sources do not show an official AssemblyAI on-device or edge deployment program.

Security and compliance

AssemblyAI states that it offers HIPAA BAA support, incorporates a DPA into customer terms, supports EU data residency, and holds SOC 2 Type 2 and ISO 27001 certifications. Product and security materials also cite PCI DSS v4.0. AssemblyAI documents encryption in transit and at rest, deletion APIs, and retention controls.

AssemblyAI says certain submitted files may be used for model training after PII redaction, where permitted by contract. Files are not used for training if the customer is under a BAA, uses EU servers, or opts out. For streaming customers who opt out of model training, AssemblyAI describes zero retention of audio and transcripts in the streaming production environment, apart from limited metadata for logging and billing.

Pricing

Item | Public price
Universal-3 Pro async | $0.21/hour
Universal-2 | $0.15/hour
Prompting add-on | $0.05/hour, listed as beta in the reviewed source
Keyterms Prompting | $0.05/hour
Speaker Diarization | $0.02/hour
Medical Mode | $0.15/hour
Universal-3 Pro Streaming | $0.45/hour base rate; keyterms included, Streaming Diarization +$0.12/hour, Prompting beta +$0.05/hour
Voice Agent API | $4.50/hour

AssemblyAI states that standard usage requires no commitments.
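The listed rates can be combined into a quick cost estimate for async jobs. The sketch below assumes add-ons are billed additively per audio hour on top of the base model rate, which the reviewed sources do not spell out; the add-on keys and the async_cost helper are hypothetical names, and the rates are the public prices listed above at the time of writing.

```python
# Public per-hour prices from the table above (USD).
ASYNC_BASE = {"universal-3-pro": 0.21, "universal-2": 0.15}
ASYNC_ADDONS = {
    "prompting": 0.05,
    "keyterms": 0.05,
    "diarization": 0.02,
    "medical_mode": 0.15,
}


def async_cost(hours: float, model: str = "universal-3-pro", addons=()) -> float:
    """Estimate the cost in USD, assuming add-ons stack additively per hour."""
    if "prompting" in addons and "keyterms" in addons:
        # Async requests accept prompt or keyterms_prompt, not both.
        raise ValueError("prompting and keyterms cannot be combined in async mode")
    rate = ASYNC_BASE[model] + sum(ASYNC_ADDONS[name] for name in addons)
    return round(hours * rate, 2)


if __name__ == "__main__":
    print(async_cost(100))
    print(async_cost(100, addons=("keyterms", "diarization")))
    print(async_cost(100, addons=("medical_mode",)))
```

Because Medical Mode is skipped and not charged for unsupported languages, an estimate for mixed-language audio should apply that add-on only to the hours in supported languages.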

Development and ownership

Universal-3 Pro is developed and operated by AssemblyAI. The public sources identify AssemblyAI as a research-oriented company led by founder and CEO Dylan Fox. The Universal-3 Pro launch and enablement materials were authored by Madison Bernstein, Ryan Seams, Martin Schweiger, and Kelsey Foster.

The closest detailed research lineage in the public record is Universal-1. AssemblyAI's Universal-1 paper describes a 600M-parameter Conformer RNN-T model pretrained with BEST-RQ on 12.5 million hours of unlabeled multilingual audio, then fine-tuned with supervised and pseudo-labeled data. Public sources do not verify whether the Universal-3 Pro team reused that architecture.

Release history

Date | Milestone | Notes
December 2023 | AssemblyAI Series C | AssemblyAI said the funding would support work on "superhuman" Speech AI models
February 3, 2026 | Universal-3 Pro launch | Promptable pre-recorded speech model released
March 2026 | Universal-3 Pro Streaming | Streaming counterpart released for real-time workflows
2026 | Medical and self-hosted expansion | Medical Mode and self-hosted streaming are documented in the current product surface

Sources

  1. Introducing Universal-3 Pro: A new class of speech language model optimized for Voice AI. https://www.assemblyai.com/blog/introducing-universal-3-pro
  2. AssemblyAI pre-recorded audio benchmarks. https://www.assemblyai.com/docs/pre-recorded-audio/benchmarks
  3. AssemblyAI FAQ: Can you sign a BAA. https://www.assemblyai.com/docs/faq/can-you-sign-a-baa
  4. Universal-3 Pro async docs. https://www.assemblyai.com/docs/pre-recorded-audio/universal-3-pro
  5. AssemblyAI pricing. https://www.assemblyai.com/pricing
  6. Self-hosted streaming docs. https://www.assemblyai.com/docs/streaming/self-hosted-streaming
  7. Universal-1 research. https://www.assemblyai.com/research/universal-1
  8. Pipecat STT Benchmark. https://github.com/pipecat-ai/stt-benchmark
  9. Voice of India preprint. https://arxiv.org/pdf/2604.19151
  10. Medical Mode docs. https://www.assemblyai.com/docs/pre-recorded-audio/medical-mode
  11. AssemblyAI benchmark site. https://www.assemblyai.com/benchmarks
  12. Expanding enterprise security and data residency capabilities. https://www.assemblyai.com/blog/expanding-enterprise-security-and-data-residency-capabilities
  13. Universal-3 Pro streaming docs. https://assemblyai.com/docs/streaming/universal-3-pro
  14. Cloudflare AI catalog entry for assemblyai/universal-3-pro. https://developers.cloudflare.com/ai/models/assemblyai/universal-3-pro/
  15. Rate limits docs. https://www.assemblyai.com/docs/pre-recorded-audio/rate-limits
  16. FAQ: Are files submitted to the API used for model training. https://www.assemblyai.com/docs/faq/are-files-submitted-to-the-api-used-for-model-training
  17. Delete transcripts docs. https://www.assemblyai.com/docs/pre-recorded-audio/delete-transcripts
  18. AssemblyAI About page. https://www.assemblyai.com/about
  19. Series C announcement. https://www.assemblyai.com/blog/announcing-our-50m-series-c-to-build-superhuman-speech-ai-models
  20. Universal-3 Pro product page. https://www.assemblyai.com/universal-3-pro
  21. Optimizing accuracy and latency in streaming. https://www.assemblyai.com/docs/streaming/getting-started/optimizing-accuracy-and-latency
  22. Universal-3 Pro Streaming launch and pricing. https://www.assemblyai.com/blog/universal-3-pro-streaming