Universal-3 Pro: what AssemblyAI shipped, and what it still won't say
AssemblyAI's Universal-3 Pro reviewed: promptable transcription, WER benchmarks, pricing, compliance caveats, and what the public record still hides.

AssemblyAI launched Universal-3 Pro on February 3, 2026, billing it as the first "production-quality" promptable speech language model. The pitch is not a new disclosed architecture. It is a new control surface: developers steer transcription up front with natural-language prompts, keyterms, audio tags, disfluency instructions, speaker-role cues, and code-switching hints, instead of cleaning up transcripts after the fact. AssemblyAI's public materials class it as a "SpeechLLM," a speech-augmented large language model. After digging through everything public on the model, my read is that the behavioral evidence is strong, the pricing is clear, and the technical disclosure is close to zero. Both halves matter if you are deciding whether to build on it.
The headline numbers come from AssemblyAI's own benchmark pages and docs. Universal-3 Pro posts a mean English WER of 5.6% against Universal-2's 6.1%, a FLEURS multilingual average WER of 4.58% against Universal-2's 7.42%, and materially lower missed-entity rates for medical terms, locations, email addresses, phone numbers, and credit-card numbers. On AssemblyAI's broader benchmark site it sits near the top on pre-recorded multilingual WER and leads the displayed code-switching and diarization comparisons. Universal-3 Pro Streaming leads the displayed streaming WER and medical missed-entity comparisons.
The model's practical strengths are controllability and entity accuracy. AssemblyAI claims up to a 45% accuracy improvement on domain-specific terms when prompting is used well, and the docs expose behavior control that is unusually rich for ASR: verbatim disfluencies, non-speech tagging, context carryover, and model routing across languages. The weaknesses are disclosure gaps and scope gaps. I found no public Universal-3 Pro model card, whitepaper, or technical note covering its architecture, parameter count, tokenizer, training corpus scale, or pretraining and fine-tuning recipe. I also found no public end-to-end speech-to-intent benchmark for it.
On the deployment side, AssemblyAI offers cloud APIs, EU data residency, and self-hosted streaming. The compliance posture is solid on paper: HIPAA BAA availability, DPA coverage, SOC 2 Type 2, ISO 27001:2022, PCI DSS v4.0, encryption, deletion APIs, and EU/UK/Swiss privacy mechanisms. There is one caveat that deserves to lead any procurement conversation: AssemblyAI says certain submitted files may be used for model training after PII redaction unless the customer is under a BAA, uses EU servers, or opts out. Configuration and contract selection are a first-order product decision here, not a procurement afterthought.
Where the model sits and what the evidence looks like
Universal-3 Pro is the center of AssemblyAI's current Voice AI stack. Product pages position it as the company's most powerful pre-recorded model and the foundation for downstream voice products. The streaming counterpart, Universal-3 Pro Streaming, extends the same promptable behavior into real time and now also runs under AssemblyAI's Voice Agent API.
These are the highest-value public sources, ranked by how much weight I'd put on them.
| Source type | What it contributes | Priority | Source |
|---|---|---|---|
| Official launch blog | Release rationale, feature framing, pricing claim, roadmap | Highest | "Introducing Universal-3 Pro" |
| Official async docs | API behavior, language routing, prompting rules, keyterm limits, audio-tag controls | Highest | Universal-3 Pro async docs |
| Official benchmark docs/site | WER, MER, multilingual, code-switching, diarization, streaming latency methodology | Highest | Benchmarks docs and benchmark site |
| Official pricing docs | Batch, streaming, add-ons, Voice Agent pricing | Highest | Pricing page |
| Official security/compliance docs | BAA, DPA, training/retention, EU residency, encryption | Highest | Trust/security pages |
| Official self-hosted docs | On-prem streaming architecture, hardware, isolation, concurrency | Highest | Self-hosted streaming docs and deployment page |
| AssemblyAI research paper | Closest available technical antecedent for the Universal family | High | Universal-1 research and paper |
| Third-party/open benchmark | Public reproducible streaming benchmark framework | Medium | Pipecat STT Benchmark repo |
| Third-party preprint | External stress test for broader AssemblyAI multilingual performance on Indian speech | Medium | Voice of India preprint |
The evidence base is strong on product behavior, pricing, deployment, and benchmarked outcomes, and weak on model internals. Universal-1 and Universal-2-TF got dedicated research writeups. Universal-3 Pro appears in public only through product docs, product pages, and blogs.
What is publicly known about the tech
From the public interface, Universal-3 Pro is an instruction-conditioned speech model for pre-recorded audio. The async docs describe it as the company's most powerful Voice AI model, built to handle entities, rare words, and domain-specific terminology, with optional prompting and code-switching. Public materials also show support for audio event markers, speaker cues, verbatim and disfluency capture, and removal of inline audio tags from outputs.
The language story has two layers, and the distinction matters more than the marketing suggests. Native Universal-3 Pro coverage is six languages: English, Spanish, Portuguese, French, German, and Italian. To reach 99 languages in pre-recorded workflows, the docs recommend routing with speech_models: ["universal-3-pro", "universal-2"], which tries Universal-3 Pro first where supported and falls back automatically to Universal-2 everywhere else.
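To make the two-layer language story concrete, here is a minimal Python sketch of which model the fallback list would be expected to land on for a given language. It is an illustration only: the real routing happens on AssemblyAI's side, and the expected_model helper, the ISO 639-1 codes, and the code normalization are my assumptions, not part of any AssemblyAI SDK.

```python
# Illustrative only: AssemblyAI performs the actual routing server-side.
# The six native languages come from the docs; the ISO 639-1 codes and the
# helper below are assumptions made for this sketch.
U3_PRO_NATIVE = {"en", "es", "pt", "fr", "de", "it"}


def expected_model(language_code: str) -> str:
    """Return the model the fallback list would be expected to use."""
    # Normalize regional variants such as "pt-BR" or "en_us" to a base code.
    base = language_code.lower().replace("_", "-").split("-")[0]
    return "universal-3-pro" if base in U3_PRO_NATIVE else "universal-2"


for code in ["en", "pt-BR", "hi", "ja"]:
    print(code, "->", expected_model(code))
```

A table like this is useful in test planning: any language that maps to universal-2 should be evaluated as a Universal-2 workload, whatever the request says at the top of the model list.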
Prompting is not a side feature. It is the product. In async transcription, the model accepts either a free-form prompt or keyterms_prompt, but not both in the same request. Keyterms prompting allows up to 1,000 words or phrases, with a maximum of 6 words per phrase. AssemblyAI says testing showed up to a 45% accuracy gain on domain-specific terms when prompting is used effectively. The launch blog also says the model is trained on 50+ audio event tags and can be prompted for custom domain-specific tags.
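The async prompting limits are easy to check client-side before a request goes out. The sketch below encodes the three documented rules (prompt or keyterms but not both, at most 1,000 keyterms, at most 6 words per phrase). The function name and the whitespace-based word count are my own assumptions; AssemblyAI may count words differently.

```python
from typing import List, Optional, Sequence

MAX_KEYTERMS = 1000        # documented cap on words or phrases
MAX_WORDS_PER_PHRASE = 6   # documented cap on words in one phrase


def validate_async_prompting(
    prompt: Optional[str] = None,
    keyterms: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems: List[str] = []
    if prompt and keyterms:
        problems.append("use either prompt or keyterms_prompt, not both")
    if keyterms:
        if len(keyterms) > MAX_KEYTERMS:
            problems.append(f"{len(keyterms)} keyterms exceeds {MAX_KEYTERMS}")
        for term in keyterms:
            # Assumption: words are separated by whitespace.
            if len(term.split()) > MAX_WORDS_PER_PHRASE:
                problems.append(f"phrase too long: {term!r}")
    return problems


print(validate_async_prompting(keyterms=["metoprolol", "atrial fibrillation"]))
print(validate_async_prompting(prompt="Medical consult.", keyterms=["metoprolol"]))
```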
Medical adaptation ships as an add-on rather than a per-customer fine-tune. Medical Mode (domain="medical-v1") is documented as a specialized enhancement for medications, procedures, conditions, and dosages, and it can be combined with diarization, keyterms, and PII redaction. The docs say Medical Mode supports English, Spanish, German, and French; on unsupported languages it is skipped and the user is not charged.
What AssemblyAI is not disclosing
The public record is materially incomplete on the points many technical buyers care about most. I found no public Universal-3 Pro parameter count, architecture diagram, tokenizer description, training corpus size, data-source composition, pretraining algorithm, optimizer setup, or fine-tuning recipe. There is no model card and no whitepaper. The sources are explicit about capabilities and silent about internals.
The gap stands out because AssemblyAI has published exactly these details before. Universal-1 was publicly described as a 600M-parameter Conformer RNN-T model, with a full-context Conformer encoder pretrained using BEST-RQ on 12.5 million hours of unlabeled multilingual audio, then fine-tuned jointly with an RNN-T decoder using 188k hours of supervised data and 1.6M hours of pseudo-labeled data across four languages. The Universal-1 paper also discloses a two-layer LSTM predictor, WordPiece tokenization, JAX training, and TPU-based large-scale training. Nothing public shows that Universal-3 Pro reuses that architecture. Assuming continuity would be speculation, and I am not going to make that assumption for you.
The most defensible reading: Universal-3 Pro is presented as a promptable SpeechLLM layer on top of AssemblyAI's ASR infrastructure, and the company has chosen to publish behavioral evidence instead of architectural evidence. You can verify what it does and how well it performs. You cannot currently verify how many parameters it has, how it was trained, or whether it uses a transducer, encoder-decoder, or hybrid stack under the hood.

The benchmark picture
On AssemblyAI's pre-recorded benchmark docs, Universal-3 Pro reports a mean English WER of 5.6% and median 4.9%, versus Universal-2 at mean 6.1% and median 6.5%. On FLEURS multilingual benchmarks, the average reported WER is 4.58% for Universal-3 Pro against 7.42% for Universal-2, with especially large relative gains in French, Italian, and German. On missed-entity benchmarks, Universal-3 Pro improves over Universal-2 on every category shown, including medical terms, locations, job titles, organization names, email addresses, phone numbers, and credit-card numbers.
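Absolute WER gaps can hide how different the two improvements are in relative terms. This short calculation, using only the mean figures quoted above, shows the arithmetic; the helper name is mine.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


# Mean WER figures quoted from AssemblyAI's benchmark docs.
english = relative_reduction(6.1, 5.6)     # Universal-2 -> Universal-3 Pro
fleurs = relative_reduction(7.42, 4.58)    # FLEURS multilingual average

print(f"English mean WER: {english:.1f}% relative reduction")
print(f"FLEURS average WER: {fleurs:.1f}% relative reduction")
```

The English gain works out to roughly 8% relative, while the FLEURS gain is roughly 38%, so the multilingual improvement is the larger story.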
On the broader benchmark site, Universal-3 Pro posts a global multilingual WER of 8.23%, essentially tied with Speechmatics Enhanced at 8.22% and ahead of OpenAI GPT-4o Transcribe at 9.52%, OpenAI Whisper-1 at 14.39%, and Deepgram Nova-3 at 15.71% on that suite. The same page shows Universal-3 Pro with the best displayed average code-switching WER at 8.63% and the best displayed diarization cpWER at 33.34%.
For streaming, AssemblyAI's benchmark page shows Universal-3 Pro Streaming with the lowest displayed average WER at 5.53% and one of the best semantic WER results, and it leads the displayed missed-entity and medical missed-entity comparisons. Its median TTCT on the Pipecat benchmark is 335 ms, which is materially slower than Deepgram Nova-3 at 247 ms and slower than several other real-time systems on that particular latency metric.
Where it is genuinely strong
The clearest strength is entity-rich, controllable transcription. The model is built around the hard stuff: names, numbers, emails, addresses, medical terms, rare words, and mixed-language speech. The UI and docs expose that directly rather than hiding it behind post-processing defaults. For voice-agent, contact-center, and clinical-documentation use cases, WER alone underestimates business risk, and this is the model designed for that reality.
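A toy example shows why WER can understate risk. The sketch below computes word-level WER with a standard edit distance and a deliberately simplified missed-entity rate (the share of reference entities that do not appear verbatim in the hypothesis). That entity metric is my own simplification for illustration, not AssemblyAI's benchmark methodology, and the sentences are invented.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance (reference must be non-empty)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def missed_entity_rate(entities, hypothesis: str) -> float:
    """Simplified metric: fraction of entities absent from the hypothesis."""
    text = hypothesis.lower()
    missed = [e for e in entities if e.lower() not in text]
    return len(missed) / len(entities)


reference = ("please refill metoprolol fifty milligrams for jane doe "
             "at the main street pharmacy")
hypothesis = ("please refill metropolol fifty milligrams for jane doe "
              "at the main street pharmacy")
entities = ["metoprolol", "jane doe"]

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"Missed-entity rate: {missed_entity_rate(entities, hypothesis):.2f}")
```

One wrong word out of thirteen is a WER under 8%, yet half of the entities that matter are wrong, and the wrong one is the drug name.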
Domain adaptation without custom training is the second strength. AssemblyAI's messaging is explicit that prompt engineering is meant to replace bespoke model retraining or heavy post-processing for many use cases. The 45% keyterm improvement claim, the ability to capture or suppress disfluencies, and the audio tagging all point the same way.
The third is deployment breadth for production buyers. The model is available through the standard API, EU endpoints, and self-hosted streaming, and the self-hosted documentation is concrete about hardware, startup times, architecture, and isolation behavior. Most vendors expose far less operational detail publicly.
Where it is weak
Technical opacity is the biggest problem. No U3-specific model card, no whitepaper, no parameter count, no training-data disclosure. For a model marketed as a new class of speech language model, that gap will bother anyone running technically rigorous procurement, and it should, especially in regulated or safety-sensitive deployments.
The second weakness is native language scope. Universal-3 Pro itself covers six languages natively. The 99-language story depends on routing to Universal-2 outside those six. That is a sensible product strategy, but buyers should not mentally equate "Universal-3 Pro" with full native 99-language capability.
Third, the control surface is fragmented between async and streaming. Async supports prompt or keyterms but not both in one request. Streaming supports prompt plus keyterms together and mid-stream updates, but the docs say streaming prompting does not control output formatting; it is tuned for context biasing and turn detection instead. Teams cannot assume functional parity across the batch and real-time products.
Fourth, the latency numbers do not line up cleanly across sources. AssemblyAI's docs say Universal-3 Pro Streaming is "sub-300ms" for time-to-complete transcript latency. An AssemblyAI tutorial cites roughly 150 ms P50 after VAD endpoint detection. Hamming-linked AssemblyAI tutorials cite 307 ms P50. The Pipecat benchmark page shows 335 ms median TTCT and 534 ms at P95. My reading is that these measure different pipeline boundaries, but that is an inference on my part. Test against your own telemetry definitions before committing.
Hallucination, bias, and outside evidence
AssemblyAI's benchmark docs call hallucinations a "critical concern" and claim a 30% hallucination reduction versus Whisper on the pre-recorded side, but the public U3 materials do not break down hallucination behavior with the same detail they give WER or MER. The streaming docs also warn that large or low-quality keyterm lists can induce overcorrections and hallucinations, which is worth remembering before you paste your entire product glossary into keyterms_prompt.
On fairness, I found no public demographic bias audit, accent-fairness report, or subgroup model card for Universal-3 Pro. The most relevant external cautionary evidence is the 2026 Voice of India preprint, which reports severe failures for "AssemblyAI Universal" on some Indian languages, including very large WER spikes driven by failed language detection and generative hallucinations. Since the async docs say unsupported languages fall back to Universal-2 rather than Universal-3 Pro itself, I read that paper as a warning about the broader multilingual stack, not a clean indictment of native U3 Pro on its six supported languages. Anyone depending on fallback-based coverage should still take it seriously.
For third-party validation, the most credible public framework is the open-source Pipecat STT Benchmark, which AssemblyAI itself cites and which Soniox describes as the single source of truth for the numbers on its benchmark page. There is also anecdotal multi-API comparison content on Reddit, but the surfaced snippets do not provide enough methodology detail to use for decision-critical evaluation.

Deployment, pricing, and the fine print
Here is where Universal-3 Pro can actually run today.
| Deployment path | Publicly documented status | Notes |
|---|---|---|
| Cloud API, pre-recorded | Yes | Universal-3 Pro async via standard API; native 6 languages, 99-language routing through Universal-2 fallback |
| Cloud API, streaming | Yes | Universal-3 Pro Streaming (u3-rt-pro) for real-time voice workflows |
| EU cloud region | Yes | EU endpoint available for data residency |
| Self-hosted / on-prem | Yes, for streaming | Self-hosted Streaming supports Universal-3 Pro Streaming, with containerized deployment inside customer infrastructure |
| Edge / on-device | No official on-device program found | A third-party Cloudflare AI catalog entry for assemblyai/universal-3-pro suggests partner-hosted availability, but there is no official AssemblyAI on-device or edge program |
Pricing and throughput
Cloud pricing is straightforward. Universal-3 Pro async is $0.21/hour. Universal-2 is $0.15/hour. Async add-ons: Keyterms Prompting at $0.05/hour, Prompting at $0.05/hour in beta, Speaker Diarization at $0.02/hour, and Medical Mode at $0.15/hour. Universal-3 Pro Streaming is $0.45/hour, and the Voice Agent API is $4.50/hour. AssemblyAI notes that standard usage requires no commitments.
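For budgeting, the list prices above reduce to a small lookup. This sketch assumes add-ons are billed per hour of audio on top of the base model rate, which is my reading of the pricing page and not something I have confirmed against an invoice. The dictionary keys and the function name are illustrative.

```python
# Hourly list prices in USD, as quoted above.
BASE_RATES = {
    "universal-3-pro": 0.21,
    "universal-2": 0.15,
    "universal-3-pro-streaming": 0.45,
    "voice-agent": 4.50,
}
ADDON_RATES = {
    "keyterms": 0.05,
    "prompting": 0.05,    # listed as beta
    "diarization": 0.02,
    "medical": 0.15,
}


def estimate_cost(hours: float, model: str, addons=()) -> float:
    """Estimated USD cost, assuming add-ons stack additively per audio hour."""
    hourly = BASE_RATES[model] + sum(ADDON_RATES[name] for name in addons)
    return round(hours * hourly, 2)


print(estimate_cost(1000, "universal-3-pro", ["keyterms", "diarization"]))
print(estimate_cost(1000, "universal-3-pro", ["medical", "diarization"]))
```

Under that additive assumption, Medical Mode nearly doubles the effective hourly rate for Universal-3 Pro, which is worth knowing before turning it on by default.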
The public throughput data is operational rather than model-theoretic. For pre-recorded jobs, free accounts get 5 parallel transcriptions and paid accounts start at 200+ parallel transcriptions, with custom higher limits available. For streaming, paid accounts start at 100+ new sessions per minute, with unlimited open-session scaling and an automatic 10% scale-up when usage hits 70% of the current limit. Each self-hosted instance supports up to 48 concurrent streams without runtime degradation, per AssemblyAI.
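Those throughput figures translate into simple capacity arithmetic. The sketch below sizes a self-hosted fleet from the 48-streams-per-instance figure and models the 10% scale-up at 70% usage. How AssemblyAI rounds the new limit is not documented in what I reviewed, so rounding up is an assumption, as are the function names.

```python
import math

STREAMS_PER_INSTANCE = 48  # AssemblyAI's stated self-hosted figure


def instances_needed(peak_concurrent_streams: int) -> int:
    """Self-hosted instances required to cover a peak stream count."""
    return math.ceil(peak_concurrent_streams / STREAMS_PER_INSTANCE)


def next_session_limit(current_limit: int, usage: int) -> int:
    """Apply a 10% scale-up once usage reaches 70% of the current limit."""
    # Integer comparison avoids floating-point surprises at the threshold.
    if usage * 10 >= current_limit * 7:
        return math.ceil(current_limit * 11 / 10)  # assumption: round up
    return current_limit


print(instances_needed(100))
print(next_session_limit(100, 70))
print(next_session_limit(100, 69))
```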
One self-hosted caveat: the public docs describe self-hosted streaming only, not a self-hosted async Universal-3 Pro product. Teams needing full on-prem batch ASR should confirm roadmap and contract scope directly with AssemblyAI rather than inferring feature parity from the streaming documentation.

Security, privacy, and compliance
AssemblyAI's compliance posture is mature by startup standards. The docs say it offers a HIPAA BAA, automatically incorporates a DPA into customer terms, supports EU data residency, and holds SOC 2 Type 2 and ISO 27001 certifications; more recent product and security materials also cite PCI DSS v4.0. The security overview says encryption applies in transit and at rest, and the data-retention page specifies AES-128/AES-256 at rest with TLS 1.2+ in transit.
The model-training caveat is the part I would put in bold in any internal memo, if I used bold in memos. AssemblyAI says only certain submitted files, as permitted by contract, are used for model training after a PII-redaction process. Files will not be used for training if the customer is under a BAA, uses EU servers, or has opted out. For streaming, customers who opt out of model training get zero data retention of audio and transcripts in the streaming production environment, apart from limited metadata for logging and billing.
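The training rule is a three-way exclusion, and it can be written down as a check you run against each environment's configuration. This is a sketch of the policy as I read it from the FAQ, not an AssemblyAI API; the class and field names are hypothetical, and a True result means only that files may be eligible, subject to contract.

```python
from dataclasses import dataclass


@dataclass
class AccountConfig:
    """Hypothetical summary of the settings that matter for training use."""
    has_baa: bool = False
    uses_eu_servers: bool = False
    opted_out_of_training: bool = False


def may_be_used_for_training(cfg: AccountConfig) -> bool:
    """True if none of the three documented exclusions applies."""
    return not (cfg.has_baa or cfg.uses_eu_servers or cfg.opted_out_of_training)


print(may_be_used_for_training(AccountConfig()))
print(may_be_used_for_training(AccountConfig(uses_eu_servers=True)))
```

The default case is the one to watch: an account with no BAA, on the standard endpoint, that has never opted out.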
Deletion and retention controls exist, and they reward careful reading. AssemblyAI provides an API to delete transcripts and says users can remove transcript data and mark it as deleted. Async retention is configurable through TTL and deletion requests, while streaming retention can be zero-retention under the right configuration.
The regulatory takeaways are practical. For HIPAA, sign a BAA before running real PHI workloads. For GDPR, AssemblyAI points to EU endpoints, DPA coverage, and the privacy rights in its policy, and the privacy policy references EU-U.S., UK-U.S., and Swiss-U.S. Data Privacy Framework commitments. For highly regulated use, self-hosted streaming materially reduces exposure: audio, transcripts, and PII stay inside the customer environment, with only license validation and usage metadata leaving the network.
The ethical risks are not unique to AssemblyAI, but Universal-3 Pro's capabilities make them operationally immediate: voice surveillance, speaker attribution in sensitive conversations, automated processing of medical or financial identifiers, and prompt-driven capture of disfluencies or audio events that users may not realize are being stored. A model that preserves more of the texture of a conversation raises product value and privacy sensitivity at the same time. AssemblyAI's own training and retention disclosures are a good argument for designing governance, consent, and least-retention configuration into the rollout from day one.
Why AssemblyAI built it, and who is behind it
AssemblyAI's own explanation is direct: customers were building voice-heavy products in healthcare, customer calls, notetaking, and especially voice agents, and they needed control over transcription at the moment audio is processed, not only after the transcript is already wrong. The launch blog repeatedly frames Universal-3 Pro as a response to "signal locked in every voice conversation" and as a way to replace tool fragmentation and post-processing with "one model, one integration."
The launch also fits the company's longer arc. AssemblyAI's About page says its vision is to build "superhuman" Voice AI models, and its December 2023 Series C announcement said the company was raising capital specifically to build those models. Through 2025 AssemblyAI publicly emphasized enterprise security, EU data residency, and global deployment readiness, which makes Universal-3 Pro look like a model launch and a platform maturation step at once.
On people, the public record is stronger on the company than on a U3-specific research team. AssemblyAI is a research-oriented organization led by founder and CEO Dylan Fox. The launch and enablement materials for Universal-3 Pro and its streaming and usage guides were authored by Madison Bernstein, Ryan Seams, Martin Schweiger, and Kelsey Foster. The closest published technical research team for the modern Universal family is the Universal-1 paper by Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, and Takuya Yoshioka. I cannot verify from public sources which of those researchers directly led Universal-3 Pro.
The highest-confidence timeline runs like this: the December 2023 Series C funded the model-building push, Universal-3 Pro launched on February 3, 2026, and the launch post promised more languages, better instruction following, and real-time support "over the coming weeks." That promise was partly realized through the March 2026 streaming release and the later medical and self-hosted expansion. These dates come from AssemblyAI's official blog, research, pricing, changelog, and security materials.
How it compares
A note on the OpenAI rows: the public sources I reviewed benchmarked Whisper-1 and GPT-4o Transcribe, so those are the models in the tables. I did not rely on an unofficial "ChatSpeech" label because it did not appear in the benchmark sources I reviewed.
The AssemblyAI lineage
| Model | Public technical disclosure | Language story | Prompting/control | Pricing | Notable public benchmark/result |
|---|---|---|---|---|---|
| Universal-1 | Strong: 600M Conformer RNN-T, BEST-RQ, 12.5M unlabeled + 188k supervised + 1.6M pseudo-labeled hours disclosed | 4 core languages in research release | No promptable control exposed like U3 | Historical, not a current list price in reviewed sources | 30% lower hallucination on speech vs Whisper large-v3; 5x faster than optimized Whisper baseline |
| Universal-2 | Limited public internals in reviewed sources; positioned as building on Universal-1 | 99 languages | No natural-language prompting | $0.15/hr | Baseline/fallback model; weaker than U3 on WER and MER in current docs |
| Universal-3 Pro | No public model card, whitepaper, or parameter count found | 6 native languages, 99 via U2 fallback | Async prompt or keyterms; audio tags; disfluency control; code switching | $0.21/hr | English mean WER 5.6%; FLEURS avg 4.58%; broad MER gains over U2 |
| Universal-3 Pro Streaming | Public product/ops docs, but not full internals | 6 out of the box; multilingual mode with more coming | Streaming prompt + keyterms together; mid-stream updates; context carryover | $0.45/hr | Streaming WER 5.53%; medical MER 10.46%; median TTCT 335 ms in Pipecat benchmark |
Competitor snapshot
The table below mixes pre-recorded multilingual and streaming metrics from the reviewed benchmark sources. Read columns within a metric, not across them, because they come from different evaluation suites. AssemblyAI's benchmark site says all providers were tested through production APIs with identical audio and default settings.
| Provider / model | Pre-recorded multilingual WER | Streaming medical missed entity rate | Streaming median TTCT | Main read |
|---|---|---|---|---|
| AssemblyAI Universal-3 Pro / U3 Pro Streaming | 8.23% | 10.46% | 335 ms | Strongest overall balance in the reviewed sources, especially on entities and code-switching; latency is good but not fastest on Pipecat |
| Deepgram Nova-3 | 15.71% | 14.33% | 247 ms | Faster on TTCT in Pipecat; weaker on multilingual WER and medical MER in reviewed sources |
| OpenAI GPT-4o Transcribe | 9.52% | 25.72% | 637 ms | Competitive multilingual batch result; materially weaker streaming entity and TTCT numbers in reviewed suite |
| OpenAI Whisper-1 | 14.39% | not shown | not shown | Valuable if self-hosting or open weights matter; materially behind U3 Pro on reviewed multilingual WER |
| Speechmatics Enhanced | 8.22% | not shown | 495 ms | Essentially tied with U3 Pro on reviewed multilingual WER; slower TTCT in reviewed streaming comparison |
| Microsoft Azure STT | not shown | 15.65% | 1016 ms | Better than GPT-4o on medical MER in reviewed streaming set, but much slower TTCT |
| Google STT | not shown | not shown | 878 ms | Considerably slower TTCT than U3 Pro Streaming in reviewed Pipecat comparison |
The comparison boils down to one sentence: Universal-3 Pro looks strongest when the task emphasizes entity correctness, code-switching, diarization, or domain guidance, and less dominant when the primary objective is absolute minimum streaming turn-finalization latency. That is exactly what its product design suggests it should be.
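Reading the competitor table one metric at a time is easy to do mechanically. The sketch below ranks providers within each column and skips any provider whose value is not shown. The numbers are copied from the table; the short provider labels, dictionary keys, and helper name are mine.

```python
# Values copied from the competitor table; None means "not shown".
RESULTS = {
    "AssemblyAI U3 Pro": {"wer": 8.23, "medical_mer": 10.46, "ttct_ms": 335},
    "Deepgram Nova-3": {"wer": 15.71, "medical_mer": 14.33, "ttct_ms": 247},
    "OpenAI GPT-4o Transcribe": {"wer": 9.52, "medical_mer": 25.72, "ttct_ms": 637},
    "OpenAI Whisper-1": {"wer": 14.39, "medical_mer": None, "ttct_ms": None},
    "Speechmatics Enhanced": {"wer": 8.22, "medical_mer": None, "ttct_ms": 495},
    "Microsoft Azure STT": {"wer": None, "medical_mer": 15.65, "ttct_ms": 1016},
    "Google STT": {"wer": None, "medical_mer": None, "ttct_ms": 878},
}


def rank(metric: str):
    """Providers with a shown value for `metric`, best (lowest) first."""
    shown = [(name, row[metric]) for name, row in RESULTS.items()
             if row[metric] is not None]
    return sorted(shown, key=lambda pair: pair[1])


for metric in ("wer", "medical_mer", "ttct_ms"):
    best_name, best_value = rank(metric)[0]
    print(f"{metric}: {best_name} ({best_value})")
```

Each column has a different leader, which is why a single combined ranking across these suites would be misleading.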

How to adopt it without getting burned
Treat Universal-3 Pro as an accuracy-first, controllable STT layer, not a complete speech-to-intent system. Use it when downstream workflow quality depends on names, identifiers, medical terms, speaker structure, or mixed-language speech. If global language breadth comes first for you, explicitly test the fallback path to Universal-2 on the languages you care about. If you are building streaming voice agents, instrument at least three latency definitions (first partial, turn completion, and total response time), because marketing and benchmark sources measure different boundaries.
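One way to keep the three latency definitions separate is to log raw timestamps per turn and derive each metric explicitly. The sketch below does that with a nearest-rank percentile. The field names and the exact boundaries (for example, measuring turn completion from end of speech) are my assumptions; align them with whatever your own telemetry and the vendor's benchmark actually measure.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Per-turn event times in seconds (hypothetical field names)."""
    speech_start: float
    speech_end: float
    first_partial: float
    final_transcript: float
    response_audio_start: float


def latencies_ms(t: TurnTimestamps) -> dict:
    """Derive the three latency definitions, each with an explicit boundary."""
    return {
        "first_partial": (t.first_partial - t.speech_start) * 1000,
        "turn_completion": (t.final_transcript - t.speech_end) * 1000,
        "total_response": (t.response_audio_start - t.speech_end) * 1000,
    }


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile over a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


turns = [
    TurnTimestamps(0.0, 2.0, 0.25, 2.5, 3.0),
    TurnTimestamps(0.0, 1.0, 0.5, 1.25, 2.0),
    TurnTimestamps(0.0, 4.0, 0.25, 4.75, 5.0),
]
completion = [latencies_ms(t)["turn_completion"] for t in turns]
print("turn completion P50:", percentile(completion, 50))
print("turn completion P95:", percentile(completion, 95))
```

Reporting P50 and P95 for each definition separately makes it much easier to see which published number, if any, your pipeline is actually comparable to.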
The implementation best practices from the docs and launch materials are simple. Start async transcription without a prompt, then add prompting only where error patterns justify it. Use keyterms_prompt when you know the vocabulary, and a natural-language prompt when you need behavioral guidance. For streaming, update prompt and keyterms mid-session as the dialog state changes. Do not stuff large lists of common words into keyterms; the docs warn this can cause overcorrections and hallucinations. For healthcare, combine Medical Mode with diarization and redaction if PHI is involved, and do not deploy without a BAA.
A minimal async Python example, adapted from AssemblyAI's docs:
```python
import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"
audio_file = "https://assembly.ai/sports_injuries.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],  # fallback for unsupported languages
    language_detection=True,
    prompt=(
        "Transcribe this as a medical consultation. "
        "Prioritize medication names, dosages, and speaker-role clarity."
    ),
)

transcript = aai.Transcriber().transcribe(audio_file, config)
print(transcript.text)
```
This reflects the official async docs, including the recommended speech_models fallback strategy for 99-language coverage. In async mode, use either prompt or keyterms_prompt, never both in the same request.
A minimal streaming JavaScript example, adapted from the streaming docs:
```javascript
const CONNECTION_PARAMS = {
  sample_rate: 16000,
  speech_model: "u3-rt-pro",
  mode: "balanced",
  prompt: "Customer support call about account verification and billing questions.",
  keyterms_prompt: JSON.stringify(["AssemblyAI", "invoice number", "billing address"])
};

// `websocket` is assumed to be an already-open streaming connection
// that was created with CONNECTION_PARAMS.
// Later, when the conversation moves to payment collection:
websocket.send(
  JSON.stringify({
    type: "UpdateConfiguration",
    prompt: "Now collecting payment details.",
    keyterms_prompt: ["credit card number", "expiration date", "postal code"]
  })
);
```
This matches the official streaming guidance, which supports prompt plus keyterms together and UpdateConfiguration for mid-stream adaptation. Mid-stream reconfiguration is one of Universal-3 Pro Streaming's most distinctive operational advantages, and I have not seen it exposed this cleanly elsewhere.
What is still unanswered
Several questions stay open in the public record. There is no public Universal-3 Pro architecture note, model card, parameter count, training corpus disclosure, or U3-specific hallucination or bias audit in the sources I reviewed. There is no public end-to-end speech-to-intent benchmark, no public self-hosted async Universal-3 Pro documentation, and no official on-device or edge deployment program from AssemblyAI itself. For a highly regulated or highly specialized rollout, treat those as diligence items to resolve before signing, not as minor documentation gaps.
Sources
- Introducing Universal-3 Pro: A new class of speech language model optimized for Voice AI. https://www.assemblyai.com/blog/introducing-universal-3-pro
- AssemblyAI pre-recorded audio benchmarks. https://www.assemblyai.com/docs/pre-recorded-audio/benchmarks
- AssemblyAI FAQ: Can you sign a BAA. https://www.assemblyai.com/docs/faq/can-you-sign-a-baa
- Universal-3 Pro async docs. https://www.assemblyai.com/docs/pre-recorded-audio/universal-3-pro
- AssemblyAI pricing. https://www.assemblyai.com/pricing
- Self-hosted streaming docs. https://www.assemblyai.com/docs/streaming/self-hosted-streaming
- Universal-1 research. https://www.assemblyai.com/research/universal-1
- Pipecat STT Benchmark. https://github.com/pipecat-ai/stt-benchmark
- Voice of India preprint. https://arxiv.org/pdf/2604.19151
- Medical Mode docs. https://www.assemblyai.com/docs/pre-recorded-audio/medical-mode
- AssemblyAI benchmark site. https://www.assemblyai.com/benchmarks
- Expanding enterprise security and data residency capabilities. https://www.assemblyai.com/blog/expanding-enterprise-security-and-data-residency-capabilities
- Universal-3 Pro streaming docs. https://assemblyai.com/docs/streaming/universal-3-pro
- Cloudflare AI catalog entry for assemblyai/universal-3-pro. https://developers.cloudflare.com/ai/models/assemblyai/universal-3-pro/
- Rate limits docs. https://www.assemblyai.com/docs/pre-recorded-audio/rate-limits
- FAQ: Are files submitted to the API used for model training. https://www.assemblyai.com/docs/faq/are-files-submitted-to-the-api-used-for-model-training
- Delete transcripts docs. https://www.assemblyai.com/docs/pre-recorded-audio/delete-transcripts
- AssemblyAI About page. https://www.assemblyai.com/about
- Series C announcement. https://www.assemblyai.com/blog/announcing-our-50m-series-c-to-build-superhuman-speech-ai-models
- Universal-3 Pro product page. https://www.assemblyai.com/universal-3-pro
- Optimizing accuracy and latency in streaming. https://www.assemblyai.com/docs/streaming/getting-started/optimizing-accuracy-and-latency