Universal-3 Pro: what AssemblyAI shipped, and what it still won't say

AssemblyAI launched Universal-3 Pro on February 3, 2026, billing it as the first "production-quality" promptable speech language model. The pitch is not a new disclosed architecture. It is a new control surface: developers steer transcription up front with natural-language prompts, keyterms, audio tags, disfluency instructions, speaker-role cues, and code-switching hints, instead of cleaning up transcripts after the fact. AssemblyAI's public materials class it as a "SpeechLLM," a speech-augmented large language model. After digging through everything public on the model, my read is that the behavioral evidence is strong, the pricing is clear, and the technical disclosure is close to zero. Both halves matter if you are deciding whether to build on it.

The headline numbers come from AssemblyAI's own benchmark pages and docs. Universal-3 Pro posts a mean English WER of 5.6% against Universal-2's 6.1%, a FLEURS multilingual average WER of 4.58% against Universal-2's 7.42%, and materially lower missed-entity rates for medical terms, locations, email addresses, phone numbers, and credit-card numbers. On AssemblyAI's broader benchmark site it sits near the top on pre-recorded multilingual WER and leads the displayed code-switching and diarization comparisons. Universal-3 Pro Streaming leads the displayed streaming WER and medical missed-entity comparisons.

The model's practical strengths are controllability and entity accuracy. AssemblyAI claims up to a 45% accuracy improvement on domain-specific terms when prompting is used well, and the docs expose behavior control that is unusually rich for ASR: verbatim disfluencies, non-speech tagging, context carryover, and model routing across languages. The weaknesses are disclosure gaps and scope gaps. I found no public Universal-3 Pro model card, whitepaper, or technical note covering its architecture, parameter count, tokenizer, training corpus scale, or pretraining and fine-tuning recipe. I also found no public end-to-end speech-to-intent benchmark for it.

On the deployment side, AssemblyAI offers cloud APIs, EU data residency, and self-hosted streaming. The compliance posture is solid on paper: HIPAA BAA availability, DPA coverage, SOC 2 Type 2, ISO 27001:2022, PCI DSS v4.0, encryption, deletion APIs, and EU/UK/Swiss privacy mechanisms. There is one caveat that deserves to lead any procurement conversation: AssemblyAI says certain submitted files may be used for model training after PII redaction unless the customer is under a BAA, uses EU servers, or opts out. Configuration and contract selection are a first-order product decision here, not a procurement afterthought.

Where the model sits and what the evidence looks like

Universal-3 Pro is the center of AssemblyAI's current Voice AI stack. Product pages position it as the company's most powerful pre-recorded model and the foundation for downstream voice products. The streaming counterpart, Universal-3 Pro Streaming, extends the same promptable behavior into real time and now also runs under AssemblyAI's Voice Agent API.

These are the highest-value public sources, ranked by how much weight I'd put on them.

Source type	What it contributes	Priority	Source
Official launch blog	Release rationale, feature framing, pricing claim, roadmap	Highest	"Introducing Universal-3 Pro"
Official async docs	API behavior, language routing, prompting rules, keyterm limits, audio-tag controls	Highest	Universal-3 Pro async docs
Official benchmark docs/site	WER, missed-entity rate (MER), multilingual, code-switching, diarization, streaming latency methodology	Highest	Benchmarks docs and benchmark site
Official pricing docs	Batch, streaming, add-ons, Voice Agent pricing	Highest	Pricing page
Official security/compliance docs	BAA, DPA, training/retention, EU residency, encryption	Highest	Trust/security pages
Official self-hosted docs	On-prem streaming architecture, hardware, isolation, concurrency	Highest	Self-hosted streaming docs and deployment page
AssemblyAI research paper	Closest available technical antecedent for the Universal family	High	Universal-1 research and paper
Third-party/open benchmark	Public reproducible streaming benchmark framework	Medium	Pipecat STT Benchmark repo
Third-party preprint	External stress test for broader AssemblyAI multilingual performance on Indian speech	Medium	Voice of India preprint snippet

The evidence base is strong on product behavior, pricing, deployment, and benchmarked outcomes, and weak on model internals. Universal-1 and Universal-2-TF got dedicated research writeups. Universal-3 Pro appears in public only through product docs, product pages, and blogs.

What is publicly known about the tech

From the public interface, Universal-3 Pro is an instruction-conditioned speech model for pre-recorded audio. The async docs describe it as the company's most powerful Voice AI model, built to handle entities, rare words, and domain-specific terminology, with optional prompting and code-switching. Public materials also show support for audio event markers, speaker cues, verbatim and disfluency capture, and removal of inline audio tags from outputs.

The language story has two layers, and the distinction matters more than the marketing suggests. Native Universal-3 Pro coverage is six languages: English, Spanish, Portuguese, French, German, and Italian. To reach 99 languages in pre-recorded workflows, the docs recommend routing with speech_models: ["universal-3-pro", "universal-2"], which tries Universal-3 Pro first where supported and falls back automatically to Universal-2 everywhere else.
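To make the two-layer routing concrete, here is a minimal local sketch of the decision the docs describe. It does not call the API, and the real routing happens on AssemblyAI's side. The helper name `resolve_model` and the use of ISO 639-1 language codes are illustrative assumptions, not part of any documented interface.

```python
# Native Universal-3 Pro languages as listed above.
# Representing them as ISO 639-1 codes is an assumption for this sketch.
U3_PRO_NATIVE = {"en", "es", "pt", "fr", "de", "it"}


def resolve_model(language_code: str, speech_models: list[str]) -> str:
    """Return the first requested model expected to handle the language.

    Hypothetical helper that mirrors the documented order:
    try Universal-3 Pro where supported, otherwise fall back to Universal-2.
    """
    for model in speech_models:
        if model == "universal-3-pro" and language_code in U3_PRO_NATIVE:
            return model
        if model == "universal-2":
            return model  # treated here as the catch-all fallback
    raise ValueError(f"no requested model covers {language_code!r}")


models = ["universal-3-pro", "universal-2"]
print(resolve_model("de", models))  # German is native to Universal-3 Pro
print(resolve_model("hi", models))  # Hindi falls through to Universal-2
```

The practical point is the failure case: if the fallback model is left out of the list, any language outside the native six has nowhere to go, which is why the two-model list matters for broad coverage.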

Prompting is not a side feature. It is the product. In async transcription, the model accepts either a free-form prompt or keyterms_prompt, but not both in the same request. Keyterms prompting allows up to 1,000 words or phrases, with a maximum of 6 words per phrase. AssemblyAI says testing showed up to a 45% accuracy gain on domain-specific terms when prompting is used effectively. The launch blog also says the model is trained on 50+ audio event tags and can be prompted for custom domain-specific tags.
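These constraints are easy to check before a request leaves your service. The sketch below is a local pre-flight check written for this article, not an AssemblyAI SDK function. The name `validate_async_prompting` is hypothetical, and counting words by splitting on whitespace is an assumption about how the phrase limit is measured.

```python
MAX_KEYTERMS = 1000        # documented cap on words or phrases
MAX_WORDS_PER_PHRASE = 6   # documented cap per phrase


def validate_async_prompting(prompt=None, keyterms=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if prompt and keyterms:
        problems.append("async accepts prompt or keyterms_prompt, not both")
    keyterms = keyterms or []
    if len(keyterms) > MAX_KEYTERMS:
        problems.append(f"{len(keyterms)} keyterms exceeds the {MAX_KEYTERMS} limit")
    for term in keyterms:
        # Whitespace splitting is an assumption about how words are counted.
        if len(term.split()) > MAX_WORDS_PER_PHRASE:
            problems.append(f"keyterm too long: {term!r}")
    return problems


print(validate_async_prompting(keyterms=["atorvastatin", "left anterior descending artery"]))
print(validate_async_prompting(prompt="Medical consultation.", keyterms=["atorvastatin"]))
```

The first call returns an empty list; the second reports the prompt-versus-keyterms conflict that applies to async requests.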

Medical adaptation ships as an add-on rather than a per-customer fine-tune. Medical Mode (domain="medical-v1") is documented as a specialized enhancement for medications, procedures, conditions, and dosages, and it can be combined with diarization, keyterms, and PII redaction. The docs say Medical Mode supports English, Spanish, German, and French; on unsupported languages it is skipped and the user is not charged.

What AssemblyAI is not disclosing

The public record is materially incomplete on the points many technical buyers care about most. I found no public Universal-3 Pro parameter count, architecture diagram, tokenizer description, training corpus size, data-source composition, pretraining algorithm, optimizer setup, or fine-tuning recipe. There is no model card and no whitepaper. The sources are explicit about capabilities and silent about internals.

The gap stands out because AssemblyAI has published exactly these details before. Universal-1 was publicly described as a 600M-parameter Conformer RNN-T model, with a full-context Conformer encoder pretrained using BEST-RQ on 12.5 million hours of unlabeled multilingual audio, then fine-tuned jointly with an RNN-T decoder using 188k hours of supervised data and 1.6M hours of pseudo-labeled data across four languages. The Universal-1 paper also discloses a two-layer LSTM predictor, WordPiece tokenization, JAX training, and TPU-based large-scale training. Nothing public shows that Universal-3 Pro reuses that architecture. Assuming continuity would be speculation, and I am not going to make that assumption for you.

The most defensible reading: Universal-3 Pro is presented as a promptable SpeechLLM layer on top of AssemblyAI's ASR infrastructure, and the company has chosen to publish behavioral evidence instead of architectural evidence. You can verify what it does and how well it performs. You cannot currently verify how many parameters it has, how it was trained, or whether it uses a transducer, encoder-decoder, or hybrid stack under the hood.

The benchmark picture

On AssemblyAI's pre-recorded benchmark docs, Universal-3 Pro reports a mean English WER of 5.6% and median 4.9%, versus Universal-2 at mean 6.1% and median 6.5%. On FLEURS multilingual benchmarks, the average reported WER is 4.58% for Universal-3 Pro against 7.42% for Universal-2, with especially large relative gains in French, Italian, and German. On missed-entity benchmarks, Universal-3 Pro improves over Universal-2 on every category shown, including medical terms, locations, job titles, organization names, email addresses, phone numbers, and credit-card numbers.
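Absolute WER gaps can understate or overstate a change, so it helps to convert the two headline comparisons into relative reductions. The arithmetic below uses only the figures quoted above; the function name `relative_reduction` is just a label for this sketch.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100


english = relative_reduction(6.1, 5.6)    # mean English WER, Universal-2 vs Universal-3 Pro
fleurs = relative_reduction(7.42, 4.58)   # FLEURS average WER, Universal-2 vs Universal-3 Pro

print(f"English mean WER: {english:.1f}% relative reduction")   # 8.2%
print(f"FLEURS average WER: {fleurs:.1f}% relative reduction")  # 38.3%
```

Read this way, the English gain is modest at roughly 8% relative, while the multilingual gain is much larger at roughly 38% relative.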

On the broader benchmark site, Universal-3 Pro posts a global multilingual WER of 8.23%, essentially tied with Speechmatics Enhanced at 8.22% and ahead of OpenAI GPT-4o Transcribe at 9.52%, OpenAI Whisper-1 at 14.39%, and Deepgram Nova-3 at 15.71% on that suite. The same page shows Universal-3 Pro with the best displayed average code-switching WER at 8.63% and the best displayed diarization cpWER (concatenated minimum-permutation WER, which scores transcription and speaker attribution together) at 33.34%.

For streaming, AssemblyAI's benchmark page shows Universal-3 Pro Streaming with the lowest displayed average WER at 5.53% and one of the best semantic WER results, and it leads the displayed missed-entity and medical missed-entity comparisons. Its median time-to-complete-transcript (TTCT) latency on the Pipecat benchmark is 335 ms, which is materially slower than Deepgram Nova-3 at 247 ms and slower than several other real-time systems on that particular latency metric.

Where it is genuinely strong

The clearest strength is entity-rich, controllable transcription. The model is built around the hard stuff: names, numbers, emails, addresses, medical terms, rare words, and mixed-language speech. The API and docs expose that directly rather than hiding it behind post-processing defaults. For voice-agent, contact-center, and clinical-documentation use cases, WER alone underestimates business risk, and this is the model designed for that reality.

Domain adaptation without custom training is the second strength. AssemblyAI's messaging is explicit that prompt engineering is meant to replace bespoke model retraining or heavy post-processing for many use cases. The claimed accuracy gain of up to 45% on domain-specific terms, the ability to capture or suppress disfluencies, and the audio tagging all point the same way.

The third is deployment breadth for production buyers. The model is available through the standard API, EU endpoints, and self-hosted streaming, and the self-hosted documentation is concrete about hardware, startup times, architecture, and isolation behavior. Most vendors expose far less operational detail publicly.

Where it is weak

Technical opacity is the biggest problem. No U3-specific model card, no whitepaper, no parameter count, no training-data disclosure. For a model marketed as a new class of speech language model, that gap will bother anyone running technically rigorous procurement, and it should, especially in regulated or safety-sensitive deployments.

The second weakness is native language scope. Universal-3 Pro itself covers six languages natively. The 99-language story depends on routing to Universal-2 outside those six. That is a sensible product strategy, but buyers should not mentally equate "Universal-3 Pro" with full native 99-language capability.

Third, the control surface is fragmented between async and streaming. Async supports prompt or keyterms but not both in one request. Streaming supports prompt plus keyterms together and mid-stream updates, but the docs say streaming prompting does not control output formatting; it is tuned for context biasing and turn detection instead. Teams cannot assume functional parity across the batch and real-time products.

Fourth, the latency numbers do not line up cleanly across sources. AssemblyAI's docs say Universal-3 Pro Streaming is "sub-300ms" for time-to-complete transcript latency. An AssemblyAI tutorial cites roughly 150 ms P50 after VAD endpoint detection. Hamming-linked AssemblyAI tutorials cite 307 ms P50. The Pipecat benchmark page shows 335 ms median TTCT and 534 ms at P95. My reading is that these measure different pipeline boundaries, but that is an inference on my part. Test against your own telemetry definitions before committing.

Hallucination, bias, and outside evidence

AssemblyAI's benchmark docs call hallucinations a "critical concern" and claim a 30% hallucination reduction versus Whisper on the pre-recorded side, but the public U3 materials do not break down hallucination behavior with the same detail they give WER or MER. The streaming docs also warn that large or low-quality keyterm lists can induce overcorrections and hallucinations, which is worth remembering before you paste your entire product glossary into keyterms_prompt.

On fairness, I found no public demographic bias audit, accent-fairness report, or subgroup model card for Universal-3 Pro. The most relevant external cautionary evidence is the 2026 Voice of India preprint, which reports severe failures for "AssemblyAI Universal" on some Indian languages, including very large WER spikes driven by failed language detection and generative hallucinations. Since the async docs say unsupported languages fall back to Universal-2 rather than Universal-3 Pro itself, I read that paper as a warning about the broader multilingual stack, not a clean indictment of native U3 Pro on its six supported languages. Anyone depending on fallback-based coverage should still take it seriously.

For third-party validation, the most credible public framework is the open-source Pipecat STT Benchmark, which AssemblyAI itself cites and which Soniox describes as the single source of truth for the numbers on its benchmark page. There is also anecdotal multi-API comparison content on Reddit, but the surfaced snippets do not provide enough methodology detail to use for decision-critical evaluation.

Deployment, pricing, and the fine print

Here is where Universal-3 Pro can actually run today.

Deployment path	Publicly documented status	Notes
Cloud API, pre-recorded	Yes	Universal-3 Pro async via standard API; native 6 languages, 99-language routing through Universal-2 fallback
Cloud API, streaming	Yes	Universal-3 Pro Streaming (u3-rt-pro) for real-time voice workflows
EU cloud region	Yes	EU endpoint available for data residency
Self-hosted / on-prem	Yes, for streaming	Self-hosted Streaming supports Universal-3 Pro Streaming, with containerized deployment inside customer infrastructure
Edge / on-device	No official on-device program found	A third-party Cloudflare AI catalog entry for assemblyai/universal-3-pro suggests partner-hosted availability, but there is no official AssemblyAI on-device or edge program

Pricing and throughput

Cloud pricing is straightforward. Universal-3 Pro async is $0.21/hour. Universal-2 is $0.15/hour. Async add-ons: Keyterms Prompting at $0.05/hour, Prompting at $0.05/hour in beta, Speaker Diarization at $0.02/hour, and Medical Mode at $0.15/hour. Universal-3 Pro Streaming is $0.45/hour, and the Voice Agent API is $4.50/hour. AssemblyAI notes that standard usage requires no commitments.
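For budgeting, the list prices above reduce to simple arithmetic. The sketch below assumes that add-on rates are charged per audio hour on top of the base model rate, which the reviewed sources imply but which you should confirm against your own invoice or contract. The function and dictionary names are illustrative.

```python
# Async list prices in USD per audio hour, as quoted above.
BASE_RATES = {"universal-3-pro": 0.21, "universal-2": 0.15}
ADDON_RATES = {
    "keyterms": 0.05,
    "prompting": 0.05,    # listed as beta
    "diarization": 0.02,
    "medical": 0.15,
}


def estimate_cost(model: str, audio_hours: float, addons=()) -> float:
    """Estimate async cost, assuming add-ons stack additively on the base rate."""
    rate = BASE_RATES[model] + sum(ADDON_RATES[name] for name in addons)
    return round(rate * audio_hours, 2)


# 1,000 hours of Universal-3 Pro with keyterms and diarization.
print(estimate_cost("universal-3-pro", 1000, ["keyterms", "diarization"]))  # 280.0
```

Under that additive assumption, add-ons can move the effective rate substantially: Medical Mode alone adds $0.15/hour to a $0.21/hour base.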

The public throughput data is operational rather than model-theoretic. For pre-recorded jobs, free accounts get 5 parallel transcriptions and paid accounts start at 200+ parallel transcriptions, with custom higher limits available. For streaming, paid accounts start at 100+ new sessions per minute, with unlimited open-session scaling and an automatic 10% scale-up when usage hits 70% of the current limit. Each self-hosted instance supports up to 48 concurrent streams without runtime degradation, per AssemblyAI.
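The streaming scale-up rule can be modeled in a few lines for capacity planning. This is a sketch of the rule as summarized above, not AssemblyAI's implementation: the sources reviewed here do not say how the increase is rounded, how often it is evaluated, or exactly which usage figure triggers it, so those choices are assumptions.

```python
def next_limit(current_limit: int, usage: int) -> int:
    """Apply the 70% / 10% rule once and return the resulting limit."""
    if usage * 10 >= current_limit * 7:             # usage has reached 70% of the limit
        return current_limit + current_limit // 10  # 10% increase, rounded down (assumption)
    return current_limit


limit = 100
for usage in (50, 70, 80):
    limit = next_limit(limit, usage)
    print(usage, limit)
```

Starting from a limit of 100, usage of 50 leaves the limit unchanged, usage of 70 raises it to 110, and usage of 80 raises it again to 121.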

One self-hosted caveat: the public docs describe self-hosted streaming only, not a self-hosted async Universal-3 Pro product. Teams needing full on-prem batch ASR should confirm roadmap and contract scope directly with AssemblyAI rather than inferring feature parity from the streaming documentation.

Security, privacy, and compliance

AssemblyAI's compliance posture is mature by startup standards. The docs say it offers a HIPAA BAA, automatically incorporates a DPA into customer terms, supports EU data residency, and holds SOC 2 Type 2 and ISO 27001 certifications; more recent product and security materials also cite PCI DSS v4.0. The security overview says encryption applies in transit and at rest, and the data-retention page specifies AES-128/AES-256 at rest with TLS 1.2+ in transit.

The model-training caveat is the part I would put in bold in any internal memo, if I used bold in memos. AssemblyAI says only certain submitted files, as permitted by contract, are used for model training after a PII-redaction process. Files will not be used for training if the customer is under a BAA, uses EU servers, or has opted out. For streaming, customers who opt out of model training get zero data retention of audio and transcripts in the streaming production environment, apart from limited metadata for logging and billing.
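The three exemptions are simple enough to encode as a check in an internal configuration review. The sketch below only restates the rule as described above; the class and field names are hypothetical and do not correspond to any AssemblyAI account setting or API field.

```python
from dataclasses import dataclass


@dataclass
class AccountConfig:
    """Hypothetical summary of the contract and configuration facts that matter."""
    has_baa: bool = False
    uses_eu_servers: bool = False
    opted_out_of_training: bool = False


def may_be_used_for_training(cfg: AccountConfig) -> bool:
    """True if none of the three stated exemptions applies.

    True means files *may* be eligible, subject to contract terms,
    not that they will be used.
    """
    return not (cfg.has_baa or cfg.uses_eu_servers or cfg.opted_out_of_training)


print(may_be_used_for_training(AccountConfig()))                      # True
print(may_be_used_for_training(AccountConfig(uses_eu_servers=True)))  # False
```

The default case is the one to notice: an account with no BAA, no EU endpoint, and no opt-out is the only combination where training use remains possible.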

Deletion and retention controls exist, and they reward careful reading. AssemblyAI provides an API to delete transcripts and says users can remove transcript data and mark it as deleted. Async retention is configurable through TTL and deletion requests, while streaming retention can be zero-retention under the right configuration.

The regulatory takeaways are practical. For HIPAA, sign a BAA before running real PHI workloads. For GDPR, AssemblyAI points to EU endpoints, DPA coverage, and the privacy rights in its policy, and the privacy policy references EU-U.S., UK-U.S., and Swiss-U.S. Data Privacy Framework commitments. For highly regulated use, self-hosted streaming materially reduces exposure: audio, transcripts, and PII stay inside the customer environment, with only license validation and usage metadata leaving the network.

The ethical risks are not unique to AssemblyAI, but Universal-3 Pro's capabilities make them operationally immediate: voice surveillance, speaker attribution in sensitive conversations, automated processing of medical or financial identifiers, and prompt-driven capture of disfluencies or audio events that users may not realize are being stored. A model that preserves more of the texture of a conversation raises product value and privacy sensitivity at the same time. AssemblyAI's own training and retention disclosures are a good argument for designing governance, consent, and least-retention configuration into the rollout from day one.

Why AssemblyAI built it, and who is behind it

AssemblyAI's own explanation is direct: customers were building voice-heavy products in healthcare, customer calls, notetaking, and especially voice agents, and they needed control over transcription at the moment audio is processed, not only after the transcript is already wrong. The launch blog repeatedly frames Universal-3 Pro as a response to "signal locked in every voice conversation" and as a way to replace tool fragmentation and post-processing with "one model, one integration."

The launch also fits the company's longer arc. AssemblyAI's About page says its vision is to build "superhuman" Voice AI models, and its December 2023 Series C announcement said the company was raising capital specifically to build those models. Through 2025 AssemblyAI publicly emphasized enterprise security, EU data residency, and global deployment readiness, which makes Universal-3 Pro look like a model launch and a platform maturation step at once.

On people, the public record is stronger on the company than on a U3-specific research team. AssemblyAI is a research-oriented organization led by founder and CEO Dylan Fox. The launch and enablement materials for Universal-3 Pro and its streaming and usage guides were authored by Madison Bernstein, Ryan Seams, Martin Schweiger, and Kelsey Foster. The closest published technical research team for the modern Universal family is the Universal-1 paper by Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Efty, Daniel McCrystal, Sam Flamini, Domenic Donato, and Takuya Yoshioka. I cannot verify from public sources which of those researchers directly led Universal-3 Pro.

The highest-confidence timeline runs like this: the December 2023 Series C funded the model-building push, Universal-3 Pro launched on February 3, 2026, and the launch post promised more languages, better instruction following, and real-time support "over the coming weeks." That promise was partly realized through the March 2026 streaming release and the later medical and self-hosted expansion. These dates come from AssemblyAI's official blog, research, pricing, changelog, and security materials.

How it compares

A note on the OpenAI rows: the public sources I reviewed benchmarked Whisper-1 and GPT-4o Transcribe, so those are the models in the tables. I did not rely on an unofficial "ChatSpeech" label because it did not appear in the benchmark sources I reviewed.

The AssemblyAI lineage

Model	Public technical disclosure	Language story	Prompting/control	Pricing	Notable public benchmark/result
Universal-1	Strong: 600M Conformer RNN-T, BEST-RQ, 12.5M unlabeled + 188k supervised + 1.6M pseudo-labeled hours disclosed	4 core languages in research release	No promptable control exposed like U3	Historical, not a current list price in reviewed sources	30% lower hallucination on speech vs Whisper large-v3; 5x faster than optimized Whisper baseline
Universal-2	Limited public internals in reviewed sources; positioned as building on Universal-1	99 languages	No natural-language prompting	$0.15/hr	Baseline/fallback model; weaker than U3 on WER and MER in current docs
Universal-3 Pro	No public model card, whitepaper, or parameter count found	6 native languages, 99 via U2 fallback	Async prompt or keyterms; audio tags; disfluency control; code switching	$0.21/hr	English mean WER 5.6%; FLEURS avg 4.58%; broad MER gains over U2
Universal-3 Pro Streaming	Public product/ops docs, but not full internals	6 out of the box; multilingual mode with more coming	Streaming prompt + keyterms together; mid-stream updates; context carryover	$0.45/hr	Streaming WER 5.53%; medical MER 10.46%; median TTCT 335 ms in Pipecat benchmark

Competitor snapshot

The table below mixes pre-recorded multilingual and streaming metrics from the reviewed benchmark sources. Read columns within a metric, not across them, because they come from different evaluation suites. AssemblyAI's benchmark site says all providers were tested through production APIs with identical audio and default settings.

Provider / model	Pre-recorded multilingual WER	Streaming medical missed entity rate	Streaming median TTCT	Main read
AssemblyAI Universal-3 Pro / U3 Pro Streaming	8.23%	10.46%	335 ms	Strongest overall balance in the reviewed sources, especially on entities and code-switching; latency is good but not fastest on Pipecat
Deepgram Nova-3	15.71%	14.33%	247 ms	Faster on TTCT in Pipecat; weaker on multilingual WER and medical MER in reviewed sources
OpenAI GPT-4o Transcribe	9.52%	25.72%	637 ms	Competitive multilingual batch result; materially weaker streaming entity and TTCT numbers in reviewed suite
OpenAI Whisper-1	14.39%	not shown	not shown	Worth considering if self-hosting matters, since the underlying Whisper weights are open; materially behind U3 Pro on reviewed multilingual WER
Speechmatics Enhanced	8.22%	not shown	495 ms	Essentially tied with U3 Pro on reviewed multilingual WER; slower TTCT in reviewed streaming comparison
Microsoft Azure STT	not shown	15.65%	1016 ms	Better than GPT-4o on medical MER in reviewed streaming set, but much slower TTCT
Google STT	not shown	not shown	878 ms	Considerably slower TTCT than U3 Pro Streaming in reviewed Pipecat comparison

The comparison boils down to one sentence: Universal-3 Pro looks strongest when the task emphasizes entity correctness, code-switching, diarization, or domain guidance, and less dominant when the primary objective is absolute minimum streaming turn-finalization latency. That is exactly what its product design suggests it should be.

How to adopt it without getting burned

Treat Universal-3 Pro as an accuracy-first, controllable STT layer, not a complete speech-to-intent system. Use it when downstream workflow quality depends on names, identifiers, medical terms, speaker structure, or mixed-language speech. If global language breadth comes first for you, explicitly test the fallback path to Universal-2 on the languages you care about. If you are building streaming voice agents, instrument at least three latency definitions (first partial, turn completion, and total response time), because marketing and benchmark sources measure different boundaries.
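One way to keep those three latency definitions separate is to record raw timestamps per turn and derive each metric from explicit boundaries. The sketch below is independent of any vendor SDK. The field names and the choice of start and end points for each metric are assumptions for illustration, and you should align them with whatever your own telemetry already calls these events.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Event times for one user turn, in seconds from a shared monotonic clock."""
    speech_start: float      # user begins speaking
    first_partial: float     # first partial transcript received
    speech_end: float        # user stops speaking
    final_transcript: float  # finalized transcript for the turn received
    response_start: float    # agent begins responding


def turn_latencies_ms(t: TurnTimestamps) -> dict:
    """Derive the three latency definitions, each with an explicit boundary."""
    return {
        "first_partial": (t.first_partial - t.speech_start) * 1000,
        "turn_completion": (t.final_transcript - t.speech_end) * 1000,
        "total_response": (t.response_start - t.speech_end) * 1000,
    }


def summarize(turns) -> dict:
    """Median of each latency across turns."""
    rows = [turn_latencies_ms(t) for t in turns]
    return {key: statistics.median(row[key] for row in rows) for key in rows[0]}


turns = [
    TurnTimestamps(0.0, 0.25, 2.0, 2.5, 3.0),
    TurnTimestamps(0.0, 0.5, 1.0, 1.25, 2.0),
    TurnTimestamps(0.0, 0.25, 3.0, 3.25, 4.5),
]
print(summarize(turns))
```

Keeping the raw timestamps lets you recompute a metric later if a vendor's published number turns out to use a different boundary than the one you assumed.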

The implementation best practices from the docs and launch materials are simple. Start async transcription without a prompt, then add prompting only where error patterns justify it. Use keyterms_prompt when you know the vocabulary, and a natural-language prompt when you need behavioral guidance. For streaming, update prompt and keyterms mid-session as the dialog state changes. Do not stuff large lists of common words into keyterms; the docs warn this can cause overcorrections and hallucinations. For healthcare, combine Medical Mode with diarization and redaction if PHI is involved, and do not deploy without a BAA.

A minimal async Python example, adapted from AssemblyAI's docs:

import assemblyai as aai

aai.settings.api_key = "<YOUR_API_KEY>"

audio_file = "https://assembly.ai/sports_injuries.mp3"

config = aai.TranscriptionConfig(
    speech_models=["universal-3-pro", "universal-2"],  # fallback for unsupported languages
    language_detection=True,
    prompt=(
        "Transcribe this as a medical consultation. "
        "Prioritize medication names, dosages, and speaker-role clarity."
    ),
)

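transcript = aai.Transcriber().transcribe(audio_file, config)

# Surface API-side failures instead of printing an empty transcript.
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(f"Transcription failed: {transcript.error}")

print(transcript.text)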

This reflects the official async docs, including the recommended speech_models fallback strategy for 99-language coverage. In async mode, use either prompt or keyterms_prompt, never both in the same request.

A minimal streaming JavaScript example, adapted from the streaming docs:

const CONNECTION_PARAMS = {
  sample_rate: 16000,
  speech_model: "u3-rt-pro",
  mode: "balanced",
  prompt: "Customer support call about account verification and billing questions.",
  keyterms_prompt: JSON.stringify(["AssemblyAI", "invoice number", "billing address"])
};

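// `websocket` is assumed to be the already-open streaming connection
// that was created with CONNECTION_PARAMS.
// Later, when the conversation moves to payment collection: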
websocket.send(
  JSON.stringify({
    type: "UpdateConfiguration",
    prompt: "Now collecting payment details.",
    keyterms_prompt: ["credit card number", "expiration date", "postal code"]
  })
);

This matches the official streaming guidance, which supports prompt plus keyterms together and UpdateConfiguration for mid-stream adaptation. Mid-stream reconfiguration is one of Universal-3 Pro Streaming's most distinctive operational advantages, and I have not seen it exposed this cleanly elsewhere.

What is still unanswered

Several questions stay open in the public record. There is no public Universal-3 Pro architecture note, model card, parameter count, training corpus disclosure, or U3-specific hallucination or bias audit in the sources I reviewed. There is no public end-to-end speech-to-intent benchmark, no public self-hosted async Universal-3 Pro documentation, and no official on-device or edge deployment program from AssemblyAI itself. For a highly regulated or highly specialized rollout, treat those as diligence items to resolve before signing, not as minor documentation gaps.

Sources

Introducing Universal-3 Pro: A new class of speech language model optimized for Voice AI. https://www.assemblyai.com/blog/introducing-universal-3-pro
AssemblyAI pre-recorded audio benchmarks. https://www.assemblyai.com/docs/pre-recorded-audio/benchmarks
AssemblyAI FAQ: Can you sign a BAA. https://www.assemblyai.com/docs/faq/can-you-sign-a-baa
Universal-3 Pro async docs. https://www.assemblyai.com/docs/pre-recorded-audio/universal-3-pro
AssemblyAI pricing. https://www.assemblyai.com/pricing
Self-hosted streaming docs. https://www.assemblyai.com/docs/streaming/self-hosted-streaming
Universal-1 research. https://www.assemblyai.com/research/universal-1
Pipecat STT Benchmark. https://github.com/pipecat-ai/stt-benchmark
Voice of India preprint. https://arxiv.org/pdf/2604.19151
Medical Mode docs. https://www.assemblyai.com/docs/pre-recorded-audio/medical-mode
AssemblyAI benchmark site. https://www.assemblyai.com/benchmarks
Expanding enterprise security and data residency capabilities. https://www.assemblyai.com/blog/expanding-enterprise-security-and-data-residency-capabilities
Universal-3 Pro streaming docs. https://assemblyai.com/docs/streaming/universal-3-pro
Cloudflare AI catalog entry for assemblyai/universal-3-pro. https://developers.cloudflare.com/ai/models/assemblyai/universal-3-pro/
Rate limits docs. https://www.assemblyai.com/docs/pre-recorded-audio/rate-limits
FAQ: Are files submitted to the API used for model training. https://www.assemblyai.com/docs/faq/are-files-submitted-to-the-api-used-for-model-training
Delete transcripts docs. https://www.assemblyai.com/docs/pre-recorded-audio/delete-transcripts
AssemblyAI About page. https://www.assemblyai.com/about
Series C announcement. https://www.assemblyai.com/blog/announcing-our-50m-series-c-to-build-superhuman-speech-ai-models
Universal-3 Pro product page. https://www.assemblyai.com/universal-3-pro
Optimizing accuracy and latency in streaming. https://www.assemblyai.com/docs/streaming/getting-started/optimizing-accuracy-and-latency