Microsoft Azure Whisper: model profile

Microsoft Azure Whisper is a set of Azure delivery paths for OpenAI's Whisper speech-to-text model, comprising a managed Azure OpenAI endpoint for file-based transcription and speech-to-English translation and Azure Speech batch transcription support for Whisper.

Specifications

Developer	Model created by OpenAI; Azure offering productized and hosted by Microsoft
Released	Whisper released by OpenAI September 21, 2022; Azure public preview September 15, 2023; general availability in Azure OpenAI and Azure Speech March 13, 2024
Model type	General-purpose speech recognition model for transcription and speech-to-English translation
Parameters	Not publicly disclosed for the managed Azure OpenAI whisper SKU. The separate catalog model openai-whisper-large-v3 is listed as 1.55B parameters
Training data	680,000 hours of multilingual and multitask supervised audio data from the web
Languages	Transcription in 57 languages; translation from those languages into English
Modes (batch / streaming)	File-based and batch. The core whisper lane is not for streaming; Microsoft directs low-latency streaming to GPT Realtime Whisper or Azure Speech real-time and fast transcription
Latency	Not publicly disclosed. Microsoft publishes no Whisper-specific latency SLA or p50/p95 latency table
Throughput / concurrency	Azure Speech Whisper batch transcription: up to 1,000 files per request. Not publicly disclosed for the Azure OpenAI endpoint
Deployment	Managed Azure OpenAI endpoint; Azure Speech batch transcription API; openai-whisper-large-v3 on managed compute in the Azure model catalog
Pricing	Consumption-based, billed by audio hours in one-second increments, region-sensitive. Microsoft moderators cited $0.36 per audio hour in the US region in late 2023

Not disclosedLicense

Known limitations

Hallucination risk. The Associated Press reported in 2024 that researchers and practitioners had seen Whisper invent words, passages, or medically consequential details that were never spoken. Because Azure's offering is built on the Whisper family, this applies to Azure deployments, particularly in healthcare, legal, and compliance workflows. The PennSound evaluation also flagged hallucination behavior as a practical deployment concern.
Domain sensitivity. Independent studies confirm Whisper as a strong multilingual baseline that often outperforms older cloud ASR systems on difficult data, but performance is not uniform across all tasks.
Confidence calibration. A comparative study found that Microsoft and Whisper tended to report lower confidence than measured accuracy, which affects downstream systems using confidence thresholds for routing or human review.
Microsoft does not disclose the exact underlying checkpoint or parameter count behind the managed Azure OpenAI whisper SKU. Public documentation exposes only the managed model ID whisper and version 001.
Microsoft does not publish a Whisper-specific public p50/p95 latency table or an Azure-hosted WER benchmark sheet. Accuracy and speed claims beyond product positioning depend on workload-specific testing.
Current exact Whisper pricing is region-sensitive and not consistently visible in the retrieved official HTML. The clearest public numeric reference is a Microsoft moderator citing $0.36 per audio hour in the US region in late 2023, which is not a guaranteed current price.
No customer-visible model-size selection (tiny/base/small/medium/large) on the managed Azure OpenAI whisper endpoint.
The retrieved sources contain no first-party documentation packaging the Whisper batch route as an on-premises or containerized Whisper product.

Full technical breakdown9 sections

Overview

Whisper on Azure is not a single product. It is a set of Azure delivery paths around OpenAI's Whisper speech model. The two primary Azure surfaces are a managed Azure OpenAI model endpoint named whisper for file-based speech-to-text and speech-to-English translation, and Azure Speech batch transcription support for Whisper, which adds enterprise transcription features such as larger files, asynchronous batch operation, speaker diarization, and word-level timestamps. Microsoft's newer audio roadmap layers successors around that base offer, notably GPT-4o transcription models and GPT Realtime Whisper for low-latency streaming transcription.

The core Azure OpenAI Whisper endpoint exposes one managed Azure model ID, whisper, through /audio/transcriptions and /audio/translations, with a documented maximum request size of 25 MB. Microsoft does not publicly expose model-size choices such as tiny, base, small, medium, or large on that managed endpoint. This differs from the open-source Whisper ecosystem, where OpenAI's repository exposes six model sizes, and from Azure's broader model catalog, which separately lists openai-whisper-large-v3 as a distinct managed-compute catalog model.

The underlying model was created by OpenAI. It was trained on 680,000 hours of multilingual and multitask supervised audio data from the web. Microsoft's role has been productization, hosting, governance, and service integration.

Capabilities and features

Azure exposes Whisper in three ways:

Surface	What Azure exposes	Model-size choice exposed to customer	Technical specialization	Key documented limits and caveats
Azure OpenAI Whisper	Model ID whisper, version 001, via /audio/transcriptions and /audio/translations	No public size choice on the managed endpoint	File-based STT and speech translation into English; managed Azure OpenAI experience	25 MB max audio request; prerecorded/file-based rather than streaming; Microsoft publicly documents one managed SKU rather than tiny/base/small/medium/large choices
Azure Speech with Whisper	Whisper selectable through Azure Speech batch transcription	No public size choice in the batch API docs	Large-volume prerecorded transcription with async processing, diarization, and timestamps	Up to 1 GB per file, up to 1,000 files per request; intended for batch, not live streaming
Azure model catalog openai-whisper-large-v3	Separate catalog model on managed compute	Yes, implicitly, because the catalog entry is specifically large-v3	For teams that want a specific open Whisper checkpoint on Azure infrastructure rather than the Azure OpenAI managed API SKU	Distinct from Azure OpenAI whisper; listed as 1.55B parameters and generally available in catalog

The Azure OpenAI REST reference defines two core operations: /audio/transcriptions, which transcribes into the input language, and /audio/translations, which transcribes and translates into English text. The REST API documents a language hint to improve accuracy and latency. The quickstart and samples show REST/cURL, Python, JavaScript/TypeScript via the OpenAI client library with AzureOpenAI, and PowerShell examples. The quickstart lists common input formats including mp3, mp4, mpeg, mpga, m4a, wav, and webm.

Azure Speech's Whisper support is built around batch transcription. The preview and GA announcements state that Azure Speech adds asynchronous processing, speaker diarization, word-level timestamps, and larger file and batch capacity: files up to 1 GB and up to 1,000 files in a single batch request.

Microsoft's broader speech stack also includes fast transcription, which "returns results synchronously and faster than real-time," and LLM Speech, which supports multilingual transcription and translation, diarization, and prompt-tuning capabilities at up to 500 MB and under 5 hours per file.

Documented input-size limits across the Azure transcription paths: 25 MB for Azure OpenAI whisper, under 500 MB for Azure Speech LLM Speech, and up to 1 GB for Azure Speech Whisper batch transcription.

Microsoft's recommended use cases for Whisper specifically are small prerecorded audio files, translation of non-English speech into English, call-center post-call analysis, accessibility captions, and mining large audio and video corpora for searchable text and downstream insight extraction.

Language support

Microsoft's launch and GA materials describe Whisper on Azure as supporting transcription in 57 languages and translation from those languages into English. OpenAI described Whisper as handling multilingual transcription and translation into English, and positioned it as "robust" to accents, background noise, and technical language.

Performance and benchmarks

Microsoft does not publish a public Azure-hosted WER chart for the managed whisper endpoint.

Vendor-reported: at GA, Microsoft said "thousands of customers" had already used the Whisper API in Azure across healthcare, education, finance, manufacturing, media, and agriculture. Microsoft's featured customer at GA was Lightbulb.AI, which used Whisper in Azure OpenAI Service for call-center workflows and stated its product was "500X more scalable, 90X faster, and 20X more cost-effective than manual call reviews." Those are customer-reported figures rather than third-party audited benchmarks.

Third-party evaluation: in the PennSound evaluation, researchers tested AWS, Azure, Google, IBM, Rev.ai, Whisper, and Whisper.cpp on nearly 10 hours of varied archival audio. Rev.ai was the top performer overall and Whisper was the top open-source performer "as long as hallucinations were avoided," with relatively slim WER and diarization gaps across systems.

Third-party evaluation: a 2024 child-speech study compared Microsoft Azure Speech, Google Cloud Speech-to-Text, and multiple Whisper sizes. On that dataset, the best Whisper model achieved a WER of 21.3%, versus 30.3% for Azure and 49.0% for Google. The same study found that local Whisper variants could beat cloud systems on responsiveness in some setups, a result that partly reflects hardware choice and network overhead rather than model quality alone.

Third-party evaluation: a large comparative study on ASR solution accuracy found that Microsoft and Whisper tended to report lower confidence than measured accuracy, whereas some rivals appeared overconfident. The same study observed that higher accuracy did not necessarily correlate with shorter processing time.

Latency and throughput

Microsoft does not publish a Whisper-specific latency SLA or p50/p95 latency table. Microsoft's product positioning is: Azure OpenAI Whisper is recommended for smaller files and time-sensitive work; Azure Speech fast transcription "returns results synchronously and faster than real-time" for prerecorded audio; GPT Realtime Whisper is intended for low-latency streaming captions and monitoring.

Documented capacity figures: 25 MB maximum request size for the Azure OpenAI whisper endpoint; up to 1 GB per file and up to 1,000 files per batch request for Azure Speech Whisper batch transcription; up to 500 MB and under 5 hours per file for LLM Speech.

Deployment and integrations

Microsoft's architecture guidance separates the two main services by role. Azure OpenAI audio models are for scenarios that combine speech with language reasoning or flexible prompt-based control, or where the team is already in the Azure OpenAI stack. Azure Speech is for high-volume real-time or batch transcription, diarization, custom speech models, custom vocabulary, and deployments that may require container or sovereign-cloud options.

Azure Speech broadly supports containers and on-premises deployment options, but the retrieved first-party documentation does not package the Whisper batch route itself as an on-premises or containerized Whisper product.

Data handling for Azure OpenAI: Microsoft states that customer data, prompts, and completions are not available to other customers, not available to OpenAI or other model providers, not used to improve Microsoft or third-party products or services without explicit permission, and not used to train foundation models without customer permission or instruction. Prompts and responses are processed within the customer-specified geography unless the customer uses Global or DataZone deployment types. Stored data is encrypted at rest with AES-256 by default, with customer-managed keys available for supported scenarios.

Microsoft's Transparency Note states that Azure OpenAI wraps OpenAI models inside Microsoft-managed guardrails and abuse-detection systems. Microsoft announced a confidential inferencing preview for Azure OpenAI Whisper in September 2024, targeting end-to-end privacy in regulated industries.

Compliance: Azure inherits the broader Microsoft Azure compliance program rather than Whisper having a separately branded compliance regime. Microsoft says Azure has more than 100 compliance offerings and points to third-party reports and certifications such as ISO 27001, SOC 2, FedRAMP, and HITRUST across the Azure platform. Microsoft states that Azure supports customers subject to HIPAA and that Azure and Azure Government align with the NIST Cybersecurity Framework and are certified under ISO/IEC 27001. Teams with regulated deployments need to validate service-specific scope in the Service Trust Portal and their contractual setup.

Comparison across Azure offerings and external services, as stated in the source:

Offering	Core lane	Languages stated publicly	Streaming / real-time	Diarization	Custom adaptation or tuning	On-prem / container path	Best fit
Azure OpenAI Whisper	Managed file-based transcription and speech-to-English translation	57 at launch/GA messaging	Not for the core whisper lane; Microsoft now points low-latency streaming to GPT Realtime Whisper instead	Not documented for core whisper endpoint	No public customer-visible tuning flow documented for the core managed Azure OpenAI whisper SKU	No first-party Whisper container documented in retrieved sources	Small prerecorded multilingual files inside Azure OpenAI workflows
Azure Speech with Whisper	Batch transcription using Whisper	Whisper multilingual plus translation to English	Batch only	Yes	For domain tuning, Microsoft generally steers customers to Azure Speech custom speech features rather than the basic Whisper batch lane	Azure Speech as a service supports containers more broadly, but Whisper-specific container packaging is not documented in the retrieved sources	Large prerecorded files, batch jobs, call recordings, post-call analytics
Azure Speech native STT / fast / LLM Speech	Speech-specialized platform	Broad locale matrix	Yes	Yes	Yes	Yes	Deterministic enterprise STT, custom vocabulary/acoustics, regulated deployment constraints
Google Cloud Speech-to-Text / Chirp	Managed cloud STT	85+ languages and variants	Yes	Yes	Yes, model adaptation	Not documented in retrieved sources	Multilingual cloud STT with strong adaptation support
Amazon Transcribe	Managed cloud STT	Broad supported-language matrix	Yes	Yes	Yes, custom vocabulary and custom language models	Not documented in retrieved sources	AWS-native batch and streaming transcription, especially call analytics workflows

Pricing

Pricing is consumption-based. The official Azure Speech pricing page says speech-to-text hours are measured by hours of audio sent and billed in one-second increments, with separate SKUs for standard transcription, fast transcription, batch transcription, and LLM Speech.

Microsoft moderators publicly cited Azure OpenAI Whisper pricing at $0.36 per audio hour in the US region in late 2023. The primary Azure pricing pages are dynamically rendered and the retrieved HTML does not expose a stable current Whisper line item; the source states this figure should not be treated as a guaranteed current price and that pricing is region-sensitive.

Development and ownership

The underlying model was created by OpenAI, not Microsoft. OpenAI's Whisper paper is authored by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. The model was trained on 680,000 hours of multilingual and multitask supervised audio data from the web.

Microsoft's role was productization, hosting, governance, and service integration. Public Azure messaging places Whisper under the Azure AI / Azure OpenAI / Azure Speech umbrella. The public preview announcement was published by Microsoft's Heiko Rausch in the Azure AI services blog, and the GA announcement was published by Marco Casalaina, Vice President of Products for Azure AI. Microsoft does not publish a named Azure Whisper engineering team roster in the retrieved sources.

The Azure OpenAI managed whisper endpoint is not the same as the open-source openai/whisper repository, which exposes model sizes tiny, base, small, medium, large, and turbo, with explicit parameter counts and hardware tradeoffs. Azure also separately catalogs openai-whisper-large-v3 on managed compute.

Microsoft's stated motivation was enterprise-facing: its GA announcement says enterprises struggle to analyze voice interactions across many languages while preserving security and privacy guardrails, and it frames Whisper on Azure as giving customers a choice between Azure OpenAI and Azure Speech for call-center analytics, captions and accessibility, and mining audio and video for insights.

The source states there is no meaningful public evidence that Meta played a role in Azure's Whisper offer; Meta models such as MMS belong to a separate multilingual speech line.

Release history

Date	Milestone	What changed	Why Microsoft updated it
September 21, 2022	OpenAI releases Whisper	Whisper is launched as an ASR system trained on 680,000 hours of multilingual/multitask audio data	Established the upstream model that Azure later commercialized
September 15, 2023	Azure public preview	Microsoft announces Whisper preview in both Azure OpenAI Service and Azure AI Speech	To bring OpenAI Whisper into Azure with enterprise controls and two usage paths: simple API and batch speech pipeline
September 2023	Azure Speech REST API v3.2 preview	Azure Speech v3.2 preview adds the API path that supports Whisper for batch transcription	To enable Whisper inside the existing Speech batch ecosystem
March 13, 2024	General availability	Whisper becomes GA in Azure OpenAI and Azure Speech	To move Whisper into production workloads under Azure's enterprise-readiness positioning
June 2024	Azure Speech REST API v3.2 GA	Speech batch API version supporting Whisper goes GA	To stabilize the Azure Speech integration path for production batch transcription
September 24, 2024	Confidential Inferencing preview	Microsoft announces confidential inferencing preview for Azure OpenAI Whisper	To support verifiable end-to-end privacy for regulated workloads
April 2025	GPT-4o transcription models released on Azure	gpt-4o-transcribe and gpt-4o-mini-transcribe arrive in Azure OpenAI	To improve accuracy, flexibility, and voice-first application support beyond classic Whisper
May 2026	GPT Realtime Whisper documentation appears	Azure publishes GPT Realtime Whisper concept docs	To cover low-latency streaming transcription for captions, monitoring, and archival workflows, a gap in the original file-based Whisper API

The most recent Whisper-family update in the source is the May 2026 publication of GPT Realtime Whisper documentation, described by Microsoft as covering low-latency, stream-based transcription for live captions and monitoring.

Sources

Foundry Models sold by Azure - Microsoft Foundry | Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry/foundry-models/concepts/models-sold-directly-by-azure
Announcing the Preview of OpenAI Whisper in Azure OpenAI service and Azure AI Speech | Microsoft Community Hub. https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/announcing-the-preview-of-openai-whisper-in-azure-openai-service/ba-p/3928388
Introducing Whisper. https://openai.com/index/whisper/
Accelerate your productivity with the Whisper model in Azure AI now generally available | Microsoft Azure Blog. https://azure.microsoft.com/en-us/blog/accelerate-your-productivity-with-the-whisper-model-in-azure-ai-now-generally-available/
AI Model Catalog | Microsoft Foundry Models. https://ai.azure.com/catalog/models/whisper
AI Model Catalog | Microsoft Foundry Models. https://ai.azure.com/catalog/models/openai-whisper-large-v3
Choose an Azure Speech Recognition and Generation Technology - Azure Architecture Center | Microsoft Learn. https://learn.microsoft.com/en-us/azure/architecture/data-guide/ai-services/speech-recognition-generation
Pricing - Azure Speech in Foundry Tools | Microsoft Azure. https://azure.microsoft.com/ja-jp/pricing/details/speech/
What's new - Speech service - Foundry Tools | Microsoft Learn. https://learn.microsoft.com/en-us/azure/ai-services/speech-service/releasenotes
Microsoft Trustworthy AI: Unlocking human potential starts with trust. https://blogs.microsoft.com/blog/2024/09/24/microsoft-trustworthy-ai-unlocking-human-potential-starts-with-trust/
What's new in Azure OpenAI in Microsoft Foundry Models? (classic) - Microsoft Foundry (classic) portal | Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry-classic/openai/whats-new
Robust Speech Recognition via Large-Scale Weak Supervision. https://arxiv.org/abs/2212.04356
GitHub - openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision. https://github.com/openai/whisper
blog/open-asr-leaderboard.md at main - huggingface/blog - GitHub. https://github.com/huggingface/blog/blob/main/open-asr-leaderboard.md
Evaluating Speech-to-Text Systems with PennSound. https://arxiv.org/html/2504.05702v1
Child Speech Recognition in Human-Robot Interaction: Problem Solved? https://arxiv.org/html/2404.17394v2
Measuring the Accuracy of Automatic Speech Recognition Solutions. https://arxiv.org/html/2408.16287v1
Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said. https://apnews.com/article/90020cdf5fa16c79ca2e5b6c4c9bbb14
Data, privacy, and security for Foundry Models sold by Azure in Microsoft Foundry - Microsoft Foundry | Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry/responsible-ai/openai/data-privacy
Transparency Note for Azure OpenAI in Microsoft Foundry Models - Microsoft Foundry | Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry/responsible-ai/openai/transparency-note
Compliance in the trusted cloud | Microsoft Azure. https://azure.microsoft.com/en-us/explore/trusted-cloud/compliance
Speech-to-Text: AI voice typing and transcription. https://cloud.google.com/speech-to-text
What is Amazon Transcribe? - Amazon. https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html
Speech to text with Whisper - Microsoft Foundry | Microsoft Learn. https://learn.microsoft.com/en-us/azure/foundry/openai/whisper-quickstart
What is the price for whisper model and what is the quota for normal customer and commitment customer - Microsoft Q&A. https://learn.microsoft.com/en-us/answers/questions/1466656/what-is-the-price-for-whisper-model-and-what-is-th