Chirp 3: model profile
Reference profile of Google Cloud Chirp 3, a managed speech model family covering multilingual transcription, HD text-to-speech, and instant custom voice.
Chirp 3 is Google Cloud's speech model family, delivered as managed cloud services for speech-to-text (Chirp 3: Transcription), high-fidelity text-to-speech (Chirp 3: HD voices), and voice cloning (Chirp 3: Instant Custom Voice).
Specifications
| Attribute | Details |
|---|---|
| Developer | Google (Google Cloud) |
| Released | Phased 2025 rollout. TTS (HD voices) GA April 2, 2025; Instant Custom Voice GA (allowlist) April 9, 2025; STT private preview April 11, 2025, public preview August 29, 2025, GA October 13, 2025. |
| Model type | Managed cloud speech model family: multilingual automatic speech recognition (model ID chirp_3), high-fidelity TTS, and instant custom voice. |
| Parameters | Not publicly disclosed. The 2023 predecessor Chirp was described by Google as a 2B-parameter speech model; no equivalent disclosure exists for Chirp 3. |
| Languages | STT: 29 GA transcription locales plus additional preview locales (85+ languages/locales announced at public preview), and 14 diarization locales. TTS: 30 named voices across 53 supported languages/locales. |
| Modes (batch / streaming) | STT: StreamingRecognize, Recognize, and BatchRecognize. TTS: streaming and batch output; text streaming is exclusive to Chirp 3 HD voices within Cloud TTS. |
| Throughput / concurrency | Not publicly disclosed. Instant Custom Voice permits 10 new voice cloning keys per minute per project, with no stated absolute limit on total keys. |
| Deployment | Google Cloud API service: Cloud Text-to-Speech API, Speech-to-Text V2 API, Vertex AI Studio, and Google's console and notebook ecosystem. No boxed or downloadable product. |
| Pricing | STT standard recognition $0.016/min up to 500k minutes, lower volume-tier rates beyond, dynamic batch $0.003/min. TTS $30 per 1M characters after a free tier of 1M characters monthly. Instant Custom Voice pricing not fully surfaced publicly. |
| License | Proprietary managed cloud service; consumed via API, not distributed as software. |
Not disclosed: Training data · Latency
Overview
Google uses the name "Chirp 3" in official documentation for three related cloud services: Chirp 3: HD voices for text-to-speech, Chirp 3: Instant Custom Voice for voice cloning, and Chirp 3: Transcription for speech-to-text. The family launched in phases during 2025 rather than on a single date: Chirp 3: HD voices reached general availability on April 2, 2025, Instant Custom Voice was announced as GA through an allowlist on April 9, 2025, and Chirp 3: Transcription entered private preview on April 11, 2025, public preview on August 29, 2025, and GA on October 13, 2025.
Chirp 3 is a cloud-managed speech stack, not a firmware-based device. There is no public firmware image or consumer software versioning; capability changes are surfaced through release notes, region rollouts, model identifiers, and documentation updates.
The name "Chirp 3" can also be mistaken for a hardware product with a similar name. The source resolves the ambiguity as follows.
| Possible match | Why it fits | Why it is less likely here | Source |
|---|---|---|---|
| Google Cloud Chirp 3 | Exact product name appears in official Google Cloud docs for STT and TTS. Google uses "Chirp 3" as a named speech model family. | None of significance; this is the best exact-name match. | |
| Deeper Smart Sonar CHIRP+ 3 | Prominent commercial hardware product; web search often surfaces it when users mean a device. | Official product name is CHIRP+ 3, not plain "Chirp 3," and its category is castable fish-finder sonar, not software or speech AI. | |
Capabilities and features
The family comprises three variants.
| Variant | Official identity | Model ID or naming | Core function | Current technical snapshot | Source |
|---|---|---|---|---|---|
| Chirp 3: HD voices | Google Cloud Text-to-Speech | Voice names such as en-US-Chirp3-HD-Charon | High-fidelity TTS for real-time and batch synthesis | Current dedicated docs list 30 named voices, 53 supported languages/locales, GA endpoints in global, us, eu, asia-southeast1, europe-west2, asia-northeast1, streaming and batch output formats, and text streaming support. Launch GA milestone was 8 speakers / 31 locales on April 2, 2025. | |
| Chirp 3: Instant Custom Voice | Google Cloud Text-to-Speech | Voice cloning key generated per project/request | Fast voice cloning / custom branded or personal voices | Restricted to allowlisted users; supports streaming and batch synthesis, supports LINEAR16, PCM, MP3, M4A input encodings, pace control from 0.25x to 2x, experimental pause tags and custom pronunciations, and multilingual transfer from en-US to six listed locales. | |
| Chirp 3: Transcription | Google Cloud Speech-to-Text V2 | chirp_3 | Multilingual automatic speech recognition | Available only in Speech-to-Text V2; supports StreamingRecognize, Recognize, and BatchRecognize; documentation lists 29 GA transcription locales, many additional Preview locales, 14 diarization locales, built-in denoiser, language-agnostic transcription, and speech adaptation. | |
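To make the voice-naming convention in the table concrete, the sketch below assembles a request body for the Cloud Text-to-Speech `text:synthesize` REST method. The `<locale>-Chirp3-HD-<voice>` pattern is inferred from the single example `en-US-Chirp3-HD-Charon`, and the helper names and the `LINEAR16` output encoding are illustrative assumptions rather than anything stated in the source.

```python
import json


def chirp3_hd_voice_name(locale: str, voice: str) -> str:
    """Build a voice name such as 'en-US-Chirp3-HD-Charon'.

    The pattern is inferred from one example in this profile; confirm the
    exact name against the official voice list before relying on it.
    """
    return f"{locale}-Chirp3-HD-{voice}"


def build_synthesize_body(text: str, locale: str = "en-US", voice: str = "Charon") -> dict:
    """Return a JSON-serialisable body for the text:synthesize REST method."""
    if not text:
        raise ValueError("text must not be empty")
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": locale,
            "name": chirp3_hd_voice_name(locale, voice),
        },
        # LINEAR16 is an illustrative choice of output encoding.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


body = build_synthesize_body("Hello from Chirp 3.")
print(json.dumps(body, indent=2))
```

The function only builds the payload; authentication and the HTTP call are left out so the example stays independent of any client library.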
For STT, Chirp 3 supports speaker diarization, automatic punctuation, automatic capitalization, speech adaptation with up to 1,000 phrases, a custom prompt feature in Preview, and a built-in denoiser that can reduce music, rain, and street noise but not background human voices.
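The 1,000-phrase ceiling on speech adaptation is easy to enforce client-side before a request is sent. The sketch below is a minimal illustration: the `phraseSets` / `inlinePhraseSet` field names follow the Speech-to-Text V2 REST JSON mapping as best understood here and should be checked against the API reference, and the helper name is hypothetical.

```python
from typing import Iterable, Optional

MAX_ADAPTATION_PHRASES = 1000  # limit stated in this profile


def build_inline_adaptation(phrases: Iterable[str], boost: Optional[float] = None) -> dict:
    """Return an 'adaptation' fragment for a V2 RecognitionConfig.

    Field names are an assumption based on the V2 REST JSON mapping.
    """
    # Strip whitespace, drop blanks, and remove duplicates while keeping order.
    unique = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    if len(unique) > MAX_ADAPTATION_PHRASES:
        raise ValueError(
            f"{len(unique)} phrases exceeds the limit of {MAX_ADAPTATION_PHRASES}"
        )
    entries = []
    for phrase in unique:
        entry = {"value": phrase}
        if boost is not None:
            entry["boost"] = boost
        entries.append(entry)
    return {"phraseSets": [{"inlinePhraseSet": {"phrases": entries}}]}
```

Deduplicating before counting is a design choice: it keeps repeated terms from consuming the phrase budget.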
For TTS, text streaming is exclusive to Chirp 3 HD voices in Google's Cloud TTS stack. Limited SSML support was added on October 17, 2025, covering the phoneme, p, s, sub, and say-as tags.
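Because SSML support is limited to a short tag list, it can help to lint markup before sending it. The sketch below parses an SSML string and reports any tags outside the five listed above; treating the `<speak>` root element as allowed is an assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tags listed in this profile, plus the <speak> root that SSML documents use
# (allowing <speak> is an assumption, not something the profile states).
SUPPORTED_TAGS = {"speak", "phoneme", "p", "s", "sub", "say-as"}


def unsupported_ssml_tags(ssml: str) -> list:
    """Return the sorted tag names in `ssml` that fall outside SUPPORTED_TAGS."""
    root = ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    found = {element.tag for element in root.iter()}
    return sorted(found - SUPPORTED_TAGS)


ok = '<speak><p><s>Visit the <sub alias="World Wide Web">WWW</sub>.</s></p></speak>'
bad = '<speak><prosody rate="slow">Hello</prosody><break time="1s"/></speak>'
print(unsupported_ssml_tags(ok))
print(unsupported_ssml_tags(bad))
```

This only checks tag names; it does not validate attributes or how the service treats tags it does not support.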
For custom voice, Google requires a spoken consent statement and recommends clean 10-second recordings. The resulting voice-cloning key is stored client-side, and each project may create 10 new keys per minute, with no stated absolute limit on total keys.
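A client that creates voice-cloning keys in bulk may want to stay under the quota of 10 new keys per minute per project rather than rely on server-side rejections. The following is a minimal sliding-window sketch; the class name is hypothetical, and how the service itself measures the minute is not described in the source.

```python
import time
from collections import deque


class KeyCreationLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Record a key-creation attempt and return True if it fits the quota."""
        now = self.clock()
        # Forget attempts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

A caller would check `try_acquire()` before each key-creation request and wait or queue the request when it returns `False`.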
Language support
- Chirp 3: Transcription documentation lists 29 GA transcription locales, many additional Preview locales, and 14 diarization locales, plus language-agnostic transcription.
- At public preview (August 29, 2025), chirp_3 was announced with 85+ languages/locales in preview coverage.
- Chirp 3: HD voices documentation lists 30 named voices and 53 supported languages/locales.
- Instant Custom Voice added ja-JP on June 18, 2025, pushing support to more than 30 locales, and supports multilingual transfer from en-US to six listed locales.
- In November and December 2025, STT preview regions expanded and TTS added a wide set of European languages, then Punjabi and Cantonese in preview.
Performance and benchmarks
Vendor-reported: Google's 2023 launch materials for the original Chirp claimed 98% English recognition accuracy and 300% relative improvement in some low-resource languages. For Chirp 3, Google's launch materials emphasize feature breadth, speed improvements, diarization, language detection, and voice realism, and do not provide a public benchmark sheet of equal granularity.
Third-party evaluation: Artificial Analysis' streaming STT benchmark found that Chirp 3 Streaming led partial-transcript performance on VoxPopuli at 2.2% WER, while noting that no single model led across all tested datasets. Artificial Analysis' selected-voice TTS leaderboard snapshot placed Chirp 3: HD at Elo 1,056 and $30.0 per 1M characters, below Azure HD 2.5 at Elo 1,127 and Eleven v3 at Elo 1,179.
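For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words; a 2.2% WER corresponds to roughly 22 errors per 1,000 reference words. The sketch below shows the standard computation. It applies no text normalisation, and the benchmark's own normalisation rules are not described in the source, so figures from it are not directly comparable.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.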
| Evidence area | What the evidence says | Interpretation |
|---|---|---|
| Independent STT benchmark | Artificial Analysis reported that in its streaming benchmark, Google's Chirp 3 Streaming led partial-transcript performance on VoxPopuli at 2.2% WER, while also noting that no single model leads everywhere. | Chirp 3 looks strong in real-time multilingual settings, but not categorically dominant across all datasets or latency conditions. |
| Independent TTS benchmark | Artificial Analysis' selected-voice leaderboard snapshot showed Chirp 3: HD at Elo 1,056 and $30.0 / 1M characters, below Azure HD 2.5 at Elo 1,127 and Eleven v3 at Elo 1,179. | Chirp 3 HD is competitive, but the benchmark snapshot does not place it at the very top of TTS naturalness. |
| Official real-world media use | Il Foglio said Chirp 3 HD offered the most natural Italian intonation among tested options, turned editorials into audio in minutes, and helped the paper reach the top three of its podcast offerings. | Evidence that Chirp 3 performs well in editorial long-form audio, especially when language-specific naturalness matters. |
| Official enterprise localization use | Adya reported localization across 20+ Indian languages with low latency using Chirp 3. | Suggests practical multilingual deployment, especially in enterprise localization. |
| Official contact-center use | HBX Group said Chirp 3 voices created a more natural, less robotic caller experience. | Supports Google's positioning in customer-experience voice channels. |
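To give the Elo gaps in the table some intuition, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the conventional logistic Elo formula with a 400-point scale; the source does not state which variant the leaderboard uses, so the results are indicative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise comparisons won by A under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted in this profile.
CHIRP3_HD = 1056
AZURE_HD_2_5 = 1127
ELEVEN_V3 = 1179

print(round(elo_expected_score(CHIRP3_HD, AZURE_HD_2_5), 3))
print(round(elo_expected_score(CHIRP3_HD, ELEVEN_V3), 3))
```

Under that assumption, Chirp 3: HD would be preferred in roughly 40% of comparisons against Azure HD 2.5 and roughly 33% against Eleven v3, which is consistent with the reading that it is competitive but not at the top.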
User sentiment reported by the source: G2 reviewers of Google Cloud Speech-to-Text praise ease of use, speed, and meeting-transcription productivity; recurring negatives include cost sensitivity and the need for manual correction when accuracy is not perfect. For Google Cloud Text-to-Speech, review summaries emphasize natural voice quality and simple API integration, while some users describe output as robotic in some scenarios or languages and some complain about pricing opacity or cost escalation. Google developer forum threads surfaced UI confusion, long-audio latency regressions, markup/SSML limitations, locale-specific pronunciation bugs, and allowlist/access friction.
The source compares Chirp 3 against alternative platforms as follows.
| Platform | STT | TTS | Custom voice / cloning | Real-time / streaming | Diarization / language ID | Public pricing signal | Analytical reading |
|---|---|---|---|---|---|---|---|
| Google Chirp 3 | Yes, via chirp_3 in Speech-to-Text V2 | Yes, via Chirp 3 HD | Yes, via Instant Custom Voice from ~10 seconds and consent flow | Yes for STT and TTS; Chirp 3 HD uniquely supports text streaming in Cloud TTS | Yes; diarization, language-agnostic transcription, denoiser, adaptation | TTS $30 / 1M chars; STT standard $0.016/min list tier; dynamic batch $0.003/min | Best fit for Google Cloud-native teams that want one vendor for speech generation and transcription. |
| Microsoft Azure AI Speech | Yes | Yes, including HD voices | Yes, via Custom Voice and Personal Voice | Yes; docs include real-time diarization quickstart and broad speech workflows | Yes; official docs highlight language detection, custom speech, diarization | Official pricing page shows per-second STT and per-character TTS billing plus a free tier, but exact post-free rates were not reliably visible in the HTML capture | Strong enterprise alternative, especially where Microsoft identity/compliance stack matters. |
| AWS Polly + Amazon Transcribe | Yes | Yes | TTS customization exists via lexicons and voice families; no instant 10-second clone captured in the source set | Yes; Polly returns audio streams, Transcribe supports streaming | Yes; Transcribe supports diarization and automatic language identification in relevant workflows | Polly Generative $30 / 1M chars; Transcribe Tier 1 $0.03/min in us-east-1 example | Strong for AWS-native teams; TTS price for Polly Generative roughly matches Chirp 3 HD, but STT list price in the cited example is higher. |
| ElevenLabs | Yes, Scribe v2 / v2 Realtime | Yes | Yes, voice cloning front-and-center | Yes; realtime STT marketed at ~150 ms latency | Yes; diarization, word-level timestamps, multilingual handling | Scribe v2 $0.22/hour; Scribe v2 Realtime $0.39/hour; TTS pricing is model- and plan-dependent | Best fit when pure voice experience and rapid productization matter more than hyperscaler platform consolidation. |
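The STT list prices in the table are quoted in different units (per minute and per hour), which makes them hard to compare at a glance. The sketch below normalises the figures quoted above to US dollars per audio hour. It uses only the list prices in this table and ignores free tiers, volume discounts, and billing granularity, so treat it as arithmetic on the quoted numbers rather than a cost model.

```python
# List prices quoted in the comparison table.
PER_MINUTE_USD = {
    "Google chirp_3 standard": 0.016,
    "Google chirp_3 dynamic batch": 0.003,
    "Amazon Transcribe Tier 1 (us-east-1 example)": 0.03,
}
PER_HOUR_USD = {
    "ElevenLabs Scribe v2": 0.22,
    "ElevenLabs Scribe v2 Realtime": 0.39,
}


def hourly_rates() -> dict:
    """Return every quoted STT price expressed in USD per audio hour."""
    rates = {name: round(per_min * 60, 2) for name, per_min in PER_MINUTE_USD.items()}
    rates.update(PER_HOUR_USD)
    return rates


for name, usd in sorted(hourly_rates().items(), key=lambda item: item[1]):
    print(f"{name}: ${usd:.2f}/hour")
```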
Latency and throughput
Specific latency figures for Chirp 3 are not publicly disclosed in the source set. The source records the following latency-related facts:
- Text streaming is exclusive to Chirp 3 HD voices in Google's Cloud TTS stack, which the source describes as relevant for low-latency voice agents.
- The chirp_3 public preview announcement (August 29, 2025) included improved speed/accuracy messaging.
- Adya reported localization across 20+ Indian languages with low latency using Chirp 3.
- A September 2025 forum thread reported long-audio jobs stalling or taking much longer than before; no official public postmortem or universal fix was captured in the source set.
- Instant Custom Voice permits 10 new voice cloning keys per minute per project with no stated absolute limit on total keys.
Deployment and integrations
Chirp 3 is sold as a Google Cloud API service, not a boxed consumer product. Consumption happens through the Cloud TTS API, Speech-to-Text V2 API, Vertex AI Studio, and Google's console and notebook ecosystem.
Region support differs by sub-product: Chirp 3 HD lists six GA endpoints (global, us, eu, asia-southeast1, europe-west2, asia-northeast1), while Chirp 3 Transcription documents GA in us and eu multi-regions with release-note preview expansions into additional regions in late 2025. Instant Custom Voice lists region availability beyond the initial TTS endpoints.
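Since the GA locations differ between the two services, a small lookup can catch a mismatched region before a request is made. The sketch below encodes the GA lists from the paragraph above. The hostname patterns (`<location>-texttospeech.googleapis.com` and `<location>-speech.googleapis.com`) are assumptions based on Google Cloud's usual regional-endpoint convention and should be verified against the official docs; preview regions are deliberately not listed.

```python
# GA locations as listed in this profile.
TTS_GA_LOCATIONS = {"global", "us", "eu", "asia-southeast1", "europe-west2", "asia-northeast1"}
STT_GA_LOCATIONS = {"us", "eu"}


def tts_endpoint(location: str) -> str:
    """Return the assumed Cloud TTS hostname for a Chirp 3 HD GA location."""
    if location not in TTS_GA_LOCATIONS:
        raise ValueError(f"{location!r} is not a GA location listed for Chirp 3 HD")
    if location == "global":
        return "texttospeech.googleapis.com"
    return f"{location}-texttospeech.googleapis.com"


def stt_endpoint(location: str) -> str:
    """Return the assumed Speech-to-Text V2 hostname for a chirp_3 GA location."""
    if location not in STT_GA_LOCATIONS:
        raise ValueError(f"{location!r} is not a GA location listed for chirp_3")
    return f"{location}-speech.googleapis.com"
```

Raising on unlisted locations is a conservative choice; a real deployment that uses preview regions would need to extend the sets.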
Chirp 3: Transcription is available only in Speech-to-Text V2 and supports the StreamingRecognize, Recognize, and BatchRecognize methods.
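As a rough illustration of a synchronous `Recognize` call, the sketch below builds the URL and JSON body for the Speech-to-Text V2 REST interface with the `chirp_3` model. The project ID `my-project` is a placeholder, and the default recognizer path (`recognizers/_`), the regional hostname, and the `autoDecodingConfig` field reflect the V2 API as understood here; they are assumptions to check against the API reference, not details from the source.

```python
import base64


def build_recognize_request(project_id, audio_bytes, language_codes=("en-US",), location="us"):
    """Return (url, body) for a V2 Recognize call that selects the chirp_3 model."""
    # "_" refers to the implicit default recognizer (assumed V2 behaviour).
    recognizer = f"projects/{project_id}/locations/{location}/recognizers/_"
    url = f"https://{location}-speech.googleapis.com/v2/{recognizer}:recognize"
    body = {
        "config": {
            "autoDecodingConfig": {},  # let the service detect the audio encoding
            "languageCodes": list(language_codes),
            "model": "chirp_3",
        },
        # Inline audio is sent base64-encoded in the JSON body.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return url, body


url, body = build_recognize_request("my-project", b"\x00\x01\x02")
print(url)
```

Sending the request would additionally require an OAuth bearer token; `StreamingRecognize` and `BatchRecognize` use different request shapes that are not shown here.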
Official support paths include reference documentation, release notes with RSS support, Vertex AI Studio and console entry points, Colab and GitHub notebooks, community forums, Cloud support, system status, and sales-led access for allowlisted features such as Instant Custom Voice.
Google describes Gemini-TTS as the latest evolution of Cloud TTS, with broader prompt-based control and native multi-speaker options while reusing voice identities similar to Chirp 3 HD.
Pricing
| Chirp 3 commercial surface | Publicly visible pricing | Availability notes | Source |
|---|---|---|---|
| Chirp 3: HD voices | $30 per 1M characters after the free tier; 1M characters free monthly on the pricing page. | GA; available via Cloud TTS and Vertex AI Studio. | |
| Chirp 3: Instant Custom Voice | Public pricing was not fully surfaced in the captured pricing excerpt; the feature is allowlisted. | Restricted access; requires sales/allowlist. | |
| Chirp 3: Transcription | Google's Speech-to-Text V2 page lists standard recognition at $0.016/min up to 500k min, then lower volume-tier rates, and dynamic batch at $0.003/min; the public pricing page still labels the included V2 speech model family as "chirp" rather than explicitly chirp_3. | Available only in Speech-to-Text V2; GA and preview regions differ. | |
The source notes that Google's public pricing nomenclature lags the model nomenclature: the STT pricing page references "chirp (Speech-to-Text V2 only)" while the current model docs and release notes use chirp_3. The source's reading is that Chirp-family Speech-to-Text V2 pricing applies, but the public pricing page is not as current or precise as the model documentation.
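The list prices above translate into a simple monthly estimate. The sketch below uses only the figures in this section; the function names are hypothetical, the dynamic-batch rate is assumed to be flat, and because the rates beyond the first 500k minutes are not given here, the standard-rate function refuses to extrapolate past that tier.

```python
TTS_USD_PER_MILLION_CHARS = 30.0
TTS_FREE_CHARS_PER_MONTH = 1_000_000
STT_STANDARD_USD_PER_MIN = 0.016
STT_FIRST_TIER_MINUTES = 500_000
STT_DYNAMIC_BATCH_USD_PER_MIN = 0.003  # assumed flat; tiering is not described


def tts_monthly_cost(characters: int) -> float:
    """Chirp 3: HD cost in USD for one month, after the free tier."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    billable = max(0, characters - TTS_FREE_CHARS_PER_MONTH)
    return billable * TTS_USD_PER_MILLION_CHARS / 1_000_000


def stt_monthly_cost(minutes: float, dynamic_batch: bool = False) -> float:
    """Speech-to-Text V2 cost in USD for one month at the quoted list prices."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    if dynamic_batch:
        return minutes * STT_DYNAMIC_BATCH_USD_PER_MIN
    if minutes > STT_FIRST_TIER_MINUTES:
        raise ValueError("rates beyond the first 500k minutes are not given in this profile")
    return minutes * STT_STANDARD_USD_PER_MIN


print(tts_monthly_cost(3_000_000))
print(round(stt_monthly_cost(10_000), 2))
```

For example, 3M characters of TTS in a month bills 2M characters after the free tier, and 10,000 minutes of standard recognition falls entirely within the first pricing tier.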
Development and ownership
Chirp 3 is developed and operated by Google as part of Google Cloud. The Chirp family predates Chirp 3: Google introduced Chirp as a speech foundation model in 2023 and described that generation as a 2B-parameter speech model delivering 98% English speech recognition accuracy and large relative gains in some low-resource languages. Current Chirp 3 materials emphasize product capabilities, rollout stages, and API behavior rather than architectural disclosures such as parameter count.
Release history
| Date | Milestone | What changed | Source |
|---|---|---|---|
| Feb 10, 2025 | Pre-launch rename | Journey voices were rebranded as Chirp HD voices. | |
| Mar 6, 2025 | TTS rollout expansion | Chirp 3 HD added 8 speakers in 31 locales. | |
| Apr 2, 2025 | TTS GA | Chirp 3 HD voices became GA with 8 speakers / 31 locales, real-time streaming, batch support, and supported regional endpoints. | |
| Apr 9, 2025 | Instant Custom Voice GA announcement | Google announced Instant Custom Voice as GA through an allowlist and also announced transcription with diarization in preview/allowlist. | |
| Apr 11, 2025 | STT Private Preview | chirp_3 launched in private preview for Speech-to-Text V2. | |
| May 7, 2025 | TTS controls expansion | Pace control, pause control, and custom pronunciations were released for Chirp 3 HD voices. | |
| Jun 18, 2025 | ICV locale expansion | Instant Custom Voice added ja-JP, pushing support to more than 30 locales. | |
| Aug 21 to 27, 2025 | ICV and TTS endpoint upgrades | Instant Custom Voice added PCM, MP3, and M4A input encodings; Chirp 3 HD became available on europe-west2. | |
| Aug 29, 2025 | STT Public Preview | chirp_3 public preview launched with 85+ languages/locales in preview coverage and improved speed/accuracy messaging. | |
| Sep 15, 2025 | TTS endpoint expansion | Chirp 3 HD became available on asia-northeast1. | |
| Oct 13, 2025 | STT GA | Chirp 3: Transcription reached GA in Speech-to-Text V2. | |
| Oct 17, 2025 | Limited SSML support | Chirp 3 HD added support for phoneme, p, s, sub, and say-as tags. | |
| Nov to Dec 2025 | Regional and language expansion | STT preview regions expanded; TTS added a wide set of European languages, then Punjabi and Cantonese in preview. | |
Sources
All cited sources were accessed on June 15, 2026.
- https://docs.cloud.google.com/speech-to-text/docs/models/chirp-3
- https://docs.cloud.google.com/text-to-speech/docs/release-notes
- https://artificialanalysis.ai/articles/new-streaming-speech-to-text-benchmark-aa-wer-streaming
- https://docs.cloud.google.com/text-to-speech/docs/gemini-tts
- Deeper Smart Sonar CHIRP+ 3: https://deepersonar.com/en-all/products/deeper-chirp-3
- https://cloud.google.com/blog/products/ai-machine-learning/bringing-power-large-models-google-clouds-speech-api
- https://docs.cloud.google.com/text-to-speech/docs/chirp3-hd
- https://docs.cloud.google.com/text-to-speech/docs/chirp3-instant-custom-voice
- https://docs.cloud.google.com/text-to-speech/docs/create-audio-text-streaming
- https://docs.cloud.google.com/text-to-speech/docs/list-voices-and-types
- https://cloud.google.com/blog/products/ai-machine-learning/expanding-generative-media-for-enterprise-on-vertex-ai
- https://docs.cloud.google.com/speech-to-text/docs/release-notes
- https://artificialanalysis.ai/text-to-speech/leaderboard/selected-voice
- https://cloud.google.com/customers/il-foglio
- https://cloud.google.com/customers/adya-ai
- https://cloud.google.com/customers/hbx-group
- https://www.g2.com/products/google-cloud-speech-to-text/reviews
- https://discuss.google.dev/t/google-text-to-speech-only-showing-chirp-voices/184456
- https://discuss.google.dev/t/severe-latency-regression-with-chirp-3-hd-long-audio/262049
- https://discuss.google.dev/t/incorrect-pronunciation-of-french-contractions-with-chirp-3-hd-voices/271804
- https://cloud.google.com/text-to-speech/pricing
- https://cloud.google.com/speech-to-text/pricing
- https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- https://aws.amazon.com/polly/pricing/
- https://elevenlabs.io/pricing/api