Google command_and_search (Google Speech-to-Text): model profile

command_and_search is a transcription model inside Google Cloud Speech-to-Text, documented in the V1 API for short or single-word utterances such as voice commands or voice search.

Specifications

Developer	Google (Google Cloud)
Released	Cloud Speech API open beta July 20, 2016; GA April 18, 2017. Model selection exposing command_and_search: beta April 9, 2018; GA February 20, 2019.
Model type	Speech-to-text transcription model for short or single-word utterances; classified by Google as "mostly based on classic non-conformer architectures"
Languages	All available languages in the Speech-to-Text V1 API
Modes (batch / streaming)	Synchronous, asynchronous, and streaming requests; inline audio or Cloud Storage URIs
Latency	Not publicly disclosed. Documented as a short-form model likely to return results as soon as it detects a period of silence.
Deployment	Google Cloud Speech-to-Text API (V1); Dialogflow CX and ES integration; Speech UI in the Google Cloud console
Pricing	Listed among the standard recognition models in Google's Speech-to-Text pricing as of June 2026; rates not stated in the source.

Not disclosed: Parameters · Training data · Throughput / concurrency · License

Known limitations

Google's V1 model-selection page classifies command_and_search among models "mostly based on classic non-conformer architectures" kept primarily for legacy and backwards-compatibility reasons.
Google does not publicly document the exact production architecture, parameter count, training data composition, or benchmark WER for this specific model.
Google does not publicly provide an exact first release date for the command_and_search identifier before model selection became public, a full creator list, or a formal statement on whether the model is legacy, softly deprecated, or intended for indefinite compatibility support. The public release notes reviewed do not show a formal deprecation notice, but current V2 comparison pages do not foreground the model.
Third-party reports on the broader Google speech stack describe struggles with certain accents and background noise (G2), considerably worse performance for non-American accents than for General American speech (academic evaluation), model-support errors for non-English languages (Home Assistant issue, 2024), and very high WER for Amharic (Google Developer Forums, 2025). These findings are not specific to command_and_search.
Developers have reported behavior regressions when switching short-utterance models from command_and_search to latest_short, especially in voice-bot settings.
Google developer-forum guidance indicates that the model behind Chrome's Web Speech API is not publicly available and cannot be selected in Cloud Speech-to-Text.

Full technical breakdown

Overview

"Command-and-Search" is not a standalone Google product. It is the command_and_search transcription model inside Google's speech-recognition service, which launched as Cloud Speech API, was later renamed Cloud Speech-to-Text, and is now generally presented as Speech-to-Text or Speech-to-Text API v2.

The broader Cloud Speech API entered open beta on July 20, 2016 and became generally available on April 18, 2017. The public ability to select transcription models, the point at which command_and_search became a documented customer-facing model choice, arrived in beta on April 9, 2018 and reached general availability on February 20, 2019.

Google said in 2016 that Cloud Speech API used the same technologies that powered voice recognition in Google Search, Google Now, the Google app's voice search, and Google Keyboard/Gboard voice typing. In 2017 Google added that the service was adapted from those core Google systems to fit enterprise and developer needs.

As of June 2026, Google's V1 documentation still lists command_and_search, but it groups the model among those "mostly based on classic non-conformer architectures" that are kept primarily for legacy and backwards-compatibility reasons, and it recommends latest_short for many short-utterance cases. Google's V2 "Compare transcription models" page foregrounds Chirp and telephony models instead of command_and_search.

Capabilities and features

Google's V1 documentation defines command_and_search as best for short or single-word utterances like voice commands or voice search.

Documented characteristics:

Optimized for short audio clips.
Classified in Google's troubleshooting documentation as a short-form model: better suited to short audio and prompts, and likely to return results as soon as it detects a period of silence.
Spoken punctuation is enabled by default for the command_and_search model.
Supports Cloud Speech-to-Text's three request modes: synchronous, asynchronous, and streaming, with inline audio or Cloud Storage URIs.
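To make the model choice concrete, here is a minimal sketch of the JSON body for a V1 synchronous recognize request that selects command_and_search. The field names (model, languageCode, sampleRateHertz, enableSpokenPunctuation, and the content/uri choice in the audio object) follow the V1 RecognitionConfig and RecognitionAudio references; the helper name build_recognize_body, the LINEAR16 encoding, and the gs://example-bucket URI are illustrative assumptions. Authentication and the HTTP call itself are deliberately left out.

```python
import base64
import json
from typing import Optional


def build_recognize_body(
    language_code: str,
    audio_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    sample_rate_hertz: int = 16000,
    spoken_punctuation: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: JSON body for POST .../v1/speech:recognize."""
    # The audio object carries either inline content or a Cloud Storage URI, not both.
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("pass exactly one of audio_bytes or gcs_uri")
    config = {
        "encoding": "LINEAR16",  # assumption: raw 16-bit PCM input
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
        "model": "command_and_search",  # select the short-utterance model
    }
    # Leaving the field unset keeps the model default, which for
    # command_and_search is spoken punctuation enabled.
    if spoken_punctuation is not None:
        config["enableSpokenPunctuation"] = spoken_punctuation
    if audio_bytes is not None:
        # Inline audio travels base64-encoded inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    else:
        audio = {"uri": gcs_uri}
    return {"config": config, "audio": audio}


if __name__ == "__main__":
    # gs://example-bucket/... is a placeholder URI, not a real object.
    body = build_recognize_body("en-US", gcs_uri="gs://example-bucket/play-next.wav")
    print(json.dumps(body, indent=2))
```

Passing spoken_punctuation=False is the explicit way to opt out of the default spoken-punctuation behavior described above.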

Documented use cases include brief control or query utterances such as "play next," "weather," "turn up the volume," or a short spoken search query. Google's 2016 and 2018 blog posts gave examples such as smart TVs listening for "rewind" and "fast-forward," apps and IoT devices, call-center routing, and upload-based demos comparing models. Google's 2017 GA post said early adopter use cases clustered around speech as a control method (voice search, voice commands, IVR) and speech analytics.

Model adaptation (phrase sets and custom classes to improve recognition of short commands and domain language) launched in the service between March and May 2021.
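As a sketch of how adaptation attaches to a request, the helper below adds an inline phrase set to a V1 RecognitionConfig dict. It assumes the V1 SpeechAdaptation shape (an adaptation object holding phraseSets, each with phrases[].value and an optional boost); the helper name and the default boost of 10.0 are hypothetical choices, and whether adaptation is supported for a given language and model combination should be checked in the V1 documentation.

```python
import copy


def with_phrase_hints(config: dict, phrases: list, boost: float = 10.0) -> dict:
    """Hypothetical helper: return a copy of a V1 RecognitionConfig dict with
    an inline phrase set attached under "adaptation"."""
    adapted = copy.deepcopy(config)  # leave the caller's dict untouched
    adapted["adaptation"] = {
        # An inline phrase set omits "name" and lists its phrases directly.
        "phraseSets": [
            {
                "phrases": [{"value": p} for p in phrases],
                "boost": boost,  # assumption: 10.0 default is illustrative only
            }
        ]
    }
    return adapted


base = {"languageCode": "en-US", "model": "command_and_search"}
config = with_phrase_hints(base, ["play next", "rewind", "fast-forward"])
```

Keeping the phrase list to the exact commands an app expects (here the smart-TV examples quoted earlier) is the point of adaptation for short utterances.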

Language support

Google's V1 supported-languages page states that the command_and_search model supports all available languages in that API version. At open beta in July 2016, the Cloud Speech API supported 80+ languages; 30 more languages were added on August 10, 2017.

Google's Dialogflow CX documentation lists command_and_search as a useful option, especially for languages where other stronger models are not available.
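The guidance above suggests a simple fallback policy: use latest_short where it exists for a language, otherwise command_and_search. The sketch below encodes that policy under an explicit assumption: the caller supplies the set of language codes where latest_short is available (taken from Google's supported-languages table); the function name and the example set are hypothetical and do not describe actual availability.

```python
from typing import Iterable


def pick_short_utterance_model(language_code: str, latest_short_languages: Iterable[str]) -> str:
    """Hypothetical policy: prefer latest_short where the caller has confirmed
    it is available for the language, else fall back to command_and_search."""
    # BCP-47 language tags are case-insensitive, so compare lowercased.
    available = {code.lower() for code in latest_short_languages}
    if language_code.lower() in available:
        return "latest_short"
    # The V1 docs list command_and_search as supported for every V1 language.
    return "command_and_search"


# Illustrative availability set only; build the real one from Google's table.
known = {"en-US"}
print(pick_short_utterance_model("en-us", known))  # latest_short
print(pick_short_utterance_model("am-ET", known))  # command_and_search
```

Given the reported regressions when moving voice bots from command_and_search to latest_short, a switch like this is worth validating on your own audio before rollout.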

Performance and benchmarks

Google does not publicly document benchmark word error rates for the command_and_search model.

Third-party and community evaluations of the broader Google speech stack, as reported in the source:

G2 review summary: users report struggles with certain accents and background noise.
An academic evaluation of off-the-shelf speech recognizers found that performance for non-American accents was considerably worse than for General American speech.
A Home Assistant issue in 2024 reported model-support errors for non-English languages.
A 2025 Google Developer Forums post on Amharic reported very high WER and stated that Chrome's Web Speech API quality looked better for that use case.

The source states these findings apply to Google's speech stack broadly and are not specific to command_and_search.
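Because no official WER is published for command_and_search, teams typically measure it on their own labeled clips. The sketch below computes corpus-level word error rate (total word substitutions, insertions, and deletions divided by total reference words) with a standard word-level Levenshtein distance. The lowercase-and-split normalization is an assumption; since spoken punctuation is on by default for command_and_search, transcripts may contain symbols, so normalize references and hypotheses the same way.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> List[str]:
    # Assumption: lowercasing plus whitespace split is adequate for short commands.
    return text.lower().split()


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors += word_edit_distance(ref, hyp)
        words += len(ref)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

Running corpus_wer once over command_and_search transcripts and once over latest_short transcripts of the same clips gives a like-for-like comparison on your own data.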

On G2, Google Cloud Speech-to-Text had a 4.6/5 score from 237 reviews in the captured page, with the review summary reporting consistent praise for accuracy, ease of use, and integration with other Google services.

Latency and throughput

Numeric latency and throughput figures are not publicly disclosed. Google's troubleshooting documentation classifies command_and_search as a short-form model that is likely to return results as soon as it detects a period of silence.
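For voice commands, the "return on silence" behavior pairs naturally with V1 streaming in single-utterance mode. The sketch below mirrors the StreamingRecognitionConfig fields (config, single_utterance, interim_results) as a plain dict, plus a reader that pulls the first final transcript from response-shaped dicts (results[].is_final, alternatives[].transcript). Helper names and the simulated responses are hypothetical; a real stream goes through the gRPC StreamingRecognize method, for example via the google-cloud-speech client, which is not shown here.

```python
from typing import Iterable, Optional


def build_streaming_config(language_code: str, sample_rate_hertz: int = 16000) -> dict:
    """Hypothetical helper mirroring V1 StreamingRecognitionConfig fields."""
    return {
        "config": {
            "encoding": "LINEAR16",  # assumption: raw 16-bit PCM microphone audio
            "sample_rate_hertz": sample_rate_hertz,
            "language_code": language_code,
            "model": "command_and_search",
        },
        # Detect one utterance; on a pause the service emits END_OF_SINGLE_UTTERANCE
        # and stops recognizing further audio.
        "single_utterance": True,
        "interim_results": False,
    }


def first_final_transcript(responses: Iterable[dict]) -> Optional[str]:
    """Return the top alternative of the first final result, if any.
    `responses` are dicts shaped like StreamingRecognizeResponse messages."""
    for response in responses:
        for result in response.get("results", []):
            if result.get("is_final") and result.get("alternatives"):
                return result["alternatives"][0]["transcript"]
    return None


# Simulated responses for illustration; a real stream comes from the API.
simulated = [
    {"speech_event_type": "END_OF_SINGLE_UTTERANCE"},
    {"results": [{"is_final": True, "alternatives": [{"transcript": "weather"}]}]},
]
print(first_final_transcript(simulated))
```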

Deployment and integrations

Available through the Cloud Speech-to-Text V1 API with synchronous, asynchronous, and streaming request modes, accepting inline audio or Cloud Storage URIs.
Integrated with Dialogflow CX and ES; Google states that Dialogflow voice agents use Speech-to-Text for recognition and lists command_and_search as an option, especially for languages where other models are not available.
A Speech UI in the Google Cloud console allows experimentation with models and configurations.
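The three request modes map onto different V1 entry points: synchronous requests use speech:recognize, asynchronous requests use speech:longrunningrecognize and return an operation to poll, and streaming uses the gRPC StreamingRecognize method rather than a REST URL. The sketch below routes a clip by duration; the one-minute synchronous cutoff is an assumption based on Google's published V1 limits at the time of writing and should be checked against the current quotas page, and the helper names are hypothetical.

```python
V1_BASE = "https://speech.googleapis.com/v1"
SYNC_MAX_SECONDS = 60  # assumption: V1 synchronous limit of about one minute; verify


def choose_mode(audio_seconds: float) -> str:
    """Hypothetical policy: short clips go synchronous, longer ones async."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return "sync" if audio_seconds <= SYNC_MAX_SECONDS else "async"


def method_url(mode: str) -> str:
    """V1 REST method URL for a request mode."""
    if mode == "sync":
        return f"{V1_BASE}/speech:recognize"
    if mode == "async":
        # Returns a long-running operation; poll it with operation_url().
        return f"{V1_BASE}/speech:longrunningrecognize"
    if mode == "streaming":
        raise ValueError("streaming uses the gRPC StreamingRecognize method, not REST")
    raise ValueError(f"unknown mode: {mode}")


def operation_url(operation_name: str) -> str:
    """URL for polling an async operation by the name the service returned."""
    return f"{V1_BASE}/operations/{operation_name}"
```

A command like "play next" is a few seconds long, so choose_mode routes it to the synchronous method.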

Official documentation entry points cited in the source:

Product page: https://cloud.google.com/speech-to-text
Release notes: https://docs.cloud.google.com/speech-to-text/docs/release-notes
V1 model selection: https://docs.cloud.google.com/speech-to-text/docs/v1/transcription-model
V1 supported languages: https://docs.cloud.google.com/speech-to-text/docs/v1/speech-to-text-supported-languages
V1 RecognitionConfig reference: https://docs.cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig
Dialogflow CX speech models: https://docs.cloud.google.com/dialogflow/cx/docs/concept/speech-models
V2 model comparison: https://docs.cloud.google.com/speech-to-text/docs/transcription-model

Pricing

As of June 2026, Google lists command_and_search among the standard recognition models in its Speech-to-Text pricing pages. Specific rates are not stated in the source.

Google's 2018 overhaul introduced opt-in data logging for enhanced models; TechCrunch noted that some customers would not be comfortable sharing data even in exchange for lower prices. The 2019 model-selection GA release was accompanied by price cuts, per TechCrunch coverage.

Development and ownership

The model is developed and operated by Google as part of Google Cloud Speech-to-Text. Google does not publish a single authoritative creator roster for the command_and_search production model.

Product leads named in launch posts:

Apoorv Saxena, Product Manager, Cloud AI: authored the July 2016 open-beta announcement.
Dan Aharon, Product Manager, Speech: authored the April 2017 GA post and the April 2018 overhaul post, which references the existing "voice command_and_search" model.
Francoise Beaufays, Distinguished Scientist, Speech Team: authored the April 2022 post introducing the Conformer-based "latest" models.
Calum Barnes and Haris Ioannou, Product Managers, Cloud Speech: authored the August 2023 V2/Chirp GA announcement.

Research lineage named in the source: Google publications by Ciprian Chelba, Johan Schalkwyk, Carolina Parada, Chung-Cheng Chiu, James Qin, Yu Zhang, Weiran Wang, Rohit Prabhavalkar, Tara Sainath, and others on voice search language models, endpoint detection, Conformer, Universal Speech Model, and large short-query ASR systems. Specific works cited: the 2011 "Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice" talk, the 2016 "On-Demand Language Model Interpolation for Mobile Speech Input" paper, 2017 endpoint-detection papers for voice search and Google Home, the 2020 Conformer paper, the 2023 Universal Speech Model blog (trained on 12 million hours of speech and 28 billion sentences of text), and the 2023 paper "Massive End-to-end Models for Short Search Queries."

Adoption figures for the overall Speech-to-Text service (not specific to command_and_search): more than 5,000 companies signed up during alpha in 2016; thousands of customers by GA in 2017; usage more than doubling every six months in 2018; more than 1 billion voice minutes processed per month by 2022 and 2023; thousands of customers served in 2023.

Release history

Date	Milestone	Details
July 20, 2016	Cloud Speech API open beta	Google opened speech recognition to developers in 80+ languages and said it used the same voice-recognition tech behind Google Search and Google Now; Google also said 5,000+ companies had signed up in alpha.
April 18, 2017	Cloud Speech API GA	Google said the service was built on the core speech-recognition technology used by Google Search, Google Now, and Google Assistant, but adapted for cloud customers.
August 10, 2017	Timestamps, 30 more languages, longer async audio	Maturation step for enterprise transcription, though not specific to command_and_search.
April 9, 2018	Major overhaul and rename to Cloud Speech-to-Text	Google introduced model selection in beta, automatic punctuation, recognition metadata, and an SLA, and said it would continue offering its existing model for "voice command_and_search." This is the earliest unmistakable public documentation of command_and_search as part of the selectable model family.
February 20, 2019	Model selection GA	Google made transcription-model selection generally available, alongside enhanced models and data logging GA. This is the strongest stable public availability date for command_and_search as a documented model choice.
March to May 2021	Model adaptation launched, then GA	Added phrase sets and custom classes to improve recognition of short commands and domain language.
May to July 2021	Spoken punctuation and spoken emoji preview, then GA	Spoken punctuation later documented as enabled by default for command_and_search.
April 21 to 22, 2022	"Latest" models launched	Google introduced newer end-to-end Conformer-based models and positioned latest_short as the improved modern path for command-like utterances.
August 9, 2023	Speech-to-Text V2 and Chirp GA	Google modernized the API, added regionalization, and said V2 migrated existing functionality while opening the door to newer large speech models like Chirp.
November 6, 2023	telephony and telephony_short launched	Newer specialized models for phone-call audio, with telephony_short aimed at short utterances.
January 9, 2024	latest_short quality substantially improved	Reinforced the modern replacement path for short utterances.
October 13, 2025	Chirp 3 GA	Google's flagship speech offering shifted toward generative/multilingual Chirp models in V2.

In 2022, Google contrasted the older approach, built from separate acoustic, pronunciation, and language models, with the new Conformer-based architecture using a single neural network combining transformer-style context modeling with convolutions for local information. Google's V1 model documentation and "Introduction to Latest Models" page say to consider latest_short instead of command_and_search for short utterances.
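The 2022 contrast can be written out with the textbook noisy-channel formulation. This is the standard formulation, not a disclosure of Google's production internals: O is the audio observation sequence, W a candidate word sequence, and Q a pronunciation (phone) sequence. The evidence p(O) drops out of the first line because it does not depend on W, and the second line assumes the audio depends on the words only through their pronunciation.

```latex
\begin{aligned}
\hat{W} &= \arg\max_{W} P(W \mid O) = \arg\max_{W} p(O \mid W)\,P(W) \\
p(O \mid W) &= \sum_{Q} p(O \mid Q)\,P(Q \mid W) \\
\hat{W}_{\text{E2E}} &= \arg\max_{W} P_{\theta}(W \mid O)
\end{aligned}
```

In the classic pipeline p(O | Q) is the acoustic model, P(Q | W) the pronunciation model, and P(W) the language model, each built separately. An end-to-end model such as the Conformer-based latest family replaces all three with a single network with parameters θ that scores P(W | O) directly.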

Sources

Introducing Cloud Natural Language API, Speech API open beta and our West Coast region expansion, Google Cloud Blog. https://cloud.google.com/blog/products/gcp/the-latest-for-cloud-customers-machine-learning-and-west-coast-expansion/
Cloud Speech-to-Text V1 supported languages, Google Cloud Documentation. https://docs.cloud.google.com/speech-to-text/docs/v1/speech-to-text-supported-languages
Cloud Speech API is now generally available, Google Cloud Blog. https://cloud.google.com/blog/products/gcp/cloud-speech-api-is-now-generally-available
Speech-to-Text: AI voice typing and transcription, Google Cloud. https://cloud.google.com/speech-to-text
Speech-to-Text release notes, Google Cloud Documentation. https://docs.cloud.google.com/speech-to-text/docs/release-notes
Toward better phone call and video transcription with new Cloud Speech-to-Text, Google Cloud Blog. https://cloud.google.com/blog/products/gcp/toward-better-phone-call-and-video-transcription-with-new-cloud-speech-to-text
Google Cloud Speech-to-Text V2 API, Google Cloud Blog. https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-speech-to-text-v2-api
Google takes Cloud Machine Learning service mainstream, Google Cloud Blog. https://cloud.google.com/blog/products/gcp/google-takes-cloud-machine-learning-service-mainstream/
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice, Google Research. https://research.google/pubs/language-modeling-for-automatic-speech-recognition-meets-the-web-google-search-by-voice-2/
Conformer: Convolution-augmented Transformer for Speech Recognition, Google Research. https://research.google/pubs/conformer-convolution-augmented-transformer-for-speech-recognition/
Select a transcription model, Cloud Speech-to-Text, Google Cloud Documentation. https://docs.cloud.google.com/speech-to-text/docs/v1/transcription-model
Compare transcription models, Cloud Speech-to-Text. https://docs.cloud.google.com/speech-to-text/docs/transcription-model
Speech-to-Text API Pricing, Google Cloud. https://cloud.google.com/speech-to-text/pricing
Google Cloud Speech API Now Available to Developers, Voicebot.ai. https://voicebot.ai/2017/04/19/google-cloud-speech-api-now-available-developers/
Google Cloud Speech-to-Text Reviews 2026: Details, Pricing, and Features, G2. https://www.g2.com/products/google-cloud-speech-to-text/reviews
Experience Google's machine learning on your own images, voice and text, Google Cloud Blog. https://cloud.google.com/blog/products/gcp/experience-googles-machine-learning-on-your-own-images-voice-and-text