2026-07-03 · MODEL PROFILE

Google command_and_search (Google Speech-to-Text): model profile

Reference profile of Google's command_and_search transcription model in Cloud Speech-to-Text, a legacy short-utterance model for voice commands and voice search.


command_and_search is a transcription model inside Google Cloud Speech-to-Text, documented in the V1 API for short or single-word utterances such as voice commands or voice search.

Specifications

Developer: Google (Google Cloud)
Released: Cloud Speech API open beta July 20, 2016; GA April 18, 2017. Model selection exposing command_and_search: beta April 9, 2018; GA February 20, 2019.
Model type: Speech-to-text transcription model for short or single-word utterances; classified by Google as "mostly based on classic non-conformer architectures"
Languages: All available languages in the Speech-to-Text V1 API
Modes (batch / streaming): Synchronous, asynchronous, and streaming requests; inline audio or Cloud Storage URIs
Latency: Not publicly disclosed. Documented as a short-form model likely to return results as soon as it detects a period of silence.
Deployment: Google Cloud Speech-to-Text API (V1); Dialogflow CX and ES integration; Speech UI in the Google Cloud console
Pricing: Listed among the standard recognition models in Google's Speech-to-Text pricing as of June 2026; rates not stated in the source.

Not disclosed: Parameters · Training data · Throughput / concurrency · License


Overview

"Command-and-Search" is not a standalone Google product. It is the command_and_search transcription model inside Google's speech-recognition service, which launched as Cloud Speech API, was later renamed Cloud Speech-to-Text, and is now generally presented as Speech-to-Text or Speech-to-Text API v2.

The broader Cloud Speech API entered open beta on July 20, 2016 and became generally available on April 18, 2017. The public ability to select transcription models, the point at which command_and_search became a documented customer-facing model choice, arrived in beta on April 9, 2018 and reached general availability on February 20, 2019.

Google said in 2016 that Cloud Speech API used the same technologies that powered voice recognition in Google Search, Google Now, the Google app's voice search, and Google Keyboard/Gboard voice typing. In 2017 Google added that the service was adapted from those core Google systems to fit enterprise and developer needs.

As of June 2026, Google's V1 documentation still documents command_and_search but classifies it as a legacy model based on classic non-Conformer architecture and recommends latest_short for many short-utterance cases. Google's V2 "Compare transcription models" page foregrounds Chirp and telephony models instead of command_and_search.

Capabilities and features

Google's V1 documentation defines command_and_search as best for short or single-word utterances like voice commands or voice search.

Documented characteristics:

  • Optimized for short audio clips.
  • Classified in Google's troubleshooting documentation as a short-form model, better suited to short audios and prompts, and likely to return results as soon as it detects a period of silence.
  • Spoken punctuation is enabled by default for the command_and_search model.
  • Supports Cloud Speech-to-Text's three request modes: synchronous, asynchronous, and streaming, with inline audio or Cloud Storage URIs.
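To make the request modes above concrete, here is a minimal sketch, assuming the public V1 REST surface (speech:recognize for synchronous jobs, speech:longrunningrecognize for asynchronous jobs) and its camelCase JSON field names. The helper name build_v1_request and its defaults (LINEAR16 audio at 16 kHz) are hypothetical choices for illustration. The helper only assembles the request and does not authenticate or send it. Streaming is left out because V1 streaming recognition is exposed through gRPC rather than a single REST call.

```python
import base64
from typing import Optional

# Public V1 REST base URL; the helper below only builds requests and sends nothing.
V1_BASE = "https://speech.googleapis.com/v1"


def build_v1_request(
    *,
    language_code: str,
    gcs_uri: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
    encoding: str = "LINEAR16",      # assumed 16-bit PCM input
    sample_rate_hertz: int = 16000,  # assumed 16 kHz input
    asynchronous: bool = False,
    spoken_punctuation: Optional[bool] = None,
) -> tuple[str, dict]:
    """Hypothetical helper: return (endpoint URL, JSON body) for a V1 request
    that selects the command_and_search model."""
    # Exactly one audio source: a Cloud Storage URI or inline bytes.
    if (gcs_uri is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of gcs_uri or audio_bytes")

    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
        "model": "command_and_search",
    }
    if spoken_punctuation is not None:
        # Documented as on by default for this model; False opts out explicitly.
        config["enableSpokenPunctuation"] = spoken_punctuation

    if gcs_uri is not None:
        audio = {"uri": gcs_uri}
    else:
        # Inline audio travels base64-encoded inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}

    # Synchronous vs asynchronous (long-running) V1 methods.
    method = "speech:longrunningrecognize" if asynchronous else "speech:recognize"
    return f"{V1_BASE}/{method}", {"config": config, "audio": audio}
```

For example, build_v1_request(language_code="en-US", gcs_uri="gs://example-bucket/play-next.wav") (bucket name hypothetical) returns the synchronous endpoint and a body whose config.model is command_and_search. A caller would then POST that body with Google Cloud credentials.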

Documented use cases include brief control or query utterances such as "play next," "weather," "turn up the volume," or a short spoken search query. Google's 2016 and 2018 blog posts gave examples such as smart TVs listening for "rewind" and "fast-forward," apps and IoT devices, call-center routing, and upload-based demos comparing models. Google's 2017 GA post said early adopter use cases clustered around speech as a control method (voice search, voice commands, IVR) and speech analytics.

Model adaptation, which uses phrase sets and custom classes to improve recognition of short commands and domain language, launched in the service and then reached general availability between March and May 2021.
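A minimal sketch of attaching model adaptation to a V1 request config, assuming the REST adaptation field with an inline phraseSets list whose phrases carry a value and a boost. The helper with_phrase_boost, the example phrases, and the default boost of 10 are illustrative assumptions rather than tuned values. Custom classes are omitted to keep the sketch short.

```python
# Hypothetical helper: returns a copy of a V1 config dict with one inline
# phrase set that biases recognition toward the given command phrases.
def with_phrase_boost(config: dict, phrases: list[str], boost: float = 10.0) -> dict:
    # This sketch only accepts positive boosts (biasing toward the phrases).
    if boost <= 0:
        raise ValueError("boost must be positive in this sketch")
    adapted = dict(config)  # shallow copy; the caller's dict is left unchanged
    adapted["adaptation"] = {
        "phraseSets": [
            {"phrases": [{"value": p, "boost": boost} for p in phrases]}
        ]
    }
    return adapted
```

For a TV remote, for instance, the phrase list might hold "rewind" and "fast forward", so that these short commands win over acoustically similar words.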

Language support

Google's V1 supported-languages page states that the command_and_search model supports all available languages in that API version. At open beta in July 2016, the Cloud Speech API supported 80+ languages; 30 more languages were added on August 10, 2017.

Google's Dialogflow CX documentation lists command_and_search as a useful option, especially for languages where stronger models are not available.
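The fallback described for Dialogflow, together with Google's advice to consider latest_short for short utterances, can be sketched as a small model-selection policy. The function name is hypothetical. The set of languages with latest_short support is a caller-supplied assumption, and this sketch does not query Google for availability.

```python
# Hypothetical policy: prefer latest_short where the caller knows it is offered,
# otherwise fall back to command_and_search (documented for all V1 languages).
def pick_short_utterance_model(language_code: str, latest_short_languages: set[str]) -> str:
    # BCP-47 language tags are case-insensitive, so compare lowercased forms.
    normalized = {code.lower() for code in latest_short_languages}
    if language_code.lower() in normalized:
        return "latest_short"
    return "command_and_search"
```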

Performance and benchmarks

Google does not publicly document benchmark word error rates for the command_and_search model.

Third-party and community evaluations of the broader Google speech stack, as reported in the source:

  • G2 review summary: users report struggles with certain accents and background noise.
  • An academic evaluation of off-the-shelf speech recognizers found that performance for non-American accents was considerably worse than for General American speech.
  • A Home Assistant issue in 2024 reported model-support errors for non-English languages.
  • A 2025 Google Developer Forums post on Amharic reported very high WER and stated that Chrome's Web Speech API quality looked better for that use case.

The source states these findings apply to Google's speech stack broadly and are not specific to command_and_search.

On G2, Google Cloud Speech-to-Text had a 4.6/5 score from 237 reviews in the captured page, with the review summary reporting consistent praise for accuracy, ease of use, and integration with other Google services.

Latency and throughput

Numeric latency and throughput figures are not publicly disclosed. Google's troubleshooting documentation classifies command_and_search as a short-form model that is likely to return results as soon as it detects a period of silence.
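To illustrate what returning results "as soon as it detects a period of silence" means, here is a toy energy-based endpointer. It is not Google's method, which is not publicly documented for this model. The per-frame energy input, the threshold, and the frame counts are all hypothetical; for example, 30 frames would correspond to 300 ms at an assumed 10 ms hop.

```python
# Illustrative toy endpointer, not Google's algorithm: declares the utterance
# finished once enough speech has been seen and is followed by enough silence.
def find_endpoint(frame_energies, silence_threshold=0.01,
                  min_silence_frames=30, min_speech_frames=5):
    """Return the frame index at which the endpoint fires, or None."""
    speech_frames = 0  # total frames at or above the threshold so far
    silent_run = 0     # consecutive quiet frames after enough speech
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            speech_frames += 1
            silent_run = 0  # any loud frame resets the trailing-silence count
        elif speech_frames >= min_speech_frames:
            # Leading silence is ignored; only silence after speech counts.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None
```

Given a short command followed by silence, the function returns the frame at which the trailing silence reaches min_silence_frames. A streaming client could finalize the transcript at that point. A shorter silence window lowers latency but risks cutting off commands that contain internal pauses. That trade-off is one reason silence-triggered endpointing suits short prompts better than long dictation.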

Deployment and integrations

  • Available through the Cloud Speech-to-Text V1 API with synchronous, asynchronous, and streaming request modes, accepting inline audio or Cloud Storage URIs.
  • Integrated with Dialogflow CX and ES; Google states that Dialogflow voice agents use Speech-to-Text for recognition and lists command_and_search as an option, especially for languages where other models are not available.
  • A Speech UI in the Google Cloud console allows experimentation with models and configurations.

Official documentation entry points cited in the source:

Pricing

As of June 2026, Google lists command_and_search among the standard recognition models in its Speech-to-Text pricing pages. Specific rates are not stated in the source.

Google's 2018 overhaul introduced opt-in data logging for enhanced models; TechCrunch noted that some customers would not be comfortable sharing data even in exchange for lower prices. The 2019 model-selection GA release was accompanied by price cuts, per TechCrunch coverage.

Development and ownership

The model is developed and operated by Google as part of Google Cloud Speech-to-Text. Google does not publish a single authoritative creator roster for the command_and_search production model.

Product leads named in launch posts:

  • Apoorv Saxena, Product Manager, Cloud AI: authored the July 2016 open-beta announcement.
  • Dan Aharon, Product Manager, Speech: authored the April 2017 GA post and the April 2018 overhaul announcement, which references the existing "voice command_and_search" model.
  • Francoise Beaufays, Distinguished Scientist, Speech Team: authored the April 2022 post introducing the Conformer-based "latest" models.
  • Calum Barnes and Haris Ioannou, Product Managers, Cloud Speech: authored the August 2023 V2/Chirp GA announcement.

Research lineage named in the source: Google publications by Ciprian Chelba, Johan Schalkwyk, Carolina Parada, Chung-Cheng Chiu, James Qin, Yu Zhang, Weiran Wang, Rohit Prabhavalkar, Tara Sainath, and others on voice search language models, endpoint detection, Conformer, Universal Speech Model, and large short-query ASR systems. Specific works cited: the 2011 "Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice" talk, the 2016 "On-Demand Language Model Interpolation for Mobile Speech Input" paper, 2017 endpoint-detection papers for voice search and Google Home, the 2020 Conformer paper, the 2023 Universal Speech Model blog (trained on 12 million hours of speech and 28 billion sentences of text), and the 2023 paper "Massive End-to-end Models for Short Search Queries."

Adoption figures for the overall Speech-to-Text service (not specific to command_and_search): more than 5,000 companies signed up during alpha in 2016; thousands of customers by GA in 2017; usage more than doubling every six months in 2018; more than 1 billion voice minutes processed per month by 2022 and 2023; thousands of customers served in 2023.

Release history

Date · Milestone · Details
July 20, 2016 · Cloud Speech API open beta · Google opened speech recognition to developers in 80+ languages and said it used the same voice-recognition tech behind Google Search and Google Now; Google also said 5,000+ companies had signed up in alpha.
April 18, 2017 · Cloud Speech API GA · Google said the service was built on the core speech-recognition technology used by Google Search, Google Now, and Google Assistant, but adapted for cloud customers.
August 10, 2017 · Timestamps, 30 more languages, longer async audio · Maturation step for enterprise transcription, though not specific to command_and_search.
April 9, 2018 · Major overhaul and rename to Cloud Speech-to-Text · Google introduced model selection in beta, automatic punctuation, recognition metadata, and an SLA, and said it would continue offering its existing model for "voice command_and_search." This is the earliest unmistakable public documentation of command_and_search as part of the selectable model family.
February 20, 2019 · Model selection GA · Google made transcription-model selection generally available, alongside enhanced models and data logging GA. This is the strongest stable public availability date for command_and_search as a documented model choice.
March to May 2021 · Model adaptation launched, then GA · Added phrase sets and custom classes to improve recognition of short commands and domain language.
May to July 2021 · Spoken punctuation and spoken emoji preview, then GA · Spoken punctuation later documented as enabled by default for command_and_search.
April 21 to 22, 2022 · "Latest" models launched · Google introduced newer end-to-end Conformer-based models and positioned latest_short as the improved modern path for command-like utterances.
August 9, 2023 · Speech-to-Text V2 and Chirp GA · Google modernized the API, added regionalization, and said V2 migrated existing functionality while opening the door to newer large speech models like Chirp.
November 6, 2023 · telephony and telephony_short launched · Newer specialized models for short-form speech in the product roadmap.
January 9, 2024 · latest_short quality substantially improved · Reinforced the modern replacement path for short utterances.
October 13, 2025 · Chirp 3 GA · Google's flagship speech offering shifted toward generative/multilingual Chirp models in V2.

In 2022, Google contrasted the older approach, built from separate acoustic, pronunciation, and language models, with the new Conformer-based architecture using a single neural network combining transformer-style context modeling with convolutions for local information. Google's V1 model documentation and "Introduction to Latest Models" page say to consider latest_short instead of command_and_search for short utterances.
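The architectural contrast can be written in the standard textbook form of speech recognition. This is a generic formulation, not a description of Google's production systems. Here $O$ is the sequence of acoustic feature frames, $W$ a word sequence, and $Q$ a phone sequence. In the classic pipeline, separate acoustic, pronunciation, and language models supply the factors $p(O \mid Q)$, $P(Q \mid W)$, and $P(W)$. The sum over pronunciations is commonly replaced by a Viterbi maximum:

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} p(O \mid W)\,P(W)

p(O \mid W) = \sum_{Q} p(O \mid Q)\,P(Q \mid W) \approx \max_{Q} p(O \mid Q)\,P(Q \mid W)

% End-to-end (simplified view): one network with parameters \theta
\hat{W} = \arg\max_{W} P_{\theta}(W \mid O)
```

In this simplified view, an end-to-end model such as a Conformer-based recognizer trains a single network to produce $P_\theta(W \mid O)$ directly. This removes the separately built lexicon and acoustic-model factors.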

Sources


© 2026 OpenTranscription · Signal is our journal. Set in system grotesque, serif & mono