Amazon Transcribe Medical: model profile

Amazon Transcribe Medical is AWS's managed medical automatic speech recognition service for converting clinician dictation and clinician-patient speech into text.

Specifications

Developer	Amazon Web Services (AWS)
Released	December 2019
Model type	Deep-learning-based automatic speech recognition service optimized for medical speech; exact architecture not disclosed
Languages	US English (en-US) only
Modes (batch / streaming)	Both; real-time streaming and batch transcription, with DICTATION and CONVERSATION audio types
Latency	Not publicly disclosed. AWS publishes no latency service objectives
Deployment	Managed AWS API via console, API calls, AWS CLI, and AWS SDKs; 12 commercial regions plus AWS GovCloud West
Pricing	AWS pricing page examples imply about $0.075 per minute, with a 60-minute monthly free tier for the first 12 months
License	Not publicly disclosed. Delivered as a proprietary managed AWS service

Not disclosed: Parameters · Training data · Throughput / concurrency

Known limitations

Limitation or failure mode	Why it matters	Mitigation (per source)
US-English only	Limits international or multilingual clinical use	Use separate multilingual ASR/translation stacks, or evaluate open/self-hosted alternatives for non-US-English workflows
Noise, overlap, accents, and code-switching reduce accuracy	Can materially affect real-world visit transcription quality	Use higher-quality microphones, channel-separated capture where possible, Chime SDK active-talker splitting, and human review
PHI identification is not HIPAA de-identification	Redaction workflows can fail if treated as automatic de-identification	Use PHI tagging as a first pass only; add review or dedicated de-identification controls
Speaker diarization linearizes overlap and may delay stable speaker labels in streaming	Speaker attribution can be wrong or late around interruptions	Prefer multi-channel audio when feasible; review speaker assignments in post-processing
Medical custom vocabulary cannot contain PHI/PII and large vocabularies are discouraged	Governance and vocabulary design affect accuracy and privacy	Build small, encounter-specific or specialty-specific vocabularies with strict curation
No public custom medical language model training path in reviewed docs	Lower ceiling for customer-specific language adaptation than some alternatives	Combine custom vocabulary with specialty routing, downstream correction, or consider open/self-trained models
No turnkey note generation in base product	Additional engineering needed for ambient documentation	Use HealthScribe or a Bedrock-based note layer if the requirement is note generation rather than transcript only

Undisclosed information

AWS does not publicly disclose Transcribe Medical's exact model family, medical training data sources or size, specialty-by-specialty benchmark scores, latency service objectives, or internal model version history, and it publishes no model card or WER benchmark suite for the service. The public record also does not expose a complete service-team roster beyond blog authors and research contributors.

Full technical breakdown

Overview

Amazon Transcribe Medical launched in December 2019 as a HIPAA-eligible capability of Amazon Transcribe, with real-time streaming at launch. Batch transcription was added in April 2020, custom medical vocabularies later that month, streaming support for additional specialties in November 2020, multi-channel support in December 2020, and automatic PHI identification in January 2021. As of June 2026, the publicly documented product remains a transcription-focused API service rather than a full ambient documentation agent; AWS positions AWS HealthScribe as the higher-level note-generation offering for clinical documentation workflows.

The service is an AWS API for US English medical speech transcription. AWS documents support for real-time streaming and batch transcription, two main audio modes (DICTATION and CONVERSATION), primary care plus multiple specialty-care domains, timestamps, confidence scores, alternative transcriptions, speaker diarization, channel identification, medical custom vocabularies, and PHI tagging. AWS positions it for clinical documentation, pharmacovigilance call review, telehealth subtitling, and healthcare contact-center scenarios.

AWS describes the service as "deep learning" and "state-of-the-art machine learning" based, but does not publish standardized word-error-rate benchmarks, model version numbers, latency targets, or the internal architecture of the managed service.

Capabilities and features

Capability	Publicly documented status	Notes
Real-time transcription	Supported	StartMedicalStreamTranscription starts a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out.
Batch transcription	Supported	StartMedicalTranscriptionJob handles uploaded medical dictation or conversation files.
Audio types	Supported	AWS requires a Type such as DICTATION or CONVERSATION.
Language support	Limited	AWS FAQ says Transcribe Medical currently supports US English only.
Specialty support	Supported	Product page lists primary care plus cardiology, neurology, OB-GYN, pediatrics, oncology, radiology, and urology.
Medical custom vocabularies	Supported	Users can upload table-format vocabularies with IPA pronunciations and display forms.
Alternative transcriptions	Supported	Batch jobs can return 2 to 10 alternatives.
Word timestamps and confidence	Supported	Documented at launch and in API output.
Speaker diarization	Supported	AWS labels speakers and supports streaming plus batch diarization.
Channel identification	Supported	Added for both streaming and batch multi-channel audio in December 2020.
PHI identification	Supported	Added in January 2021 at no extra charge.
Private connectivity	Supported	AWS PrivateLink support for real-time streaming was announced in June 2020.
Chime SDK integration	Supported	Live transcription can be integrated via Amazon Chime SDK, including specialty and conversation type selection.
Clinical note generation	Not native to Transcribe Medical	AWS directs users needing a single note-generation API toward AWS HealthScribe.
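
As a rough illustration of how several rows in the table combine in one batch call, the sketch below assembles keyword arguments for StartMedicalTranscriptionJob. The helper name build_medical_job_request, the bucket names, and the job name are hypothetical; the parameter keys follow the public StartMedicalTranscriptionJob API reference and should be checked against it before use. The sketch only builds a dictionary and makes no AWS call.

```python
from typing import Any, Dict, Optional

# The two audio types named in the capabilities table.
VALID_TYPES = {"DICTATION", "CONVERSATION"}


def build_medical_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    audio_type: str = "CONVERSATION",
    specialty: str = "PRIMARYCARE",
    identify_phi: bool = False,
    max_alternatives: Optional[int] = None,
    vocabulary_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble kwargs for a batch medical transcription job (hypothetical helper)."""
    if audio_type not in VALID_TYPES:
        raise ValueError(f"audio_type must be one of {sorted(VALID_TYPES)}")

    settings: Dict[str, Any] = {}
    if max_alternatives is not None:
        # The table above documents 2 to 10 alternatives for batch jobs.
        if not 2 <= max_alternatives <= 10:
            raise ValueError("max_alternatives must be between 2 and 10")
        settings["ShowAlternatives"] = True
        settings["MaxAlternatives"] = max_alternatives
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name

    request: Dict[str, Any] = {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # the only documented language
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": specialty,
        "Type": audio_type,
    }
    if settings:
        request["Settings"] = settings
    if identify_phi:
        # PHI tagging is a first pass only, not HIPAA de-identification.
        request["ContentIdentificationType"] = "PHI"
    return request
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("transcribe").start_medical_transcription_job(**request)`. Keeping request construction separate from the network call makes the parameter choices easy to unit test.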

Customization depth

Transcribe Medical supports medical custom vocabularies, but in the public sources reviewed AWS does not document a customer-trainable custom medical language model analogous to standard Amazon Transcribe custom language models. AWS's custom language model FAQ is framed around standard Transcribe rather than Transcribe Medical, while the medical documentation emphasizes vocabularies.
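
Because vocabularies are the main documented customization lever, a small sketch of generating a table-format vocabulary file may help. The column headers and their order, the hyphen convention for multi-word phrases, and the 200-entry cap are assumptions for illustration and should be verified against the medical custom vocabularies documentation; VocabEntry and render_vocabulary_table are hypothetical names. The code cannot detect PHI or PII, so curation remains a manual governance step.

```python
from typing import Iterable, NamedTuple


class VocabEntry(NamedTuple):
    phrase: str           # term as spoken, e.g. "atrial fibrillation"
    ipa: str = ""         # optional IPA pronunciation
    display_as: str = ""  # optional display form, e.g. "AFib"


# Assumed header row for the tab-separated table format.
COLUMNS = ("Phrase", "SoundsLike", "IPA", "DisplayAs")


def render_vocabulary_table(entries: Iterable[VocabEntry], max_entries: int = 200) -> str:
    """Render entries as a tab-separated vocabulary table (illustrative sketch)."""
    rows = ["\t".join(COLUMNS)]
    for count, entry in enumerate(entries, start=1):
        if count > max_entries:
            # Arbitrary cap reflecting the guidance to keep vocabularies small.
            raise ValueError(f"vocabulary exceeds {max_entries} entries")
        # Assumption: multi-word phrases are joined with hyphens, not spaces.
        phrase = "-".join(entry.phrase.split())
        if not phrase:
            raise ValueError("phrase must not be empty")
        # SoundsLike is left blank in this sketch.
        rows.append("\t".join((phrase, "", entry.ipa, entry.display_as)))
    return "\n".join(rows) + "\n"
```

The rendered text would be uploaded to S3 and registered with the CreateMedicalVocabulary operation, for example `create_medical_vocabulary(VocabularyName=..., LanguageCode="en-US", VocabularyFileUri=...)` in boto3.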

Architecture: documented facts and adjacent research

AWS confirms that Transcribe Medical is a deep-learning-based ASR service with automatic punctuation and capitalization, specialty-aware transcription, dictation and conversational modes, custom medical vocabularies, speaker and channel logic, and optional PHI identification. It documents a stateless service posture and an API-first delivery model. It does not publish the acoustic model family, decoder design, training corpus size, language-model design, or a release-by-release model identifier history.

The source notes that Amazon Science publications show AWS speech teams working on related methods, and that AWS does not state that any one paper maps one-to-one onto the production Transcribe Medical stack.

Technology area	What AWS documents for Transcribe Medical	What AWS public research suggests	Assessment (per source)
Core ASR model	"Deep learning" / "state-of-the-art machine learning" medical ASR.	Amazon speech teams publish on CTC, neural transducers, and context-aware transformer transducers for production ASR.	High confidence that the service uses modern end-to-end ASR; low confidence on the exact architecture because AWS does not disclose it.
Language modeling and rare-term handling	Supports specialty selection and medical custom vocabularies.	AWS papers describe contextual biasing, semantic/acoustic biasing, and knowledge-graph support for out-of-vocabulary entities, including medical terminology.	Strong evidence that rare-term biasing is a major design theme; exact LM design for Transcribe Medical is not public.
Domain adaptation	API requires Specialty and Type; AWS expanded specialty coverage over time.	AWS has published post-training domain adaptation methods using synthetic acoustic catalogs and KNN fusion.	Strong evidence of domain-conditioned decoding/modeling, though whether this appears as separate specialty models or lighter adaptation is undisclosed.
Noise and acoustic robustness	AWS FAQ says Transcribe is designed for variation in volume, pitch, and speaking rate, but noise, overlap, accents, and code-switching can degrade output.	No medical-specific public paper clearly documents the production front-end denoising stack.	Public documentation is enough to know limits, not enough to reverse-engineer the front end.
Punctuation and casing	Automatic punctuation and capitalization are part of the launch and product positioning.	AWS medical ASR paper uses BERT/BioBERT/RoBERTa for punctuation and truecasing, with domain adaptation and augmentation.	Very likely that punctuation/truecasing is a distinct downstream stage or integrated module.
Speaker diarization	Documented for streaming and batch; output includes speaker labels; overlapping speech is linearized by start time.	AWS research focuses on reducing speaker errors with audio-grounded lexical correction.	Public docs describe the interface; research suggests active work on improving turn-attribution around overlaps.
Privacy-preserving learning	AWS says medical customer content is not used to improve AWS AI technologies.	AWS also publishes privacy-preserving continual-learning work using ephemeral, weakly supervised data in production ASR.	Suggests AWS has internal methods for model refresh under privacy constraints, but not necessarily on medical customer data.

Language support

Transcribe Medical is documented only for US English (en-US) medical transcription. AWS's public product page lists transcription support for primary care and specialty areas including cardiology, neurology, obstetrics-gynecology, pediatrics, oncology, radiology, and urology. The documentation page for "Medical specialties and terms" describes PRIMARYCARE as covering family medicine, internal medicine, OB-GYN, and pediatrics.

Performance and benchmarks

Vendor-reported: AWS does not publish standardized word-error-rate benchmarks, a public model card for Transcribe Medical, a specialty-by-specialty scorecard, or latency service objectives.

Third-party evaluation: a 2024 JAMIA Open study reported that AWS Medical outperformed AWS General on medical proper nouns, while also finding disparities in performance across speech from Black and White patients and persistent difficulty with spontaneous conversational phenomena. A 2023 digital-scribe comparison observed that word-diarization error differed little across speakers in most models, but Amazon Medical Conversation ASR showed a larger clinician-side gap in that study's setup.

The source states that a quantitative accuracy-versus-latency chart across AWS, Google, and Nuance would be misleading because the vendors do not publish directly comparable medical-ASR benchmark suites with normalized latency methodology. It provides the following capability and delivery comparison, which it describes as an analytical inference from public delivery models and documented feature depth, not a vendor-provided benchmark.

Competitor	Delivery model	Medical specialization	Customization	Public pricing signal	Comparative read versus AWS (per source)
Amazon Transcribe Medical	Managed AWS API	Yes, medical-specific transcription	Medical custom vocabularies; specialty and type selection	AWS worked examples imply about $0.075/min with a 60-minute monthly free tier for first 12 months.	Strong developer fit, wide AWS integration, limited public transparency, transcription-first rather than workflow-first
Google Cloud Speech-to-Text medical models	Managed cloud API	Yes, separate medical dictation and medical conversation models	Alternate transcriptions, timestamps, confidence; conversation diarization; dictation spoken punctuation/formatting/headings	$0.078/min after first 60 free minutes per month.	Very similar API-layer competitor; slightly higher public list price; strong documentation for dictation formatting behaviors
Dragon Medical One	Clinician-facing documentation software	Yes, purpose-built clinical documentation product	Extensive end-user vocabulary, commands, templates, workflow features	Public price not clearly exposed in the reviewed official pages; licensing/sales-led procurement	Stronger ready-made clinical workflow and EHR ergonomics; weaker as a simple developer API building block
Azure Speech plus Microsoft healthcare stack	General cloud speech platform plus Nuance products	Public docs position healthcare as a use case, but Microsoft's healthcare-specific speech story is mostly Dragon/Dragon Copilot	Custom speech and general speech platform tooling	Official page clearly exposes free tier structure and per-second billing, but exact paid rates were not recoverable from the static pricing HTML reviewed.	If you want Microsoft-native general speech plus customization, Azure fits; if you want healthcare-specialized voice, Microsoft steers customers to Dragon
Open-source Whisper	Self-hosted model/software	No, general-purpose	Full deployment control, but no managed medical workflow	Infra cost only	Excellent flexibility and broad robustness, but customer owns validation, security, compliance, and medical adaptation
Open-source Parakeet	Self-hosted/open-source model	No dedicated medical specialization in the reviewed source	Full deployment control; punctuation and timestamps	Infra cost only	Attractive for performance and openness, but requires significant speech MLOps
Open MedASR	Open medical model	Yes, medical dictation/transcription	Fine-tunable health-domain model	Infra cost only	Most directly analogous open alternative for medical dictation, but still not a managed HIPAA-ready service by itself

Latency and throughput

AWS does not publish latency targets or a latency SLO for Transcribe Medical. Real-time streaming operates through StartMedicalStreamTranscription, a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out. The August 2021 Amazon Chime SDK live transcription integration is documented as a lower-latency meeting use case. Throughput and concurrency figures are not publicly disclosed.
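
With no published latency figures, the audio chunk size sent over the stream is one of the few client-side choices that affects perceived delay. The sketch below splits raw PCM audio into fixed-duration chunks. The function name chunk_pcm and the defaults (16 kHz, 16-bit mono, 100 ms) are illustrative assumptions, not AWS recommendations drawn from the sources above, and the event-stream framing that the streaming API requires is out of scope here.

```python
from typing import Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate_hz: int = 16000,
    chunk_ms: int = 100,
    bytes_per_sample: int = 2,
    channels: int = 1,
) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM audio (hypothetical helper)."""
    frame_bytes = bytes_per_sample * channels
    # Whole frames per chunk, so a chunk never splits a sample.
    chunk_bytes = (sample_rate_hz * chunk_ms // 1000) * frame_bytes
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than the rest.
        yield audio[offset:offset + chunk_bytes]
```

Smaller chunks reduce the buffering delay before audio leaves the client but increase per-message overhead, so the value is best tuned by measuring end-to-end delay in the target environment.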

Deployment and integrations

The service is available through AWS console workflows, API calls, AWS CLI, and AWS SDKs. Public API references and FAQs show the medical APIs alongside the broader Transcribe service family, with Boto3 examples for custom vocabulary creation and REST-style operation references for jobs and streams.

AWS's endpoint documentation lists Transcribe Medical endpoints in 12 commercial regions plus AWS GovCloud West: US East (N. Virginia and Ohio), US West (N. California and Oregon), Canada (Central), Europe (Ireland, London, and Frankfurt), and Asia Pacific (Seoul, Singapore, Sydney, and Tokyo).
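
A deployment that pins data to particular regions could guard its configuration with a simple allowlist. The mapping below pairs the regions named above with their standard AWS region codes; the helper name require_supported_region is hypothetical, and the list reflects this profile rather than a live query of the endpoint documentation, so it should be refreshed from the AWS General Reference.

```python
# Region codes for the regions listed in this profile.
TRANSCRIBE_MEDICAL_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-1": "US West (N. California)",
    "us-west-2": "US West (Oregon)",
    "ca-central-1": "Canada (Central)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
    "eu-central-1": "Europe (Frankfurt)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "us-gov-west-1": "AWS GovCloud (US-West)",
}


def require_supported_region(region: str) -> str:
    """Return the region code if it is on the allowlist, else raise."""
    if region not in TRANSCRIBE_MEDICAL_REGIONS:
        raise ValueError(f"{region} is not in the documented Transcribe Medical region list")
    return region
```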

Documented integration patterns include Amazon Comprehend Medical, Twilio Media Streams, Veritas telehealth review workflows, and Amazon Chime SDK, plus downstream AWS services such as HealthLake, S3, Athena, and Bedrock in customer-built pipelines.

Security and compliance

Transcribe Medical is described by AWS as HIPAA-eligible, available under AWS's Business Associate Addendum, and subject to the AWS shared responsibility model. AWS states that BAA customers must encrypt PHI at rest and in transit, and that customers remain responsible for correct service configuration and lawful use.

The medical FAQ is stricter than the general Transcribe FAQ. The general Transcribe FAQ says content may be stored and used to provide, maintain, improve, and develop Amazon Transcribe and related AI technologies unless customers opt out. AWS says Amazon Transcribe Medical does not use content processed by the service for any purpose other than to provide and maintain the service, and does not use that content to improve Amazon Transcribe Medical or other Amazon AI technologies. The product page describes the service as stateless: it stores neither inbound audio nor output text, and leaves storage choices to the customer.

Consideration	AWS public position	Practical implication (per source)
HIPAA eligibility	Yes.	Useful for PHI workflows, but only with a BAA and compliant architecture around the service.
BAA and encryption duties	AWS says BAA customers must encrypt PHI at rest and in transit.	Security controls remain partly customer-owned.
Data retention stance	Product page says stateless; FAQ says medical content is not used to improve AWS AI.	Stronger privacy posture than standard Transcribe, at least in public documentation.
PHI identification	Available at no additional charge.	Helps redaction workflows, but is not a substitute for full de-identification review.
PHI de-identification	AWS explicitly warns PHI identification may not accurately identify PHI in all circumstances and does not satisfy HIPAA de-identification requirements.	Human review or separate de-identification controls are still required.
Custom vocabulary content	AWS says do not include PII or PHI in medical custom vocabularies.	Customers need governance for vocabulary curation.
Private networking	PrivateLink for real-time streaming is available.	Reduces exposure to the public internet and fits stricter network topologies.
Region choice	Multiple commercial regions plus GovCloud West are documented.	Supports residency and procurement choices, but end-to-end residency depends on all connected services.

AWS documentation states that Transcribe Medical is not a substitute for professional medical advice, diagnosis, or treatment, and that users should apply confidence thresholds and human review where accuracy needs are high.
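
One way to apply a confidence threshold is to collect low-confidence words from the transcript output and route them to a human reviewer. The sketch below assumes the output JSON exposes `results.items`, where each item has a `type` and an `alternatives` list carrying `content` and a `confidence` value that may be a string; that shape, the function name low_confidence_words, and the 0.90 threshold are assumptions to check against real job output.

```python
from typing import Any, Dict, List


def low_confidence_words(
    transcript_json: Dict[str, Any], threshold: float = 0.90
) -> List[Dict[str, Any]]:
    """Return words whose top-alternative confidence falls below the threshold."""
    flagged: List[Dict[str, Any]] = []
    for item in transcript_json.get("results", {}).get("items", []):
        # Skip punctuation items, which carry no meaningful confidence.
        if item.get("type") != "pronunciation":
            continue
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]
        confidence = float(best.get("confidence", 0.0))
        if confidence < threshold:
            flagged.append(
                {
                    "word": best.get("content", ""),
                    "confidence": confidence,
                    "start_time": item.get("start_time"),
                }
            )
    return flagged
```

The threshold is a policy decision rather than a property of the service: a stricter value sends more words to review, which may be appropriate for medication names and dosages.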

Pricing

AWS's static pricing page examples imply a medical transcription rate of about $0.075 per minute, with a 60-minute monthly free tier for the first 12 months. For comparison, the source reports Google's official medical Speech-to-Text pricing at $0.078 per minute after its own first 60 free minutes each month. PHI identification is available at no additional charge.
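
The arithmetic behind these figures can be sketched as a small estimator. The rate and free-tier defaults come from the numbers quoted above; the function name estimate_monthly_cost is hypothetical, and the sketch ignores billing granularity, minimum charges, and any volume tiers, none of which are covered by the sources summarized here.

```python
def estimate_monthly_cost(
    minutes: float,
    rate_per_minute: float = 0.075,
    free_minutes: float = 60.0,
) -> float:
    """Estimate a monthly transcription bill in US dollars (rough sketch)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Only usage beyond the free allowance is billable.
    billable_minutes = max(0.0, minutes - free_minutes)
    return round(billable_minutes * rate_per_minute, 2)
```

For 1,000 minutes in a month inside the free-tier window, the billable portion is 940 minutes, or about $70.50 at the implied AWS rate; passing `rate_per_minute=0.078` gives about $73.32 for the Google figure quoted above. Setting `free_minutes=0` approximates AWS usage after the first 12 months.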

Development and ownership

Transcribe Medical is developed by Amazon Web Services. The source describes it as sitting at the intersection of productized AWS AI services (AWS AI / AWS Machine Learning) and the broader Amazon Science speech-research program. AWS has not published a full engineering roster for the service; the following people and organizations are tied to it in the public record.

Publicly identified person or org	Role in the public record	Relevance
Vasi Philomin	GM for Machine Learning and AI at AWS; launch blog author	Public launch sponsor/executive owner across AWS language services in 2019
Paul Zhao	Product Manager at AWS Machine Learning managing Amazon Transcribe	Direct product-facing owner named in Transcribe Medical blog materials
Katrin Kirchhoff	Senior Manager and Principal Scientist at AWS AI in 2020; later described as Director of Speech Processing for AWS; affiliated with AWS AI Labs in research literature	Key public research leader for AWS speech technologies relevant to Transcribe
Scott Seyfarth	Data Scientist at AWS AI working on improving Amazon Transcribe and Transcribe Medical	Directly tied to service improvement in public author bios
Ruoyu Huang	Software Development Engineer at Amazon Transcribe	Publicly named engineering contributor on Transcribe Medical customization work
AWS AI / AWS Machine Learning / Amazon Science speech teams	Product and research organizations behind AWS language and speech services	The most visible institutions behind the service
Cerner, Amgen, SoundLines/HealthChannels	Early public customers or quoted adopters	Evidence of early industry uptake in EHR, pharmacovigilance, and care-team workflows

Adjacent research and patents

The source states these papers and patents should be treated as adjacent technical evidence, not as official reverse-engineering of the production Transcribe Medical service.

Type	Source	Short summary
Paper	Robust prediction of punctuation and truecasing for medical ASR	AWS medical-ASR paper using pretrained masked language models and medical-domain adaptation for punctuation/truecasing; especially relevant to dictation usability
Paper	Listen, Know and Spell	Shows AWS AI interest in knowledge-graph infusion for OOV named entities in domains such as medical ASR
Blog plus paper pointer	Teaching speech recognizers new words without retraining	Explains contextual adapters and decoder biasing for difficult named entities; cites strong gains on medical terminology
Paper	Domain adaptation with external off-policy acoustic catalogs	Describes scalable post-training ASR adaptation using synthetic acoustic catalogs and KNN fusion; relevant to rare-domain adaptation
Paper	ILASR	Privacy-preserving incremental-learning framework for production ASR, relevant to how AWS could update speech models without relying on sensitive customer data
Paper	AG-LSEC	Improves speaker diarization by grounding lexical speaker correction in acoustics; relevant to medical conversation turn attribution
Paper	Context-aware Transformer transducer	Strong evidence that Amazon speech teams use advanced transducer architectures for rare-word/context-sensitive ASR
Patent	Contextual biasing for speech recognition	Amazon patent family on bias encoders and bias attention for rare/contextual phrases; highly relevant to specialized terminology support
Patent	Infusing knowledge graphs into automatic speech recognition	Patent on injecting domain knowledge such as medications, diseases, and drugs into ASR
Patent	Using recurrent neural network for partitioning of audio and speaker diarization	Amazon patent-family evidence around diarization plus ASR concurrency and segmentation

Release history

AWS's public history for Transcribe Medical is feature-oriented rather than version-oriented. Customers can reconstruct major milestones from launch posts, docs, and "What's New" announcements, but AWS does not publicly expose a numbered model lineage, model cards for Transcribe Medical itself, or a release log with benchmark deltas. The timeline below is compiled from AWS launch posts and official "What's New" announcements.

Date	Milestone
December 2019	Launch as a HIPAA-eligible capability of Amazon Transcribe with real-time streaming, word timestamps, confidence scores, and punctuation/capitalization
April 2020	Batch transcription of medical audio files added
April 2020	Custom medical vocabularies added, with IPA pronunciation support, display forms, and batch plus streaming support
June 2020	AWS PrivateLink support for real-time streaming announced
November 2020	Streaming transcription support for cardiology, oncology, neurology, radiology, and urology specialties
December 2020	Multi-channel support for both streaming and batch transcription
January 2021	Automatic protected health information (PHI) identification added at no extra charge
August 2021	Amazon Chime SDK live transcription integration, including specialty and conversation type selection

Adoption evidence at and after launch: Cerner said it was developing a digital voice scribe on top of Transcribe Medical; Amgen cited use in pharmacovigilance call review; SoundLines/HealthChannels described using the API in care-team and analytics workflows. Healthcare Dive wrote that the 2019 launch bolstered Amazon's voice-to-text ambitions and highlighted its more specialized medical vocabulary focus.

Sources

Source	What it adds
AWS announces Amazon Transcribe Medical	Official launch record: Dec. 2019 release date, HIPAA eligibility, real-time streaming, word timestamps, confidence scores, punctuation/capitalization, Comprehend Medical linkage
Introducing medical speech-to-text with Amazon Transcribe Medical	Launch rationale, workflow framing, customer quotes from Cerner, Amgen, and SoundLines/HealthChannels, plus Vasi Philomin role
Amazon Transcribe Medical now supports batch transcription	Confirms Apr. 2020 batch release and early batch capabilities including speaker/channel separation context
Amazon Transcribe Medical now supports custom vocabulary	Confirms Apr. 2020 vocabulary release, IPA pronunciation support, display forms, and batch plus streaming support
Announcing AWS PrivateLink support	Security/networking milestone for private access to streaming API
Streaming transcription support for new specialties	Public milestone for cardiology, oncology, neurology, radiology, and urology specialist support
Multi-channel support for streaming and batch	Confirms channel identification milestone for telehealth and pharmacovigilance scenarios
Automatic PHI identification	Adds PHI tagging and explicitly frames redaction workflows
Amazon Chime SDK live transcription support	Shows AWS ecosystem integration and lower-latency meeting use case
Amazon Transcribe Medical product page	Best current high-level feature and positioning summary, including today's specialty list and HealthScribe handoff

Citation list

AWS announces Amazon Transcribe Medical: https://aws.amazon.com/about-aws/whats-new/2019/12/aws-announces-amazon-transcribe-medical-medical-speech-recognition/
Amazon Transcribe Pricing: https://aws.amazon.com/transcribe/pricing/
Introducing medical speech-to-text with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/introducing-medical-speech-to-text-with-amazon-transcribe-medical/
Performing medical transcription analysis with Amazon Transcribe Medical and Amazon Comprehend Medical: https://aws.amazon.com/blogs/machine-learning/performing-medical-transcription-analysis-with-amazon-transcribe-medical-and-amazon-comprehend-medical/
Amazon Transcribe Medical developer guide: https://docs.aws.amazon.com/transcribe/latest/dg/transcribe-medical.html
Amazon Transcribe Medical product page: https://aws.amazon.com/transcribe/medical/
StartMedicalStreamTranscription API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html
StartMedicalTranscriptionJob API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html
How Amazon Transcribe Medical works: https://docs.aws.amazon.com/transcribe/latest/dg/how-it-works-med.html
Amazon Transcribe FAQs: https://aws.amazon.com/transcribe/faqs/
Amazon Transcribe Medical now supports custom vocabulary: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-custom-vocabulary/
Alternative medical transcriptions: https://docs.aws.amazon.com/transcribe/latest/dg/alternative-med-transcriptions.html
Conversation diarization (medical): https://docs.aws.amazon.com/transcribe/latest/dg/conversation-diarization-med.html
Multi-channel streaming and batch support: https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-transcribe-medical-now-supports-both-streaming-and-batch-transcription-of-multi-channel-audio/
Automatic PHI identification: https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-transcribe-medical-now-provides-automatic-protected-health-information-phi-identification/
AWS PrivateLink support for real-time streaming: https://aws.amazon.com/about-aws/whats-new/2020/06/announcing-aws-privatelink-support-for-amazon-transcribe-medical-real-time-streaming/
Amazon Chime SDK live transcription: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-chime-sdk-amazon-transcribe-amazon-transcribe-medical/
Amazon Transcribe API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/Welcome.html
Amazon Transcribe endpoints and quotas, AWS General Reference: https://docs.aws.amazon.com/general/latest/gr/transcribe.html
Teaching speech recognizers new words without retraining: https://www.amazon.science/blog/teaching-speech-recognizers-new-words-without-retraining
Medical custom vocabularies: https://docs.aws.amazon.com/transcribe/latest/dg/vocabulary-med.html
Robust acoustic and semantic contextual biasing in neural transducers for speech recognition: https://www.amazon.science/publications/robust-acoustic-and-semantic-contextual-biasing-in-neural-transducers-for-speech-recognition
Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition: https://www.amazon.science/publications/domain-adaptation-with-external-off-policy-acoustic-catalogs-for-scalable-contextual-end-to-end-automated-speech-recognition
Robust prediction of punctuation and truecasing for medical ASR: https://www.amazon.science/publications/robust-prediction-of-punctuation-and-truecasing-for-medical-asr
AG-LSEC: audio-grounded lexical speaker error correction: https://www.amazon.science/publications/ag-lsec-audio-grounded-lexical-speaker-error-correction
ILASR: privacy-preserving incremental learning for automatic speech recognition at production scale: https://www.amazon.science/publications/ilasr-privacy-preserving-incremental-learning-for-automatic-speech-recognition-at-production-scale
Enhancing speech-to-text accuracy of COVID-19-related terms with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/enhancing-speech-to-text-accuracy-of-covid-19-related-terms-with-amazon-transcribe-medical/
The range of AWS's speech research is on display at Interspeech: https://www.amazon.science/blog/the-range-of-awss-speech-research-is-on-display-at-interspeech
AWS HIPAA compliance: https://aws.amazon.com/compliance/hipaa-compliance/
Healthcare Dive coverage: https://www.healthcaredive.com/news/amazons-new-medical-transcription-service-bolsters-voice-to-text-bid/568245/
Google Cloud Speech-to-Text medical models: https://docs.cloud.google.com/speech-to-text/docs/v1/medical-models
Google Cloud Speech-to-Text pricing: https://cloud.google.com/speech-to-text/pricing
Dragon Medical One: https://www.microsoft.com/en-us/health-solutions/clinical-workflow/dragon-medical-one
Azure Speech to text: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
OpenAI Whisper: https://openai.com/index/whisper/
NVIDIA NeMo Parakeet ASR models: https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/
Google MedASR: https://developers.google.com/health-ai-developer-foundations/medasr
JAMIA Open study: https://academic.oup.com/jamiaopen/article/7/4/ooae130/7920671
Batch transcription announcement: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-batch-transcription-of-medical-audio-files/
Streaming specialty support announcement: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-transcribe-medical-streaming-transcription-support-medical-specialties/
Listen, Know and Spell: knowledge-infused subword modeling for improving ASR performance of OOV named entities: https://assets.amazon.science/0c/47/311aae264493b8beefd696f7a295/listen-know-and-spell-knowledge-infused-subword-modeling-for-improving-asr-performance-of-oov-named-entities.pdf
Context-aware Transformer transducer for speech recognition: https://www.amazon.science/publications/context-aware-transformer-transducer-for-speech-recognition
Contextual biasing for speech recognition (patent): https://patents.google.com/patent/WO2020226789A1/en
Infusing knowledge graphs into automatic speech recognition (patent): https://patents.google.com/patent/US12400659B1/en
Using recurrent neural network for partitioning of audio and speaker diarization (patent): https://patents.google.com/patent/US10902843B2/en