Amazon Transcribe Medical: what AWS actually ships, and what it won't tell you

Amazon Transcribe Medical is AWS's managed medical speech recognition service for turning clinician dictation and clinician-patient conversations into text. It launched in December 2019 as a HIPAA-eligible capability of Amazon Transcribe, with real-time streaming on day one. Batch transcription arrived in April 2020, custom medical vocabularies later that month, specialty expansion in late 2020, multi-channel support in December 2020, and automatic PHI identification in January 2021. As of June 2026, the publicly documented product is still a transcription-focused API rather than a full ambient documentation agent, and AWS increasingly points customers toward AWS HealthScribe as the higher-level note-generation successor for clinical documentation workflows.

That framing matters because it sets expectations correctly. This is a building block, not a scribe. Its real strengths are AWS-native integration, predictable API-driven deployment, streaming and batch modes, speaker and channel features, medical vocabulary support, HIPAA eligibility, and a public list price slightly below that of its closest API competitor. The worked examples on AWS's static pricing page imply a medical transcription rate of about $0.075 per minute, with a 60-minute monthly free tier for the first 12 months. Google's official medical Speech-to-Text pricing is $0.078 per minute after its own first 60 free minutes each month. Nuance Dragon Medical One is a different animal entirely, a workflow product rather than a metered cloud API, and Microsoft publicly emphasizes Dragon and Dragon Copilot for healthcare more than a separate Azure medical ASR API.
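The list-price gap is easier to judge as arithmetic. The following is a minimal sketch, assuming the quoted per-minute rates apply flat (no volume tiers) and that free minutes are deducted before billing. The function name and the 10,000-minute monthly volume are illustrative, not taken from either vendor's pricing page.

```python
def monthly_cost(minutes: float, rate_per_min: float, free_minutes: float = 60.0) -> float:
    """Estimated monthly charge in USD after deducting free minutes."""
    billable = max(0.0, minutes - free_minutes)
    return round(billable * rate_per_min, 2)


AWS_MEDICAL_RATE = 0.075     # USD/min, implied by AWS's worked pricing examples
GOOGLE_MEDICAL_RATE = 0.078  # USD/min, Google medical Speech-to-Text list price

volume = 10_000  # hypothetical minutes of audio per month

# Note: the AWS free tier is documented only for the first 12 months,
# so after that window free_minutes should be 0 for the AWS estimate.
aws = monthly_cost(volume, AWS_MEDICAL_RATE)
google = monthly_cost(volume, GOOGLE_MEDICAL_RATE)

print(f"AWS:    ${aws:,.2f}")
print(f"Google: ${google:,.2f}")
print(f"Gap:    ${google - aws:,.2f}")
```

At this scale the difference is a few percent of the bill, so integration fit and validation cost are likely to matter more than the list rate.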

The biggest caveat, and the theme that keeps recurring throughout this piece, is transparency. AWS publicly describes Transcribe Medical as "deep learning" and "state-of-the-art machine learning," but it does not publish standardized word-error-rate benchmarks, model version numbers, latency targets, or the internal architecture behind the managed service. The best technical view comes from adjacent Amazon Science papers, which show AWS speech teams working on end-to-end ASR with CTC, neural transducers, context-aware transformer transducers, contextual biasing, knowledge-graph support for rare entities, medical punctuation and truecasing, privacy-preserving continual learning, and speaker-error correction. Those papers are relevant, but AWS never states that any one of them maps one-to-one onto the production Transcribe Medical stack.

What the service is and what it does

Amazon Transcribe Medical is an AWS API service for US-English medical speech transcription. AWS documents real-time streaming and batch transcription, two main audio modes (DICTATION and CONVERSATION), primary care plus multiple specialty-care domains, timestamps, confidence scores, alternative transcriptions, speaker diarization, channel identification, medical custom vocabularies, and PHI tagging. AWS positions it for clinical documentation, pharmacovigilance call review, telehealth subtitling, and healthcare contact-center scenarios.

AWS's product page now explicitly says the service provides transcription expertise for primary care and specialty areas including cardiology, neurology, obstetrics-gynecology, pediatrics, oncology, radiology, and urology. The documentation page for "Medical specialties and terms" still describes PRIMARYCARE as covering family medicine, internal medicine, OB-GYN, and pediatrics. So AWS's public materials agree on the breadth of specialty coverage but are not fully synchronized on the API-side taxonomy: OB-GYN and pediatrics appear as distinct specialty areas on the product page yet sit under PRIMARYCARE in the documentation.

On deployment, the service is available through AWS console workflows, API calls, the AWS CLI, and AWS SDKs. Public API references and FAQs show the medical APIs alongside the broader Transcribe service family, with Boto3 examples for custom vocabulary creation and REST-style operation references for jobs and streams.

AWS's current endpoint documentation lists Transcribe Medical endpoints in 12 commercial regions plus AWS GovCloud West: US East North Virginia and Ohio, US West Northern California and Oregon, Canada Central, Europe Ireland, London, and Frankfurt, and Asia Pacific Seoul, Singapore, Sydney, and Tokyo. Regional support matters for residency, latency, and procurement, but you still need to validate the compliance scope of the specific region and the adjacent services in your workflow.

Here is the full documented capability picture.

Capability	Publicly documented status	Notes
Real-time transcription	Supported	StartMedicalStreamTranscription starts a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out.
Batch transcription	Supported	StartMedicalTranscriptionJob handles uploaded medical dictation or conversation files.
Audio types	Supported	AWS requires a Type such as DICTATION or CONVERSATION.
Language support	Limited	AWS FAQ says Transcribe Medical currently supports US English only.
Specialty support	Supported	Product page lists primary care plus cardiology, neurology, OB-GYN, pediatrics, oncology, radiology, and urology.
Medical custom vocabularies	Supported	Users can upload table-format vocabularies with IPA pronunciations and display forms.
Alternative transcriptions	Supported	Batch jobs can return 2 to 10 alternatives.
Word timestamps and confidence	Supported	Documented at launch and in API output.
Speaker diarization	Supported	AWS labels speakers and supports streaming plus batch diarization.
Channel identification	Supported	Added for both streaming and batch multi-channel audio in December 2020.
PHI identification	Supported	Added in January 2021 at no extra charge.
Private connectivity	Supported	AWS PrivateLink support for real-time streaming was announced in June 2020.
Chime SDK integration	Supported	Live transcription can be integrated via Amazon Chime SDK, including specialty and conversation type selection.
Clinical note generation	Not native to Transcribe Medical	AWS now directs users needing a single note-generation API toward AWS HealthScribe.
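
Several rows of the table map directly onto parameters of a batch job. The sketch below builds a request for Boto3's `start_medical_transcription_job` and validates the documented 2-to-10 range for alternatives. The helper names `build_medical_job_request` and `submit`, the defaults, and the bucket and URI values are assumptions for illustration. Check the API reference for which `Specialty` values batch jobs accept before relying on anything other than `PRIMARYCARE`.

```python
from typing import Any, Dict, Optional


def build_medical_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    audio_type: str = "CONVERSATION",
    specialty: str = "PRIMARYCARE",
    max_alternatives: Optional[int] = None,
    max_speakers: Optional[int] = None,
    identify_phi: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for start_medical_transcription_job."""
    if audio_type not in ("CONVERSATION", "DICTATION"):
        raise ValueError("audio_type must be CONVERSATION or DICTATION")

    settings: Dict[str, Any] = {}
    if max_alternatives is not None:
        # The docs describe 2 to 10 alternative transcriptions for batch jobs.
        if not 2 <= max_alternatives <= 10:
            raise ValueError("max_alternatives must be between 2 and 10")
        settings["ShowAlternatives"] = True
        settings["MaxAlternatives"] = max_alternatives
    if max_speakers is not None:
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers

    request: Dict[str, Any] = {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # the only documented language
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": specialty,
        "Type": audio_type,
    }
    if settings:
        request["Settings"] = settings
    if identify_phi:
        request["ContentIdentificationType"] = "PHI"
    return request


def submit(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so building a request needs no SDK

    client = boto3.client("transcribe")
    return client.start_medical_transcription_job(**request)


# Hypothetical job name, S3 URI, and bucket.
request = build_medical_job_request(
    "visit-0001",
    "s3://example-input-bucket/visit-0001.wav",
    "example-output-bucket",
    max_alternatives=3,
    identify_phi=True,
)
```

Keeping request construction separate from submission makes the parameters easy to unit-test and review without touching AWS.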

What's under the hood, as far as anyone can tell

AWS publicly confirms that Transcribe Medical is a deep-learning-based ASR service optimized for medical speech, with automatic punctuation and capitalization, specialty-aware transcription, dictation versus conversational modes, custom medical vocabularies, speaker and channel logic, and optional PHI identification. It also documents a stateless service posture and an API-first delivery model. It does not publish the acoustic model family, decoder design, training corpus size, language-model design, or any release-by-release model identifier history.

The strongest public reading is that Transcribe Medical probably sits on the same broad AWS speech-research foundation used across Amazon speech products, in a medicalized and production-hardened form. Amazon Science publications on rare medical terms, domain adaptation, punctuation, personalization, and diarization show AWS researchers actively working on CTC-based architectures, neural transducers, context-aware transformer transducers, contextual adapters, knowledge-graph infusion, privacy-preserving continual learning, and post-ASR speaker-error correction. That does not prove those exact papers are the production implementation. It does show the technical repertoire available inside AWS's speech organization.

The table below lays out what AWS documents against what its research record suggests, area by area.

Technology area	What AWS documents for Transcribe Medical	What AWS public research suggests	Assessment
Core ASR model	"Deep learning" / "state-of-the-art machine learning" medical ASR.	Amazon speech teams publish on CTC, neural transducers, and context-aware transformer transducers for production ASR.	High confidence that the service uses modern end-to-end ASR; low confidence on the exact architecture because AWS does not disclose it.
Language modeling and rare-term handling	Supports specialty selection and medical custom vocabularies.	AWS papers describe contextual biasing, semantic/acoustic biasing, and knowledge-graph support for out-of-vocabulary entities, including medical terminology.	Strong evidence that rare-term biasing is a major design theme; exact LM design for Transcribe Medical is not public.
Domain adaptation	API requires Specialty and Type; AWS expanded specialty coverage over time.	AWS has published post-training domain adaptation methods using synthetic acoustic catalogs and KNN fusion.	Strong evidence of domain-conditioned decoding/modeling, though whether this appears as separate specialty models or lighter adaptation is undisclosed.
Noise and acoustic robustness	AWS FAQ says Transcribe is designed for variation in volume, pitch, and speaking rate, but noise, overlap, accents, and code-switching can degrade output.	No medical-specific public paper clearly documents the production front-end denoising stack.	Public documentation is enough to know limits, not enough to reverse-engineer the front end.
Punctuation and casing	Automatic punctuation and capitalization are part of the launch and product positioning.	AWS medical ASR paper uses BERT/BioBERT/RoBERTa for punctuation and truecasing, with domain adaptation and augmentation.	Very likely that punctuation/truecasing is a distinct downstream stage or integrated module.
Speaker diarization	Documented for streaming and batch; output includes speaker labels; overlapping speech is linearized by start time.	AWS research focuses on reducing speaker errors with audio-grounded lexical correction.	Public docs describe the interface; research suggests active work on improving turn-attribution around overlaps.
Privacy-preserving learning	AWS says medical customer content is not used to improve AWS AI technologies.	AWS also publishes privacy-preserving continual-learning work using ephemeral, weakly supervised data in production ASR.	Suggests AWS has internal methods for model refresh under privacy constraints, but not necessarily on medical customer data.

One practical distinction deserves emphasis: customization depth. Transcribe Medical supports medical custom vocabularies, but in the public sources reviewed, AWS does not document a customer-trainable custom medical language model analogous to the custom language models (CLM) available in standard Amazon Transcribe. AWS's CLM FAQ is framed around standard Transcribe, while the medical docs emphasize vocabularies instead. That makes Transcribe Medical more customizable than a fixed black box, but less customizable than platforms that let customers train full medical acoustic or language models.

The version history that isn't one

AWS's public history for Transcribe Medical is feature-oriented rather than version-oriented. Customers can reconstruct major milestones from launch posts, docs, and "What's New" announcements, but AWS does not expose a numbered model lineage, model cards for Transcribe Medical itself, or a release log with benchmark deltas. The milestone record below is compiled from AWS launch posts and official "What's New" announcements.

Source	What it adds
AWS announces Amazon Transcribe Medical	Official launch record: Dec. 2019 release date, HIPAA eligibility, real-time streaming, word timestamps, confidence scores, punctuation/capitalization, Comprehend Medical linkage
Introducing medical speech-to-text with Amazon Transcribe Medical	Launch rationale, workflow framing, customer quotes from Cerner, Amgen, and SoundLines/HealthChannels, plus Vasi Philomin role
Amazon Transcribe Medical now supports batch transcription	Confirms Apr. 2020 batch release and early batch capabilities including speaker/channel separation context
Amazon Transcribe Medical now supports custom vocabulary	Confirms Apr. 2020 vocabulary release, IPA pronunciation support, display forms, and batch plus streaming support
Announcing AWS PrivateLink support	Security/networking milestone for private access to streaming API
Streaming transcription support for new specialties	Public milestone for cardiology, oncology, neurology, radiology, and urology specialist support
Multi-channel support for streaming and batch	Confirms channel identification milestone for telehealth and pharmacovigilance scenarios
Automatic PHI identification	Adds PHI tagging and explicitly frames redaction workflows
Amazon Chime SDK live transcription support	Shows AWS ecosystem integration and lower-latency meeting use case
Amazon Transcribe Medical product page	Best current high-level feature and positioning summary, including today's specialty list and HealthScribe handoff

Who builds this thing? Publicly identifiable leadership and contributors are easier to find through launch blogs and Amazon Science than through formal product org charts. AWS has not published an engineering roster for Transcribe Medical, but the following names and organizations are directly tied to the service or to adjacent AWS speech research.

Publicly identified person or org	Role in the public record	Relevance
Vasi Philomin	GM for Machine Learning and AI at AWS; launch blog author	Public launch sponsor/executive owner across AWS language services in 2019
Paul Zhao	Product Manager at AWS Machine Learning managing Amazon Transcribe	Direct product-facing owner named in Transcribe Medical blog materials
Katrin Kirchhoff	Senior Manager and Principal Scientist at AWS AI in 2020; later described as Director of Speech Processing for AWS; affiliated with AWS AI Labs in research literature	Key public research leader for AWS speech technologies relevant to Transcribe
Scott Seyfarth	Data Scientist at AWS AI working on improving Amazon Transcribe and Transcribe Medical	Directly tied to service improvement in public author bios
Ruoyu Huang	Software Development Engineer at Amazon Transcribe	Publicly named engineering contributor on Transcribe Medical customization work
AWS AI / AWS Machine Learning / Amazon Science speech teams	Product and research organizations behind AWS language and speech services	The most visible institutions behind the service
Cerner, Amgen, SoundLines/HealthChannels	Early public customers or quoted adopters	Evidence of early industry uptake in EHR, pharmacovigilance, and care-team workflows

The organizational takeaway: Transcribe Medical appears to sit at the intersection of productized AWS AI services and a broader Amazon Science speech-research program. That is good for technical depth. It also means the service inherits the opacity of many managed AI products, where the public record exposes capabilities and some authors, not the full production design.

Security, privacy, and the regulatory fine print

AWS describes Transcribe Medical as HIPAA-eligible, available under AWS's Business Associate Addendum, and subject to the AWS shared responsibility model. AWS states that BAA customers must encrypt PHI at rest and in transit, and that customers remain responsible for correct service configuration and lawful use. Standard stuff for cloud healthcare services, but still operationally significant: compliance depends on the whole workflow, not just the ASR endpoint.

There is one privacy distinction worth knowing before procurement conversations start. The medical FAQ is stricter than the general Transcribe FAQ. The general FAQ says content may be stored and used to provide, maintain, improve, and develop Amazon Transcribe and related AI technologies unless customers opt out. AWS says Amazon Transcribe Medical, by contrast, does not use content processed by the service for any purpose other than to provide and maintain the service, and does not use that content to improve Amazon Transcribe Medical or other Amazon AI technologies. The product page also describes the service as stateless: it stores neither inbound audio nor output text, and leaves storage choices to the customer.

Consideration	AWS public position	Practical implication
HIPAA eligibility	Yes.	Useful for PHI workflows, but only with a BAA and compliant architecture around the service.
BAA and encryption duties	AWS says BAA customers must encrypt PHI at rest and in transit.	Security controls remain partly customer-owned.
Data retention stance	Product page says stateless; FAQ says medical content is not used to improve AWS AI.	Stronger privacy posture than standard Transcribe, at least in public documentation.
PHI identification	Available at no additional charge.	Helps redaction workflows, but is not a substitute for full de-identification review.
PHI de-identification	AWS explicitly warns PHI identification may not accurately identify PHI in all circumstances and does not satisfy HIPAA de-identification requirements.	Human review or separate de-identification controls are still required.
Custom vocabulary content	AWS says do not include PII or PHI in medical custom vocabularies.	Customers need governance for vocabulary curation.
Private networking	PrivateLink for real-time streaming is available.	Reduces exposure to the public internet and fits stricter network topologies.
Region choice	Multiple commercial regions plus GovCloud West are documented.	Supports residency and procurement choices, but end-to-end residency depends on all connected services.

For regulated deployments, the most defensible pattern is to treat Transcribe Medical as one compliant component in a larger controlled system: private networking where possible, carefully scoped IAM, encrypted S3 output, limited retention, PHI tagging plus secondary review, and documented human validation for any workflow that can affect care or billing. AWS's own documentation repeatedly warns that Transcribe Medical is not a substitute for professional medical advice, diagnosis, or treatment, and that users should apply confidence thresholds and human review where accuracy needs are high.
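
The confidence-threshold advice can be turned into a simple review queue. This is a minimal sketch, assuming each transcript item carries a `type`, a `start_time`, and a list of `alternatives` with string-valued `confidence` and `content` fields, as in Transcribe's batch JSON. Verify the field names against your own job output. The 0.90 threshold and the sample items are illustrative only.

```python
from typing import Any, Dict, List


def flag_low_confidence(items: List[Dict[str, Any]], threshold: float = 0.90) -> List[Dict[str, Any]]:
    """Return spoken words whose top alternative scores below the threshold."""
    flagged = []
    for item in items:
        # Punctuation items carry no timing and are not worth reviewing.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append(
                {
                    "word": best["content"],
                    "confidence": confidence,
                    "start_time": float(item["start_time"]),
                }
            )
    return flagged


# Hand-written sample in the assumed layout, not real service output.
sample_items = [
    {"type": "pronunciation", "start_time": "0.10", "end_time": "0.45",
     "alternatives": [{"confidence": "0.99", "content": "Start"}]},
    {"type": "pronunciation", "start_time": "0.50", "end_time": "1.20",
     "alternatives": [{"confidence": "0.62", "content": "metoprolol"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": "."}]},
]

for hit in flag_low_confidence(sample_items):
    print(f"review {hit['word']!r} at {hit['start_time']:.2f}s (confidence {hit['confidence']:.2f})")
```

The right threshold depends on the workflow. Medication names and dosages probably warrant a stricter cut-off than conversational filler.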

Reception, evidence, and how the competition stacks up

AWS's own adoption evidence is strongest in healthcare IT and pharmacovigilance. At launch, Cerner said it was developing a digital voice scribe on top of Transcribe Medical, Amgen cited use in pharmacovigilance call review, and SoundLines/HealthChannels described using the API in care-team and analytics workflows. AWS blogs later showed integration patterns with Amazon Comprehend Medical, Twilio Media Streams, Veritas telehealth review workflows, and Amazon Chime SDK. These examples show credible adoption as a platform component, especially for builders already inside the AWS ecosystem.

Industry coverage treated the 2019 launch as a meaningful move by AWS into healthcare voice infrastructure. Healthcare Dive wrote that the service bolstered Amazon's voice-to-text ambitions and noted its more specialized medical vocabulary focus. Since then, the market's center of gravity has shifted from plain transcription APIs toward ambient clinical documentation, which is why AWS later introduced HealthScribe and Microsoft now emphasizes Dragon Copilot.

A purely quantitative accuracy-versus-latency chart would be misleading here, because AWS, Google, and Nuance do not publish directly comparable medical-ASR benchmark suites with normalized latency methodology. The more defensible comparison is capability- and workflow-based. The table below is an analytical inference from public delivery models and documented feature depth, not a vendor-provided benchmark.

Competitor	Delivery model	Medical specialization	Customization	Public pricing signal	Comparative read versus AWS
Amazon Transcribe Medical	Managed AWS API	Yes, medical-specific transcription	Medical custom vocabularies; specialty and type selection	AWS worked examples imply about $0.075/min with a 60-minute monthly free tier for first 12 months.	Strong developer fit, wide AWS integration, limited public transparency, transcription-first rather than workflow-first
Google Cloud Speech-to-Text medical models	Managed cloud API	Yes, separate medical dictation and medical conversation models	Alternate transcriptions, timestamps, confidence; conversation diarization; dictation spoken punctuation/formatting/headings	$0.078/min after first 60 free minutes per month.	Very similar API-layer competitor; slightly higher public list price; strong documentation for dictation formatting behaviors
Dragon Medical One	Clinician-facing documentation software	Yes, purpose-built clinical documentation product	Extensive end-user vocabulary, commands, templates, workflow features	Public price not clearly exposed in the reviewed official pages; licensing/sales-led procurement	Stronger ready-made clinical workflow and EHR ergonomics; weaker as a simple developer API building block
Azure Speech plus Microsoft healthcare stack	General cloud speech platform plus Nuance products	Public docs position healthcare as a use case, but Microsoft's healthcare-specific speech story is mostly Dragon/Dragon Copilot	Custom speech and general speech platform tooling	Official page clearly exposes free tier structure and per-second billing, but exact paid rates were not recoverable from the static pricing HTML reviewed here.	If you want Microsoft-native general speech plus customization, Azure fits; if you want healthcare-specialized voice, Microsoft steers customers to Dragon
Open-source Whisper	Self-hosted model/software	No, general-purpose	Full deployment control, but no managed medical workflow	Infra cost only	Excellent flexibility and broad robustness, but customer owns validation, security, compliance, and medical adaptation
Open-source Parakeet	Self-hosted/open-source model	No dedicated medical specialization in the reviewed source	Full deployment control; punctuation and timestamps	Infra cost only	Attractive for performance and openness, but requires significant speech MLOps
Open MedASR	Open medical model	Yes, medical dictation/transcription	Fine-tunable health-domain model	Infra cost only	Most directly analogous open alternative for medical dictation, but still not a managed HIPAA-ready service by itself

On the independent-evidence side, the public literature is mixed but useful. A 2024 JAMIA Open study reported that AWS Medical outperformed AWS General on medical proper nouns, while also finding disparities in performance across speech from Black and White patients and persistent difficulty with spontaneous conversational phenomena. A 2023 digital-scribe comparison observed that word-diarization error differed little across speakers in most models, but Amazon Medical Conversation ASR showed a larger clinician-side gap in that study's setup. These papers do not settle who is best in class, but they reinforce a practical reality: medical specialization helps, yet speaker population, recording setup, overlap, and domain mismatch still matter a great deal.

Where it breaks, and when to pick something else

The most important hard limitation is language coverage: Transcribe Medical is currently documented only for en-US medical transcription. That is a major constraint relative to general cloud speech services and to some open-source alternatives, and it narrows adoption outside US-English clinical workflows unless customers build translation or multilingual pipelines around the service.

The next limitation is the one this article keeps circling back to: transparency. AWS does not publish a public Transcribe Medical model card, WER benchmark suite, specialty-by-specialty scorecard, or latency SLO. That makes vendor comparison harder and shifts more burden onto customer-side validation. In practice, a regulated buyer should assume that acceptance testing on its own recordings is mandatory.
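
Because no vendor benchmark exists, acceptance testing usually starts with word error rate on your own reference transcripts. The sketch below computes WER with a standard word-level edit distance. It lowercases and splits on whitespace only, which is a simplifying assumption: a real evaluation would also normalize punctuation, numerals, and abbreviations. The sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


reference = "patient takes metoprolol 25 milligrams daily"
hypothesis = "patient takes metropolol 25 milligrams"

print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Scoring by specialty, speaker group, and recording setup separately is more informative than one pooled number, given the disparities reported in the independent studies.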

Limitation or failure mode	Why it matters	Mitigation
US-English only	Limits international or multilingual clinical use	Use separate multilingual ASR/translation stacks, or evaluate open/self-hosted alternatives for non-US-English workflows
Noise, overlap, accents, and code-switching reduce accuracy	Can materially affect real-world visit transcription quality	Use higher-quality microphones, channel-separated capture where possible, Chime SDK active-talker splitting, and human review
PHI identification is not HIPAA de-identification	Redaction workflows can fail if treated as automatic de-identification	Use PHI tagging as a first pass only; add review or dedicated de-identification controls
Speaker diarization linearizes overlap and may delay stable speaker labels in streaming	Speaker attribution can be wrong or late around interruptions	Prefer multi-channel audio when feasible; review speaker assignments in post-processing
Medical custom vocabulary cannot contain PHI/PII and large vocabularies are discouraged	Governance and vocabulary design affect accuracy and privacy	Build small, encounter-specific or specialty-specific vocabularies with strict curation
No public custom medical language model training path in reviewed docs	Lower ceiling for customer-specific language adaptation than some alternatives	Combine custom vocabulary with specialty routing, downstream correction, or consider open/self-trained models
No turnkey note generation in base product	Additional engineering needed for ambient documentation	Use HealthScribe or a Bedrock-based note layer if the requirement is note generation rather than transcript only
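
For the diarization row, one way to apply the multi-channel mitigation is to interleave per-channel segments by start time and flag cross-channel overlaps for review. This is a minimal sketch. The `Segment` type, channel names, and timings are hypothetical and do not reflect the service's output schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    channel: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_channels(segments: List[Segment]) -> List[Segment]:
    """Interleave segments from all channels into one timeline by start time."""
    return sorted(segments, key=lambda s: (s.start, s.end))


def find_overlaps(timeline: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Return pairs of segments from different channels that overlap in time."""
    overlaps = []
    for i, first in enumerate(timeline):
        for second in timeline[i + 1:]:
            if second.start >= first.end:
                break  # timeline is sorted by start, so nothing later overlaps
            if second.channel != first.channel:
                overlaps.append((first, second))
    return overlaps


segments = [
    Segment("patient", 3.2, 5.0, "It started last week."),
    Segment("clinician", 0.0, 4.0, "When did the chest pain begin?"),
    Segment("clinician", 5.5, 7.0, "Any shortness of breath?"),
]

timeline = merge_channels(segments)
for first, second in find_overlaps(timeline):
    print(f"overlap: {first.channel} [{first.start}-{first.end}] "
          f"with {second.channel} [{second.start}-{second.end}]")
```

With channel-separated capture, speaker identity comes from the channel rather than from a diarization model, so overlaps become a review flag instead of an attribution error.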

Weighing it up: Transcribe Medical is simpler than building your own medical ASR stack, more healthcare-ready than standard cloud speech, less workflow-heavy than Dragon, and tightly integrated into AWS services that healthcare builders already use, such as Comprehend Medical, HealthLake, Chime SDK, S3, Athena, and Bedrock. It is also slightly cheaper than Google's medical models on public list pricing and appears to have a stricter privacy posture than standard Transcribe on the model-improvement question.

The disadvantages are just as concrete: limited language coverage, shallow public transparency, less clinician-facing workflow depth than Dragon, and less ultimate customization than self-hosted or open approaches. The service is also increasingly flanked by AWS's own higher-level offerings. If a team wants a transcript API, Transcribe Medical remains directly relevant. If that same team wants structured notes, role identification, dialogue classification, and summary traceability in one managed call, AWS itself now points them toward HealthScribe.

The selection rule I'd give a practitioner: choose Transcribe Medical when you want a medical ASR primitive inside an AWS-centric application. Choose HealthScribe when you want AWS to own more of the clinical-documentation stack. Choose Dragon Medical One when the buyer wants a clinician-facing documentation product, not an API. Choose Google Cloud medical models when you want a close API analogue on Google Cloud. Go open or self-hosted only if deployment control, sovereignty, or research customization outweigh the operational load of building and validating the stack yourself.

The research and patent trail

These papers and patents are adjacent technical evidence, not official reverse-engineering of the production service. They are most useful for understanding the kinds of methods AWS speech teams publicly work on.

Type	Source	Short summary
Paper	Robust prediction of punctuation and truecasing for medical ASR	AWS medical-ASR paper using pretrained masked language models and medical-domain adaptation for punctuation/truecasing; especially relevant to dictation usability
Paper	Listen, Know and Spell	Shows AWS AI interest in knowledge-graph infusion for OOV named entities in domains such as medical ASR
Blog plus paper pointer	Teaching speech recognizers new words without retraining	Explains contextual adapters and decoder biasing for difficult named entities; cites strong gains on medical terminology
Paper	Domain adaptation with external off-policy acoustic catalogs	Describes scalable post-training ASR adaptation using synthetic acoustic catalogs and KNN fusion; relevant to rare-domain adaptation
Paper	ILASR	Privacy-preserving incremental-learning framework for production ASR, relevant to how AWS could update speech models without relying on sensitive customer data
Paper	AG-LSEC	Improves speaker diarization by grounding lexical speaker correction in acoustics; relevant to medical conversation turn attribution
Paper	Context-aware Transformer transducer	Strong evidence that Amazon speech teams use advanced transducer architectures for rare-word/context-sensitive ASR
Patent	Contextual biasing for speech recognition	Amazon patent family on bias encoders and bias attention for rare/contextual phrases; highly relevant to specialized terminology support
Patent	Infusing knowledge graphs into automatic speech recognition	Patent on injecting domain knowledge such as medications, diseases, and drugs into ASR
Patent	Using recurrent neural network for partitioning of audio and speaker diarization	Amazon patent-family evidence around diarization plus ASR concurrency and segmentation

What we still don't know

The reviewed public material leaves several questions unresolved. AWS does not disclose Transcribe Medical's exact model family, medical training data sources or size, specialty-by-specialty benchmark scores, latency service objectives, or internal model version history. The public record also does not expose a complete service-team roster beyond blog authors and research contributors. Those gaps do not make the service unusable, but they do mean serious buyers should evaluate it as a managed black box with strong documentation and meaningful adjacent research, rather than as a fully transparent model platform.

Sources

AWS announces Amazon Transcribe Medical: https://aws.amazon.com/about-aws/whats-new/2019/12/aws-announces-amazon-transcribe-medical-medical-speech-recognition/
Amazon Transcribe Pricing: https://aws.amazon.com/transcribe/pricing/
Introducing medical speech-to-text with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/introducing-medical-speech-to-text-with-amazon-transcribe-medical/
Performing medical transcription analysis with Amazon Transcribe Medical and Amazon Comprehend Medical: https://aws.amazon.com/blogs/machine-learning/performing-medical-transcription-analysis-with-amazon-transcribe-medical-and-amazon-comprehend-medical/
Amazon Transcribe Medical developer guide: https://docs.aws.amazon.com/transcribe/latest/dg/transcribe-medical.html
Amazon Transcribe Medical product page: https://aws.amazon.com/transcribe/medical/
StartMedicalStreamTranscription API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html
StartMedicalTranscriptionJob API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html
How Amazon Transcribe Medical works: https://docs.aws.amazon.com/transcribe/latest/dg/how-it-works-med.html
Amazon Transcribe FAQs: https://aws.amazon.com/transcribe/faqs/
Amazon Transcribe Medical now supports custom vocabulary: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-custom-vocabulary/
Alternative medical transcriptions: https://docs.aws.amazon.com/transcribe/latest/dg/alternative-med-transcriptions.html
Conversation diarization (medical): https://docs.aws.amazon.com/transcribe/latest/dg/conversation-diarization-med.html
Multi-channel streaming and batch support: https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-transcribe-medical-now-supports-both-streaming-and-batch-transcription-of-multi-channel-audio/
Automatic PHI identification: https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-transcribe-medical-now-provides-automatic-protected-health-information-phi-identification/
AWS PrivateLink support for real-time streaming: https://aws.amazon.com/about-aws/whats-new/2020/06/announcing-aws-privatelink-support-for-amazon-transcribe-medical-real-time-streaming/
Amazon Chime SDK live transcription: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-chime-sdk-amazon-transcribe-amazon-transcribe-medical/
Amazon Transcribe API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/Welcome.html
Amazon Transcribe endpoints and quotas, AWS General Reference: https://docs.aws.amazon.com/general/latest/gr/transcribe.html
Teaching speech recognizers new words without retraining: https://www.amazon.science/blog/teaching-speech-recognizers-new-words-without-retraining
Medical custom vocabularies: https://docs.aws.amazon.com/transcribe/latest/dg/vocabulary-med.html
Robust acoustic and semantic contextual biasing in neural transducers for speech recognition: https://www.amazon.science/publications/robust-acoustic-and-semantic-contextual-biasing-in-neural-transducers-for-speech-recognition
Domain adaptation with external off-policy acoustic catalogs: https://www.amazon.science/publications/domain-adaptation-with-external-off-policy-acoustic-catalogs-for-scalable-contextual-end-to-end-automated-speech-recognition
Robust prediction of punctuation and truecasing for medical ASR: https://www.amazon.science/publications/robust-prediction-of-punctuation-and-truecasing-for-medical-asr
AG-LSEC: audio-grounded lexical speaker error correction: https://www.amazon.science/publications/ag-lsec-audio-grounded-lexical-speaker-error-correction
ILASR: privacy-preserving incremental learning for ASR at production scale: https://www.amazon.science/publications/ilasr-privacy-preserving-incremental-learning-for-automatic-speech-recognition-at-production-scale
Enhancing speech-to-text accuracy of COVID-19 related terms with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/enhancing-speech-to-text-accuracy-of-covid-19-related-terms-with-amazon-transcribe-medical/
The range of AWS's speech research on display at Interspeech: https://www.amazon.science/blog/the-range-of-awss-speech-research-is-on-display-at-interspeech
AWS HIPAA compliance: https://aws.amazon.com/compliance/hipaa-compliance/
Healthcare Dive on the Transcribe Medical launch: https://www.healthcaredive.com/news/amazons-new-medical-transcription-service-bolsters-voice-to-text-bid/568245/
Google Cloud Speech-to-Text medical models: https://docs.cloud.google.com/speech-to-text/docs/v1/medical-models
Google Cloud Speech-to-Text pricing: https://cloud.google.com/speech-to-text/pricing
Dragon Medical One: https://www.microsoft.com/en-us/health-solutions/clinical-workflow/dragon-medical-one
Azure Speech to text: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
OpenAI Whisper: https://openai.com/index/whisper/
NVIDIA NeMo Parakeet ASR models: https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/
Google MedASR (Health AI Developer Foundations): https://developers.google.com/health-ai-developer-foundations/medasr
JAMIA Open study on medical ASR performance: https://academic.oup.com/jamiaopen/article/7/4/ooae130/7920671
Amazon Transcribe Medical now supports batch transcription: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-batch-transcription-of-medical-audio-files/
Streaming transcription support for new specialties: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-transcribe-medical-streaming-transcription-support-medical-specialties/
Listen, Know and Spell: knowledge-infused subword modeling for OOV named entities: https://assets.amazon.science/0c/47/311aae264493b8beefd696f7a295/listen-know-and-spell-knowledge-infused-subword-modeling-for-improving-asr-performance-of-oov-named-entities.pdf
Context-aware Transformer transducer for speech recognition: https://www.amazon.science/publications/context-aware-transformer-transducer-for-speech-recognition
Patent WO2020226789A1, contextual biasing for speech recognition: https://patents.google.com/patent/WO2020226789A1/en
Patent US12400659B1, infusing knowledge graphs into ASR: https://patents.google.com/patent/US12400659B1/en
Patent US10902843B2, RNN-based audio partitioning and speaker diarization: https://patents.google.com/patent/US10902843B2/en