Amazon Transcribe Medical: what AWS actually ships, and what it won't tell you
What Amazon Transcribe Medical offers in 2026: features, pricing vs Google and Nuance, HIPAA posture, research clues, and where the service falls short.

Amazon Transcribe Medical is AWS's managed medical speech recognition service for turning clinician dictation and clinician-patient conversations into text. It launched in December 2019 as a HIPAA-eligible capability of Amazon Transcribe, with real-time streaming on day one. Batch transcription arrived in April 2020, custom medical vocabularies later that month, specialty expansion in late 2020, multi-channel support in December 2020, and automatic PHI identification in January 2021. As of June 2026, the publicly documented product is still a transcription-focused API rather than a full ambient documentation agent, and AWS increasingly points customers toward AWS HealthScribe as the higher-level note-generation successor for clinical documentation workflows.
That framing matters because it sets expectations correctly. This is a building block, not a scribe. Its real strengths are AWS-native integration, predictable API-driven deployment, streaming and batch modes, speaker and channel features, medical vocabulary support, HIPAA eligibility, and a public list price slightly below that of its closest API competitor. The worked examples on AWS's pricing page imply a medical transcription rate of about $0.075 per minute with a 60-minute monthly free tier for the first 12 months. Google's official medical Speech-to-Text pricing is $0.078 per minute after its own first 60 free minutes each month. Nuance Dragon Medical One is a different animal entirely, a workflow product rather than a metered cloud API, and Microsoft publicly emphasizes Dragon and Dragon Copilot for healthcare more than a separate Azure medical ASR API.
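To make the list-price gap concrete, here is a minimal arithmetic sketch. The rates and free-tier sizes are the ones quoted above; the function name and the 10,000-minute workload are illustrative assumptions, and the sketch ignores billing granularity, taxes, and the 12-month limit on the AWS free tier.

```python
def monthly_cost(minutes: float, rate_per_min: float, free_minutes: float = 60.0) -> float:
    """Estimate one month's bill: minutes beyond the free tier times the per-minute rate."""
    billable = max(0.0, minutes - free_minutes)
    return round(billable * rate_per_min, 2)


# Hypothetical workload: 10,000 audio minutes in one month.
aws_cost = monthly_cost(10_000, 0.075)     # rate implied by AWS pricing examples
google_cost = monthly_cost(10_000, 0.078)  # Google medical model list price

print(f"AWS: ${aws_cost:.2f}  Google: ${google_cost:.2f}  gap: ${google_cost - aws_cost:.2f}")
```

At that volume the two bills come to roughly $745.50 and $775.32, a gap of about 4 percent. For most buyers that is small enough for integration cost and accuracy on their own audio to dominate the decision.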
The biggest caveat, and the theme that keeps recurring throughout this piece, is transparency. AWS publicly describes Transcribe Medical as "deep learning" and "state-of-the-art machine learning," but it does not publish standardized word-error-rate benchmarks, model version numbers, latency targets, or the internal architecture behind the managed service. The best technical view comes from adjacent Amazon Science papers, which show AWS speech teams working on end-to-end ASR with CTC, neural transducers, context-aware transformer transducers, contextual biasing, knowledge-graph support for rare entities, medical punctuation and truecasing, privacy-preserving continual learning, and speaker-error correction. Those papers are relevant, but AWS never states that any one of them maps one-to-one onto the production Transcribe Medical stack.
What the service is and what it does
Amazon Transcribe Medical is an AWS API service for US-English medical speech transcription. AWS documents real-time streaming and batch transcription, two main audio modes (DICTATION and CONVERSATION), primary care plus multiple specialty-care domains, timestamps, confidence scores, alternative transcriptions, speaker diarization, channel identification, medical custom vocabularies, and PHI tagging. AWS positions it for clinical documentation, pharmacovigilance call review, telehealth subtitling, and healthcare contact-center scenarios.
AWS's product page now explicitly says the service provides transcription expertise for primary care and specialty areas including cardiology, neurology, obstetrics-gynecology, pediatrics, oncology, radiology, and urology. The documentation page for "Medical specialties and terms" still describes PRIMARYCARE as covering family medicine, internal medicine, OB-GYN, and pediatrics. The two sources agree on which specialties are covered, but they are not synchronized on taxonomy: the product page lists OB-GYN and pediatrics as specialty areas, while the documentation folds them into PRIMARYCARE on the API side.
On deployment, the service is available through AWS console workflows, API calls, the AWS CLI, and AWS SDKs. Public API references and FAQs show the medical APIs alongside the broader Transcribe service family, with Boto3 examples for custom vocabulary creation and REST-style operation references for jobs and streams.
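For readers who want to see what the batch API looks like from an SDK, the following is a minimal sketch, not a reference implementation. The parameter names follow the Boto3 `start_medical_transcription_job` operation as I understand it. The helper names, job name, bucket names, and default region are hypothetical. The hard-coded `PRIMARYCARE` specialty and the rule that speaker labels and channel identification cannot be combined are assumptions to verify against the current API reference.

```python
from typing import Any, Dict, Optional


def build_medical_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    *,
    audio_type: str = "DICTATION",
    max_speakers: Optional[int] = None,
    channel_identification: bool = False,
    max_alternatives: Optional[int] = None,
    vocabulary_name: Optional[str] = None,
    identify_phi: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a batch medical transcription job."""
    if audio_type not in ("DICTATION", "CONVERSATION"):
        raise ValueError("audio_type must be DICTATION or CONVERSATION")
    # Assumption: diarization and channel identification are mutually exclusive.
    if channel_identification and max_speakers is not None:
        raise ValueError("choose channel identification or speaker labels, not both")
    if max_alternatives is not None and not 2 <= max_alternatives <= 10:
        raise ValueError("max_alternatives must be between 2 and 10")

    settings: Dict[str, Any] = {}
    if max_speakers is not None:
        settings.update(ShowSpeakerLabels=True, MaxSpeakerLabels=max_speakers)
    if channel_identification:
        settings["ChannelIdentification"] = True
    if max_alternatives is not None:
        settings.update(ShowAlternatives=True, MaxAlternatives=max_alternatives)
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name

    request: Dict[str, Any] = {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",       # the only documented language
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": "PRIMARYCARE",    # assumption: check valid batch values
        "Type": audio_type,
    }
    if settings:
        request["Settings"] = settings
    if identify_phi:
        request["ContentIdentificationType"] = "PHI"
    return request


def submit_job(request: Dict[str, Any], region: str = "us-east-1") -> Dict[str, Any]:
    """Send the request; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder runs without AWS dependencies

    client = boto3.client("transcribe", region_name=region)
    return client.start_medical_transcription_job(**request)


example = build_medical_job_request(
    "visit-001",                            # hypothetical job name
    "s3://example-input/visit-001.wav",     # hypothetical input object
    "example-output",                       # hypothetical output bucket
    audio_type="CONVERSATION",
    max_speakers=2,
    identify_phi=True,
)
print(example)
```

Keeping request construction separate from the network call makes the configuration easy to unit test and review, which matters when the same settings feed a compliance audit.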
AWS's current endpoint documentation lists Transcribe Medical endpoints in 12 commercial regions plus AWS GovCloud (US-West): US East (N. Virginia and Ohio), US West (N. California and Oregon), Canada (Central), Europe (Ireland, London, and Frankfurt), and Asia Pacific (Seoul, Singapore, Sydney, and Tokyo). Regional support matters for residency, latency, and procurement, but you still need to validate the compliance scope of the specific region and the adjacent services in your workflow.
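A deployment script can fail fast on an unsupported region with a simple allow-list. The sketch below maps the region names listed above to their standard AWS region codes. The constant and function names are hypothetical, and the list is a snapshot of this article rather than a live source, so confirm it against the AWS General Reference before relying on it.

```python
# Region codes inferred from the region names in this article (assumption:
# the list may have changed; the AWS General Reference is authoritative).
TRANSCRIBE_MEDICAL_REGIONS = frozenset({
    "us-east-1",       # US East (N. Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-1",       # US West (N. California)
    "us-west-2",       # US West (Oregon)
    "ca-central-1",    # Canada (Central)
    "eu-west-1",       # Europe (Ireland)
    "eu-west-2",       # Europe (London)
    "eu-central-1",    # Europe (Frankfurt)
    "ap-northeast-2",  # Asia Pacific (Seoul)
    "ap-southeast-1",  # Asia Pacific (Singapore)
    "ap-southeast-2",  # Asia Pacific (Sydney)
    "ap-northeast-1",  # Asia Pacific (Tokyo)
    "us-gov-west-1",   # AWS GovCloud (US-West)
})


def require_supported_region(region: str) -> str:
    """Return the region unchanged, or raise if it is not on the allow-list."""
    if region not in TRANSCRIBE_MEDICAL_REGIONS:
        raise ValueError(f"Transcribe Medical endpoint not listed for {region!r}")
    return region


print(require_supported_region("eu-west-2"))
```

A check like this covers only endpoint availability. It says nothing about whether the storage, analytics, or note-generation services around the transcript are available or in compliance scope in the same region.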
Here is the full documented capability picture.
| Capability | Publicly documented status | Notes |
|---|---|---|
| Real-time transcription | Supported | StartMedicalStreamTranscription starts a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out. |
| Batch transcription | Supported | StartMedicalTranscriptionJob handles uploaded medical dictation or conversation files. |
| Audio types | Supported | AWS requires a Type such as DICTATION or CONVERSATION. |
| Language support | Limited | AWS FAQ says Transcribe Medical currently supports US English only. |
| Specialty support | Supported | Product page lists primary care plus cardiology, neurology, OB-GYN, pediatrics, oncology, radiology, and urology. |
| Medical custom vocabularies | Supported | Users can upload table-format vocabularies with IPA pronunciations and display forms. |
| Alternative transcriptions | Supported | Batch jobs can return 2 to 10 alternatives. |
| Word timestamps and confidence | Supported | Documented at launch and in API output. |
| Speaker diarization | Supported | AWS labels speakers and supports streaming plus batch diarization. |
| Channel identification | Supported | Added for both streaming and batch multi-channel audio in December 2020. |
| PHI identification | Supported | Added in January 2021 at no extra charge. |
| Private connectivity | Supported | AWS PrivateLink support for real-time streaming was announced in June 2020. |
| Chime SDK integration | Supported | Live transcription can be integrated via Amazon Chime SDK, including specialty and conversation type selection. |
| Clinical note generation | Not native to Transcribe Medical | AWS now directs users needing a single note-generation API toward AWS HealthScribe. |

What's under the hood, as far as anyone can tell
AWS publicly confirms that Transcribe Medical is a deep-learning-based ASR service optimized for medical speech, with automatic punctuation and capitalization, specialty-aware transcription, dictation versus conversational modes, custom medical vocabularies, speaker and channel logic, and optional PHI identification. It also documents a stateless service posture and an API-first delivery model. It does not publish the acoustic model family, decoder design, training corpus size, language-model design, or any release-by-release model identifier history.
The strongest public reading is that Transcribe Medical probably sits on the same broad AWS speech-research foundation used across Amazon speech products, in a medicalized and production-hardened form. Amazon Science publications on rare medical terms, domain adaptation, punctuation, personalization, and diarization show AWS researchers actively working on CTC-based architectures, neural transducers, context-aware transformer transducers, contextual adapters, knowledge-graph infusion, privacy-preserving continual learning, and post-ASR speaker-error correction. That does not prove those exact papers are the production implementation. It does show the technical repertoire available inside AWS's speech organization.
The table below lays out what AWS documents against what its research record suggests, area by area.
| Technology area | What AWS documents for Transcribe Medical | What AWS public research suggests | Assessment |
|---|---|---|---|
| Core ASR model | "Deep learning" / "state-of-the-art machine learning" medical ASR. | Amazon speech teams publish on CTC, neural transducers, and context-aware transformer transducers for production ASR. | High confidence that the service uses modern end-to-end ASR; low confidence on the exact architecture because AWS does not disclose it. |
| Language modeling and rare-term handling | Supports specialty selection and medical custom vocabularies. | AWS papers describe contextual biasing, semantic/acoustic biasing, and knowledge-graph support for out-of-vocabulary entities, including medical terminology. | Strong evidence that rare-term biasing is a major design theme; exact LM design for Transcribe Medical is not public. |
| Domain adaptation | API requires Specialty and Type; AWS expanded specialty coverage over time. | AWS has published post-training domain adaptation methods using synthetic acoustic catalogs and KNN fusion. | Strong evidence of domain-conditioned decoding/modeling, though whether this appears as separate specialty models or lighter adaptation is undisclosed. |
| Noise and acoustic robustness | AWS FAQ says Transcribe is designed for variation in volume, pitch, and speaking rate, but noise, overlap, accents, and code-switching can degrade output. | No medical-specific public paper clearly documents the production front-end denoising stack. | Public documentation is enough to know limits, not enough to reverse-engineer the front end. |
| Punctuation and casing | Automatic punctuation and capitalization are part of the launch and product positioning. | AWS medical ASR paper uses BERT/BioBERT/RoBERTa for punctuation and truecasing, with domain adaptation and augmentation. | Very likely that punctuation/truecasing is a distinct downstream stage or integrated module. |
| Speaker diarization | Documented for streaming and batch; output includes speaker labels; overlapping speech is linearized by start time. | AWS research focuses on reducing speaker errors with audio-grounded lexical correction. | Public docs describe the interface; research suggests active work on improving turn-attribution around overlaps. |
| Privacy-preserving learning | AWS says medical customer content is not used to improve AWS AI technologies. | AWS also publishes privacy-preserving continual-learning work using ephemeral, weakly supervised data in production ASR. | Suggests AWS has internal methods for model refresh under privacy constraints, but not necessarily on medical customer data. |
One practical distinction deserves emphasis: customization depth. Transcribe Medical supports medical custom vocabularies, but in the public sources reviewed, AWS does not document a customer-trainable medical language model analogous to the custom language models (CLM) available in standard Amazon Transcribe. AWS's CLM FAQ is framed around standard Transcribe, while the medical docs emphasize vocabularies instead. That makes Transcribe Medical more customizable than a fixed black box, but less customizable than platforms that let customers train full medical acoustic or language models.
The version history that isn't one
AWS's public history for Transcribe Medical is feature-oriented rather than version-oriented. Customers can reconstruct major milestones from launch posts, docs, and "What's New" announcements, but AWS does not expose a numbered model lineage, model cards for Transcribe Medical itself, or a release log with benchmark deltas. The milestone record below is compiled from AWS launch posts and official "What's New" announcements.
| Source | What it adds |
|---|---|
| AWS announces Amazon Transcribe Medical | Official launch record: Dec. 2019 release date, HIPAA eligibility, real-time streaming, word timestamps, confidence scores, punctuation/capitalization, Comprehend Medical linkage |
| Introducing medical speech-to-text with Amazon Transcribe Medical | Launch rationale, workflow framing, customer quotes from Cerner, Amgen, and SoundLines/HealthChannels, plus Vasi Philomin role |
| Amazon Transcribe Medical now supports batch transcription | Confirms Apr. 2020 batch release and early batch capabilities including speaker/channel separation context |
| Amazon Transcribe Medical now supports custom vocabulary | Confirms Apr. 2020 vocabulary release, IPA pronunciation support, display forms, and batch plus streaming support |
| Announcing AWS PrivateLink support | Security/networking milestone for private access to streaming API |
| Streaming transcription support for new specialties | Public milestone for cardiology, oncology, neurology, radiology, and urology specialty support |
| Multi-channel support for streaming and batch | Confirms channel identification milestone for telehealth and pharmacovigilance scenarios |
| Automatic PHI identification | Adds PHI tagging and explicitly frames redaction workflows |
| Amazon Chime SDK live transcription support | Shows AWS ecosystem integration and lower-latency meeting use case |
| Amazon Transcribe Medical product page | Best current high-level feature and positioning summary, including today's specialty list and HealthScribe handoff |
Who builds this thing? Publicly identifiable leadership and contributors are easier to find through launch blogs and Amazon Science than through formal product org charts. AWS has not published an engineering roster for Transcribe Medical, but the following names and organizations are directly tied to the service or to adjacent AWS speech research.
| Publicly identified person or org | Role in the public record | Relevance |
|---|---|---|
| Vasi Philomin | GM for Machine Learning and AI at AWS; launch blog author | Public launch sponsor/executive owner across AWS language services in 2019 |
| Paul Zhao | Product Manager at AWS Machine Learning managing Amazon Transcribe | Direct product-facing owner named in Transcribe Medical blog materials |
| Katrin Kirchhoff | Senior Manager and Principal Scientist at AWS AI in 2020; later described as Director of Speech Processing for AWS; affiliated with AWS AI Labs in research literature | Key public research leader for AWS speech technologies relevant to Transcribe |
| Scott Seyfarth | Data Scientist at AWS AI working on improving Amazon Transcribe and Transcribe Medical | Directly tied to service improvement in public author bios |
| Ruoyu Huang | Software Development Engineer at Amazon Transcribe | Publicly named engineering contributor on Transcribe Medical customization work |
| AWS AI / AWS Machine Learning / Amazon Science speech teams | Product and research organizations behind AWS language and speech services | The most visible institutions behind the service |
| Cerner, Amgen, SoundLines/HealthChannels | Early public customers or quoted adopters | Evidence of early industry uptake in EHR, pharmacovigilance, and care-team workflows |
The organizational takeaway: Transcribe Medical appears to sit at the intersection of productized AWS AI services and a broader Amazon Science speech-research program. That is good for technical depth. It also means the service inherits the opacity of many managed AI products, where the public record exposes capabilities and some authors, not the full production design.
Security, privacy, and the regulatory fine print
AWS describes Transcribe Medical as HIPAA-eligible, available under AWS's Business Associate Addendum, and subject to the AWS shared responsibility model. AWS states that BAA customers must encrypt PHI at rest and in transit, and that customers remain responsible for correct service configuration and lawful use. Standard stuff for cloud healthcare services, but still operationally significant: compliance depends on the whole workflow, not just the ASR endpoint.
There is one privacy distinction worth knowing before procurement conversations start. The medical FAQ is stricter than the general Transcribe FAQ. The general FAQ says content may be stored and used to provide, maintain, improve, and develop Amazon Transcribe and related AI technologies unless customers opt out. AWS says Amazon Transcribe Medical, by contrast, does not use content processed by the service for any purpose other than to provide and maintain the service, and does not use that content to improve Amazon Transcribe Medical or other Amazon AI technologies. The product page also describes the service as stateless: it stores neither inbound audio nor output text, and leaves storage choices to the customer.
| Consideration | AWS public position | Practical implication |
|---|---|---|
| HIPAA eligibility | Yes. | Useful for PHI workflows, but only with a BAA and compliant architecture around the service. |
| BAA and encryption duties | AWS says BAA customers must encrypt PHI at rest and in transit. | Security controls remain partly customer-owned. |
| Data retention stance | Product page says stateless; FAQ says medical content is not used to improve AWS AI. | Stronger privacy posture than standard Transcribe, at least in public documentation. |
| PHI identification | Available at no additional charge. | Helps redaction workflows, but is not a substitute for full de-identification review. |
| PHI de-identification | AWS explicitly warns PHI identification may not accurately identify PHI in all circumstances and does not satisfy HIPAA de-identification requirements. | Human review or separate de-identification controls are still required. |
| Custom vocabulary content | AWS says do not include PII or PHI in medical custom vocabularies. | Customers need governance for vocabulary curation. |
| Private networking | PrivateLink for real-time streaming is available. | Reduces exposure to the public internet and fits stricter network topologies. |
| Region choice | Multiple commercial regions plus GovCloud West are documented. | Supports residency and procurement choices, but end-to-end residency depends on all connected services. |
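The "first pass only" point about PHI tagging is easier to see in code. The sketch below masks words whose timestamps fall inside a tagged PHI span. It assumes a batch output shape with `results.items` (word-level `start_time` and `end_time` strings) and `results.entities` (with `category`, `startTime`, and `endTime`). Those field names are my reading of the documented output and should be checked against a real job result. The sample transcript is invented.

```python
from typing import Any, Dict, List, Tuple

EPSILON = 1e-6  # tolerance when comparing timestamps


def redact_phi(transcript: Dict[str, Any], mask: str = "[PHI]") -> str:
    """Rebuild the transcript text with tagged PHI spans replaced by a mask."""
    results = transcript["results"]
    spans: List[Tuple[float, float]] = [
        (float(e["startTime"]), float(e["endTime"]))
        for e in results.get("entities", [])
        if str(e.get("category", "")).startswith("PHI")
    ]
    words: List[str] = []
    previous_masked = False
    for item in results["items"]:
        text = item["alternatives"][0]["content"]
        if item.get("type") == "punctuation":
            if words:
                words[-1] += text  # attach punctuation to the preceding token
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        inside = any(s - EPSILON <= start and end <= e + EPSILON for s, e in spans)
        if inside:
            if not previous_masked:  # collapse multi-word entities into one mask
                words.append(mask)
            previous_masked = True
        else:
            words.append(text)
            previous_masked = False
    return " ".join(words)


# Invented sample in the assumed output shape.
sample = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.10", "end_time": "0.45",
             "alternatives": [{"content": "Patient", "confidence": "0.99"}]},
            {"type": "pronunciation", "start_time": "0.50", "end_time": "0.70",
             "alternatives": [{"content": "is", "confidence": "0.99"}]},
            {"type": "pronunciation", "start_time": "0.79", "end_time": "1.33",
             "alternatives": [{"content": "Bertrand", "confidence": "0.97"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ".", "confidence": "0.0"}]},
        ],
        "entities": [
            {"content": "Bertrand", "category": "PHI*-Personal*",
             "startTime": 0.79, "endTime": 1.33, "confidence": 0.98},
        ],
    }
}

print(redact_phi(sample))
```

Anything the service fails to tag passes through unmasked, which is why this kind of redaction is a convenience layer and not a de-identification control.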
For regulated deployments, the most defensible pattern is to treat Transcribe Medical as one compliant component in a larger controlled system: private networking where possible, carefully scoped IAM, encrypted S3 output, limited retention, PHI tagging plus secondary review, and documented human validation for any workflow that can affect care or billing. AWS's own documentation repeatedly warns that Transcribe Medical is not a substitute for professional medical advice, diagnosis, or treatment, and that users should apply confidence thresholds and human review where accuracy needs are high.
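One way to act on the confidence-threshold advice is to route low-confidence words to a human reviewer. This is a minimal sketch that assumes the transcript JSON exposes `results.items`, each with a `type`, a `start_time`, and a first alternative carrying `content` and a string `confidence`. The 0.85 threshold, the function name, and the sample data are illustrative assumptions, not AWS recommendations.

```python
from typing import Any, Dict, List


def flag_low_confidence(transcript: Dict[str, Any], threshold: float = 0.85) -> List[Dict[str, Any]]:
    """Return spoken words whose top-alternative confidence is below the threshold."""
    flagged = []
    for item in transcript["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # punctuation carries no meaningful confidence
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append({
                "word": best["content"],
                "confidence": confidence,
                "start_time": float(item["start_time"]),
            })
    return flagged


# Invented sample in the assumed output shape.
dictation = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.00", "end_time": "0.40",
             "alternatives": [{"content": "Start", "confidence": "0.99"}]},
            {"type": "pronunciation", "start_time": "0.45", "end_time": "1.20",
             "alternatives": [{"content": "metoprolol", "confidence": "0.62"}]},
            {"type": "pronunciation", "start_time": "1.25", "end_time": "1.60",
             "alternatives": [{"content": "daily", "confidence": "0.97"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ".", "confidence": "0.0"}]},
        ]
    }
}

for hit in flag_low_confidence(dictation):
    print(f"review {hit['word']!r} at {hit['start_time']:.2f}s (confidence {hit['confidence']:.2f})")
```

The right threshold depends on the workflow. A drug name in an order needs a stricter bar than a filler word in a visit summary, so the cutoff should come out of validation on your own recordings and not be treated as a universal default.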

Reception, evidence, and how the competition stacks up
AWS's own adoption evidence is strongest in healthcare IT and pharmacovigilance. At launch, Cerner said it was developing a digital voice scribe on top of Transcribe Medical, Amgen cited use in pharmacovigilance call review, and SoundLines/HealthChannels described using the API in care-team and analytics workflows. AWS blogs later showed integration patterns with Amazon Comprehend Medical, Twilio Media Streams, Veritas telehealth review workflows, and Amazon Chime SDK. These examples show credible adoption as a platform component, especially for builders already inside the AWS ecosystem.
Industry coverage treated the 2019 launch as a meaningful move by AWS into healthcare voice infrastructure. Healthcare Dive wrote that the service bolstered Amazon's voice-to-text ambitions and noted its more specialized medical vocabulary focus. Since then, the market's center of gravity has shifted from plain transcription APIs toward ambient clinical documentation, which is why AWS later introduced HealthScribe and Microsoft now emphasizes Dragon Copilot.
A purely quantitative accuracy-versus-latency chart would be misleading here, because AWS, Google, and Nuance do not publish directly comparable medical-ASR benchmark suites with normalized latency methodology. The more defensible comparison is capability- and workflow-based. The table below is an analytical inference from public delivery models and documented feature depth, not a vendor-provided benchmark.
| Competitor | Delivery model | Medical specialization | Customization | Public pricing signal | Comparative read versus AWS |
|---|---|---|---|---|---|
| Amazon Transcribe Medical | Managed AWS API | Yes, medical-specific transcription | Medical custom vocabularies; specialty and type selection | AWS worked examples imply about $0.075/min with a 60-minute monthly free tier for first 12 months. | Strong developer fit, wide AWS integration, limited public transparency, transcription-first rather than workflow-first |
| Google Cloud Speech-to-Text medical models | Managed cloud API | Yes, separate medical dictation and medical conversation models | Alternate transcriptions, timestamps, confidence; conversation diarization; dictation spoken punctuation/formatting/headings | $0.078/min after first 60 free minutes per month. | Very similar API-layer competitor; slightly higher public list price; strong documentation for dictation formatting behaviors |
| Dragon Medical One | Clinician-facing documentation software | Yes, purpose-built clinical documentation product | Extensive end-user vocabulary, commands, templates, workflow features | Public price not clearly exposed in the reviewed official pages; licensing/sales-led procurement | Stronger ready-made clinical workflow and EHR ergonomics; weaker as a simple developer API building block |
| Azure Speech plus Microsoft healthcare stack | General cloud speech platform plus Nuance products | Public docs position healthcare as a use case, but Microsoft's healthcare-specific speech story is mostly Dragon/Dragon Copilot | Custom speech and general speech platform tooling | Official page clearly exposes free tier structure and per-second billing, but exact paid rates were not recoverable from the static pricing HTML reviewed here. | If you want Microsoft-native general speech plus customization, Azure fits; if you want healthcare-specialized voice, Microsoft steers customers to Dragon |
| Open-source Whisper | Self-hosted model/software | No, general-purpose | Full deployment control, but no managed medical workflow | Infra cost only | Excellent flexibility and broad robustness, but customer owns validation, security, compliance, and medical adaptation |
| Open-source Parakeet | Self-hosted/open-source model | No dedicated medical specialization in the reviewed source | Full deployment control; punctuation and timestamps | Infra cost only | Attractive for performance and openness, but requires significant speech MLOps |
| Open MedASR | Open medical model | Yes, medical dictation/transcription | Fine-tunable health-domain model | Infra cost only | Most directly analogous open alternative for medical dictation, but still not a managed HIPAA-ready service by itself |
On the independent-evidence side, the public literature is mixed but useful. A 2024 JAMIA Open study reported that AWS Medical outperformed AWS General on medical proper nouns, while also finding disparities in performance across speech from Black and White patients and persistent difficulty with spontaneous conversational phenomena. A 2023 digital-scribe comparison observed that word-diarization error differed little across speakers in most models, but Amazon Medical Conversation ASR showed a larger clinician-side gap in that study's setup. These papers do not settle who is best in class, but they reinforce a practical reality: medical specialization helps, yet speaker population, recording setup, overlap, and domain mismatch still matter a great deal.

Where it breaks, and when to pick something else
The most important hard limitation is language coverage: Transcribe Medical is currently documented only for en-US medical transcription. That is a major constraint relative to general cloud speech services and to some open-source alternatives, and it narrows adoption outside US-English clinical workflows unless customers build translation or multilingual pipelines around the service.
The next limitation is the one this article keeps circling back to: transparency. AWS does not publish a public Transcribe Medical model card, WER benchmark suite, specialty-by-specialty scorecard, or latency SLO. That makes vendor comparison harder and shifts more burden onto customer-side validation. In practice, a regulated buyer should assume that acceptance testing on its own recordings is mandatory.
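Because AWS publishes no word-error-rate figures, acceptance testing usually starts with computing WER yourself against human-verified reference transcripts. The sketch below uses the standard word-level edit distance (substitutions, deletions, and insertions divided by reference length). The lowercasing and whitespace tokenization are simplifying assumptions, and a real evaluation would also normalize punctuation, numerals, and abbreviations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one misrecognized drug name and one dropped word.
reference = "patient takes metoprolol twice daily"
hypothesis = "patient takes metropolol twice"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Aggregate WER hides the errors that matter most clinically, so it is worth reporting it alongside a separate error count for medication names, dosages, and negations, split by speaker role and recording setup.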
| Limitation or failure mode | Why it matters | Mitigation |
|---|---|---|
| US-English only | Limits international or multilingual clinical use | Use separate multilingual ASR/translation stacks, or evaluate open/self-hosted alternatives for non-US-English workflows |
| Noise, overlap, accents, and code-switching reduce accuracy | Can materially affect real-world visit transcription quality | Use higher-quality microphones, channel-separated capture where possible, Chime SDK active-talker splitting, and human review |
| PHI identification is not HIPAA de-identification | Redaction workflows can fail if treated as automatic de-identification | Use PHI tagging as a first pass only; add review or dedicated de-identification controls |
| Speaker diarization linearizes overlap and may delay stable speaker labels in streaming | Speaker attribution can be wrong or late around interruptions | Prefer multi-channel audio when feasible; review speaker assignments in post-processing |
| Medical custom vocabulary cannot contain PHI/PII and large vocabularies are discouraged | Governance and vocabulary design affect accuracy and privacy | Build small, encounter-specific or specialty-specific vocabularies with strict curation |
| No public custom medical language model training path in reviewed docs | Lower ceiling for customer-specific language adaptation than some alternatives | Combine custom vocabulary with specialty routing, downstream correction, or consider open/self-trained models |
| No turnkey note generation in base product | Additional engineering needed for ambient documentation | Use HealthScribe or a Bedrock-based note layer if the requirement is note generation rather than transcript only |
Weighing it up: Transcribe Medical is simpler than building your own medical ASR stack, more healthcare-ready than standard cloud speech, less workflow-heavy than Dragon, and tightly integrated with AWS services that healthcare builders already use, such as Comprehend Medical, HealthLake, Chime SDK, S3, Athena, and Bedrock. It is also relatively cost-efficient on public list pricing and appears to have a stricter privacy posture than standard Transcribe on the model-improvement question.
The disadvantages are just as concrete: limited language coverage, shallow public transparency, less clinician-facing workflow depth than Dragon, and less ultimate customization than self-hosted or open approaches. The service is also increasingly flanked by AWS's own higher-level offerings. If a team wants a transcript API, Transcribe Medical remains directly relevant. If that same team wants structured notes, role identification, dialogue classification, and summary traceability in one managed call, AWS itself now points them toward HealthScribe.
The selection rule I'd give a practitioner: choose Transcribe Medical when you want a medical ASR primitive inside an AWS-centric application. Choose HealthScribe when you want AWS to own more of the clinical-documentation stack. Choose Dragon Medical One when the buyer wants a clinician-facing documentation product, not an API. Choose Google Cloud medical models when you want a close API analogue on Google Cloud. Go open or self-hosted only if deployment control, sovereignty, or research customization outweigh the operational load of building and validating the stack yourself.
The research and patent trail
These papers and patents are adjacent technical evidence, not official reverse-engineering of the production service. They are most useful for understanding the kinds of methods AWS speech teams publicly work on.
| Type | Source | Short summary |
|---|---|---|
| Paper | Robust prediction of punctuation and truecasing for medical ASR | AWS medical-ASR paper using pretrained masked language models and medical-domain adaptation for punctuation/truecasing; especially relevant to dictation usability |
| Paper | Listen, Know and Spell | Shows AWS AI interest in knowledge-graph infusion for OOV named entities in domains such as medical ASR |
| Blog plus paper pointer | Teaching speech recognizers new words without retraining | Explains contextual adapters and decoder biasing for difficult named entities; cites strong gains on medical terminology |
| Paper | Domain adaptation with external off-policy acoustic catalogs | Describes scalable post-training ASR adaptation using synthetic acoustic catalogs and KNN fusion; relevant to rare-domain adaptation |
| Paper | ILASR | Privacy-preserving incremental-learning framework for production ASR, relevant to how AWS could update speech models without relying on sensitive customer data |
| Paper | AG-LSEC | Improves speaker diarization by grounding lexical speaker correction in acoustics; relevant to medical conversation turn attribution |
| Paper | Context-aware Transformer transducer | Strong evidence that Amazon speech teams use advanced transducer architectures for rare-word/context-sensitive ASR |
| Patent | Contextual biasing for speech recognition | Amazon patent family on bias encoders and bias attention for rare/contextual phrases; highly relevant to specialized terminology support |
| Patent | Infusing knowledge graphs into automatic speech recognition | Patent on injecting domain knowledge such as medications, diseases, and drugs into ASR |
| Patent | Using recurrent neural network for partitioning of audio and speaker diarization | Amazon patent-family evidence around diarization plus ASR concurrency and segmentation |
What we still don't know
The reviewed public material leaves several questions unresolved. AWS does not disclose Transcribe Medical's exact model family, medical training data sources or size, specialty-by-specialty benchmark scores, latency service objectives, or internal model version history. The public record also does not expose a complete service-team roster beyond blog authors and research contributors. Those gaps do not make the service unusable, but they do mean serious buyers should evaluate it as a managed black box with strong documentation and meaningful adjacent research, rather than as a fully transparent model platform.
Sources
- AWS announces Amazon Transcribe Medical: https://aws.amazon.com/about-aws/whats-new/2019/12/aws-announces-amazon-transcribe-medical-medical-speech-recognition/
- Amazon Transcribe Pricing: https://aws.amazon.com/transcribe/pricing/
- Introducing medical speech-to-text with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/introducing-medical-speech-to-text-with-amazon-transcribe-medical/
- Performing medical transcription analysis with Amazon Transcribe Medical and Amazon Comprehend Medical: https://aws.amazon.com/blogs/machine-learning/performing-medical-transcription-analysis-with-amazon-transcribe-medical-and-amazon-comprehend-medical/
- Amazon Transcribe Medical developer guide: https://docs.aws.amazon.com/transcribe/latest/dg/transcribe-medical.html
- Amazon Transcribe Medical product page: https://aws.amazon.com/transcribe/medical/
- StartMedicalStreamTranscription API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html
- StartMedicalTranscriptionJob API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html
- How Amazon Transcribe Medical works: https://docs.aws.amazon.com/transcribe/latest/dg/how-it-works-med.html
- Amazon Transcribe FAQs: https://aws.amazon.com/transcribe/faqs/
- Amazon Transcribe Medical now supports custom vocabulary: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-custom-vocabulary/
- Alternative medical transcriptions: https://docs.aws.amazon.com/transcribe/latest/dg/alternative-med-transcriptions.html
- Conversation diarization (medical): https://docs.aws.amazon.com/transcribe/latest/dg/conversation-diarization-med.html
- Multi-channel streaming and batch support: https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-transcribe-medical-now-supports-both-streaming-and-batch-transcription-of-multi-channel-audio/
- Automatic PHI identification: https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-transcribe-medical-now-provides-automatic-protected-health-information-phi-identification/
- AWS PrivateLink support for real-time streaming: https://aws.amazon.com/about-aws/whats-new/2020/06/announcing-aws-privatelink-support-for-amazon-transcribe-medical-real-time-streaming/
- Amazon Chime SDK live transcription: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-chime-sdk-amazon-transcribe-amazon-transcribe-medical/
- Amazon Transcribe API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/Welcome.html
- Amazon Transcribe endpoints and quotas, AWS General Reference: https://docs.aws.amazon.com/general/latest/gr/transcribe.html
- Teaching speech recognizers new words without retraining: https://www.amazon.science/blog/teaching-speech-recognizers-new-words-without-retraining
- Medical custom vocabularies: https://docs.aws.amazon.com/transcribe/latest/dg/vocabulary-med.html
- Robust acoustic and semantic contextual biasing in neural transducers for speech recognition: https://www.amazon.science/publications/robust-acoustic-and-semantic-contextual-biasing-in-neural-transducers-for-speech-recognition
- Domain adaptation with external off-policy acoustic catalogs: https://www.amazon.science/publications/domain-adaptation-with-external-off-policy-acoustic-catalogs-for-scalable-contextual-end-to-end-automated-speech-recognition
- Robust prediction of punctuation and truecasing for medical ASR: https://www.amazon.science/publications/robust-prediction-of-punctuation-and-truecasing-for-medical-asr
- AG-LSEC: audio-grounded lexical speaker error correction: https://www.amazon.science/publications/ag-lsec-audio-grounded-lexical-speaker-error-correction
- ILASR: privacy-preserving incremental learning for ASR at production scale: https://www.amazon.science/publications/ilasr-privacy-preserving-incremental-learning-for-automatic-speech-recognition-at-production-scale
- Enhancing speech-to-text accuracy of COVID-19 related terms with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/enhancing-speech-to-text-accuracy-of-covid-19-related-terms-with-amazon-transcribe-medical/
- The range of AWS's speech research on display at Interspeech: https://www.amazon.science/blog/the-range-of-awss-speech-research-is-on-display-at-interspeech
- AWS HIPAA compliance: https://aws.amazon.com/compliance/hipaa-compliance/
- Healthcare Dive on the Transcribe Medical launch: https://www.healthcaredive.com/news/amazons-new-medical-transcription-service-bolsters-voice-to-text-bid/568245/
- Google Cloud Speech-to-Text medical models: https://docs.cloud.google.com/speech-to-text/docs/v1/medical-models
- Google Cloud Speech-to-Text pricing: https://cloud.google.com/speech-to-text/pricing
- Dragon Medical One: https://www.microsoft.com/en-us/health-solutions/clinical-workflow/dragon-medical-one
- Azure Speech to text: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
- OpenAI Whisper: https://openai.com/index/whisper/
- NVIDIA NeMo Parakeet ASR models: https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/
- Google MedASR (Health AI Developer Foundations): https://developers.google.com/health-ai-developer-foundations/medasr
- JAMIA Open study on medical ASR performance: https://academic.oup.com/jamiaopen/article/7/4/ooae130/7920671
- Amazon Transcribe Medical now supports batch transcription: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-batch-transcription-of-medical-audio-files/
- Streaming transcription support for new specialties: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-transcribe-medical-streaming-transcription-support-medical-specialties/
- Listen, Know and Spell: knowledge-infused subword modeling for OOV named entities: https://assets.amazon.science/0c/47/311aae264493b8beefd696f7a295/listen-know-and-spell-knowledge-infused-subword-modeling-for-improving-asr-performance-of-oov-named-entities.pdf
- Context-aware Transformer transducer for speech recognition: https://www.amazon.science/publications/context-aware-transformer-transducer-for-speech-recognition
- Patent WO2020226789A1, contextual biasing for speech recognition: https://patents.google.com/patent/WO2020226789A1/en
- Patent US12400659B1, infusing knowledge graphs into ASR: https://patents.google.com/patent/US12400659B1/en
- Patent US10902843B2, RNN-based audio partitioning and speaker diarization: https://patents.google.com/patent/US10902843B2/en