Amazon Transcribe Medical: model profile
Reference profile of Amazon Transcribe Medical, the AWS managed API for US English medical speech-to-text, launched in December 2019.
Amazon Transcribe Medical is AWS's managed medical automatic speech recognition service for converting clinician dictation and clinician-patient speech into text.
Specifications
| Field | Detail |
|---|---|
| Developer | Amazon Web Services (AWS) |
| Released | December 2019 |
| Model type | Deep-learning-based automatic speech recognition service optimized for medical speech; exact architecture not disclosed |
| Languages | US English (en-US) only |
| Modes (batch / streaming) | Both; real-time streaming and batch transcription, with DICTATION and CONVERSATION audio types |
| Latency | Not publicly disclosed. AWS publishes no latency service objectives |
| Deployment | Managed AWS API via console, API calls, AWS CLI, and AWS SDKs; 12 commercial regions plus AWS GovCloud West |
| Pricing | AWS pricing page examples imply about $0.075 per minute, with a 60-minute monthly free tier for the first 12 months |
| License | Not publicly disclosed. Delivered as a proprietary managed AWS service |
Not disclosed: Parameters · Training data · Throughput / concurrency
Overview
Amazon Transcribe Medical launched in December 2019 as a HIPAA-eligible capability of Amazon Transcribe, with real-time streaming at launch. Batch transcription was added in April 2020, custom medical vocabularies later that month, specialty expansion in late 2020, multi-channel support in December 2020, and automatic PHI identification in January 2021. As of June 2026, the publicly documented product remains a transcription-focused API service rather than a full ambient documentation agent; AWS positions AWS HealthScribe as the higher-level note-generation offering for clinical documentation workflows.
The service is an AWS API for US English medical speech transcription. AWS documents support for real-time streaming and batch transcription, two main audio modes (DICTATION and CONVERSATION), primary care plus multiple specialty-care domains, timestamps, confidence scores, alternative transcriptions, speaker diarization, channel identification, medical custom vocabularies, and PHI tagging. AWS positions it for clinical documentation, pharmacovigilance call review, telehealth subtitling, and healthcare contact-center scenarios.
AWS describes the service as "deep learning" and "state-of-the-art machine learning" based, but does not publish standardized word-error-rate benchmarks, model version numbers, latency targets, or the internal architecture of the managed service.
Capabilities and features
| Capability | Publicly documented status | Notes |
|---|---|---|
| Real-time transcription | Supported | StartMedicalStreamTranscription starts a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out. |
| Batch transcription | Supported | StartMedicalTranscriptionJob handles uploaded medical dictation or conversation files. |
| Audio types | Supported | AWS requires a Type such as DICTATION or CONVERSATION. |
| Language support | Limited | AWS FAQ says Transcribe Medical currently supports US English only. |
| Specialty support | Supported | Product page lists primary care plus cardiology, neurology, OB-GYN, pediatrics, oncology, radiology, and urology. |
| Medical custom vocabularies | Supported | Users can upload table-format vocabularies with IPA pronunciations and display forms. |
| Alternative transcriptions | Supported | Batch jobs can return 2 to 10 alternatives. |
| Word timestamps and confidence | Supported | Documented at launch and in API output. |
| Speaker diarization | Supported | AWS labels speakers and supports streaming plus batch diarization. |
| Channel identification | Supported | Added for both streaming and batch multi-channel audio in December 2020. |
| PHI identification | Supported | Added in January 2021 at no extra charge. |
| Private connectivity | Supported | AWS PrivateLink support for real-time streaming was announced in June 2020. |
| Chime SDK integration | Supported | Live transcription can be integrated via Amazon Chime SDK, including specialty and conversation type selection. |
| Clinical note generation | Not native to Transcribe Medical | AWS directs users needing a single note-generation API toward AWS HealthScribe. |
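As a rough illustration of how several rows in the table combine in one batch request, the sketch below builds keyword arguments for StartMedicalTranscriptionJob. The helper name `build_medical_job_request`, the bucket names, and the job name are hypothetical. The parameter names follow the public API reference as best understood here and should be checked against the current documentation. Rejecting speaker labels and channel identification in the same request is a defensive assumption, not something stated in this profile.

```python
from typing import Any, Dict, Optional

# The two audio types named in the capability table.
VALID_TYPES = {"DICTATION", "CONVERSATION"}


def build_medical_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    audio_type: str = "CONVERSATION",
    max_alternatives: Optional[int] = None,
    max_speakers: Optional[int] = None,
    channel_identification: bool = False,
    identify_phi: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for a batch medical transcription request."""
    if audio_type not in VALID_TYPES:
        raise ValueError(f"Type must be one of {sorted(VALID_TYPES)}")
    # Assumption: diarization and channel identification are treated as
    # mutually exclusive here; verify against the API reference.
    if max_speakers is not None and channel_identification:
        raise ValueError("choose speaker labels or channel identification, not both")

    settings: Dict[str, Any] = {}
    if max_alternatives is not None:
        # The table above documents 2 to 10 alternatives for batch jobs.
        if not 2 <= max_alternatives <= 10:
            raise ValueError("max_alternatives must be between 2 and 10")
        settings["ShowAlternatives"] = True
        settings["MaxAlternatives"] = max_alternatives
    if max_speakers is not None:
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if channel_identification:
        settings["ChannelIdentification"] = True

    request: Dict[str, Any] = {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",      # the only documented language
        "Specialty": "PRIMARYCARE",   # kept fixed in this sketch
        "Type": audio_type,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
    }
    if settings:
        request["Settings"] = settings
    if identify_phi:
        request["ContentIdentificationType"] = "PHI"
    return request
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("transcribe").start_medical_transcription_job(**request)`. Keeping request construction separate from the network call makes the validation rules easy to unit test.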
Customization depth
Transcribe Medical supports medical custom vocabularies, but in the public sources reviewed, AWS does not document a customer-trainable custom medical language model analogous to the custom language models in standard Amazon Transcribe. AWS's custom language model FAQ is framed around standard Transcribe rather than Transcribe Medical, while the medical documentation emphasizes vocabularies.
Architecture: documented facts and adjacent research
AWS confirms that Transcribe Medical is a deep-learning-based ASR service with automatic punctuation and capitalization, specialty-aware transcription, dictation and conversational modes, custom medical vocabularies, speaker and channel logic, and optional PHI identification. It documents a stateless service posture and an API-first delivery model. It does not publish the acoustic model family, decoder design, training corpus size, language-model design, or a release-by-release model identifier history.
The source notes that Amazon Science publications show AWS speech teams working on related methods, and that AWS does not state that any one paper maps one-to-one onto the production Transcribe Medical stack.
| Technology area | What AWS documents for Transcribe Medical | What AWS public research suggests | Assessment (per source) |
|---|---|---|---|
| Core ASR model | "Deep learning" / "state-of-the-art machine learning" medical ASR. | Amazon speech teams publish on CTC, neural transducers, and context-aware transformer transducers for production ASR. | High confidence that the service uses modern end-to-end ASR; low confidence on the exact architecture because AWS does not disclose it. |
| Language modeling and rare-term handling | Supports specialty selection and medical custom vocabularies. | AWS papers describe contextual biasing, semantic/acoustic biasing, and knowledge-graph support for out-of-vocabulary entities, including medical terminology. | Strong evidence that rare-term biasing is a major design theme; exact LM design for Transcribe Medical is not public. |
| Domain adaptation | API requires Specialty and Type; AWS expanded specialty coverage over time. | AWS has published post-training domain adaptation methods using synthetic acoustic catalogs and KNN fusion. | Strong evidence of domain-conditioned decoding/modeling, though whether this appears as separate specialty models or lighter adaptation is undisclosed. |
| Noise and acoustic robustness | AWS FAQ says Transcribe is designed for variation in volume, pitch, and speaking rate, but noise, overlap, accents, and code-switching can degrade output. | No medical-specific public paper clearly documents the production front-end denoising stack. | Public documentation is enough to know limits, not enough to reverse-engineer the front end. |
| Punctuation and casing | Automatic punctuation and capitalization are part of the launch and product positioning. | AWS medical ASR paper uses BERT/BioBERT/RoBERTa for punctuation and truecasing, with domain adaptation and augmentation. | Very likely that punctuation/truecasing is a distinct downstream stage or integrated module. |
| Speaker diarization | Documented for streaming and batch; output includes speaker labels; overlapping speech is linearized by start time. | AWS research focuses on reducing speaker errors with audio-grounded lexical correction. | Public docs describe the interface; research suggests active work on improving turn-attribution around overlaps. |
| Privacy-preserving learning | AWS says medical customer content is not used to improve AWS AI technologies. | AWS also publishes privacy-preserving continual-learning work using ephemeral, weakly supervised data in production ASR. | Suggests AWS has internal methods for model refresh under privacy constraints, but not necessarily on medical customer data. |
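The diarization row above says overlapping speech is linearized by start time. A minimal sketch of that idea follows, assuming a simplified word record with a speaker label, a start time, and text. The `Word` class and `linearize_turns` function are hypothetical and do not mirror the exact output schema of the service.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    speaker: str   # for example "spk_0"
    start: float   # start time in seconds
    text: str


def linearize_turns(words: List[Word]) -> List[Tuple[str, str]]:
    """Order words by start time and merge consecutive same-speaker words."""
    turns: List[Tuple[str, str]] = []
    # sorted() is stable, so words with equal start times keep input order.
    for word in sorted(words, key=lambda w: w.start):
        if turns and turns[-1][0] == word.speaker:
            turns[-1] = (word.speaker, turns[-1][1] + " " + word.text)
        else:
            turns.append((word.speaker, word.text))
    return turns
```

When two speakers overlap, their words interleave in the linearized output, so one utterance can be split into several short turns. That is one reason turn attribution around overlaps is harder than in clean, alternating dialogue.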
Language support
Transcribe Medical is documented only for US English (en-US) medical transcription. AWS's public product page lists transcription support for primary care and specialty areas including cardiology, neurology, obstetrics-gynecology, pediatrics, oncology, radiology, and urology. The documentation page for "Medical specialties and terms" describes PRIMARYCARE as covering family medicine, internal medicine, OB-GYN, and pediatrics.
Performance and benchmarks
Vendor-reported: AWS does not publish standardized word-error-rate benchmarks, a public model card for Transcribe Medical, a specialty-by-specialty scorecard, or latency service objectives.
Third-party evaluation: a 2024 JAMIA Open study reported that AWS Medical outperformed AWS General on medical proper nouns, while also finding disparities in performance across speech from Black and White patients and persistent difficulty with spontaneous conversational phenomena. A 2023 digital-scribe comparison observed that word-diarization error differed little across speakers in most models, but Amazon Medical Conversation ASR showed a larger clinician-side gap in that study's setup.
The source states that a quantitative accuracy-versus-latency chart across AWS, Google, and Nuance would be misleading because the vendors do not publish directly comparable medical-ASR benchmark suites with normalized latency methodology. It provides the following capability and delivery comparison, which it describes as an analytical inference from public delivery models and documented feature depth, not a vendor-provided benchmark.
| Competitor | Delivery model | Medical specialization | Customization | Public pricing signal | Comparative read versus AWS (per source) |
|---|---|---|---|---|---|
| Amazon Transcribe Medical | Managed AWS API | Yes, medical-specific transcription | Medical custom vocabularies; specialty and type selection | AWS worked examples imply about $0.075/min with a 60-minute monthly free tier for first 12 months. | Strong developer fit, wide AWS integration, limited public transparency, transcription-first rather than workflow-first |
| Google Cloud Speech-to-Text medical models | Managed cloud API | Yes, separate medical dictation and medical conversation models | Alternate transcriptions, timestamps, confidence; conversation diarization; dictation spoken punctuation/formatting/headings | $0.078/min after first 60 free minutes per month. | Very similar API-layer competitor; slightly higher public list price; strong documentation for dictation formatting behaviors |
| Dragon Medical One | Clinician-facing documentation software | Yes, purpose-built clinical documentation product | Extensive end-user vocabulary, commands, templates, workflow features | Public price not clearly exposed in the reviewed official pages; licensing/sales-led procurement | Stronger ready-made clinical workflow and EHR ergonomics; weaker as a simple developer API building block |
| Azure Speech plus Microsoft healthcare stack | General cloud speech platform plus Nuance products | Public docs position healthcare as a use case, but Microsoft's healthcare-specific speech story is mostly Dragon/Dragon Copilot | Custom speech and general speech platform tooling | Official page clearly exposes free tier structure and per-second billing, but exact paid rates were not recoverable from the static pricing HTML reviewed. | If you want Microsoft-native general speech plus customization, Azure fits; if you want healthcare-specialized voice, Microsoft steers customers to Dragon |
| Open-source Whisper | Self-hosted model/software | No, general-purpose | Full deployment control, but no managed medical workflow | Infra cost only | Excellent flexibility and broad robustness, but customer owns validation, security, compliance, and medical adaptation |
| Open-source Parakeet | Self-hosted/open-source model | No dedicated medical specialization in the reviewed source | Full deployment control; punctuation and timestamps | Infra cost only | Attractive for performance and openness, but requires significant speech MLOps |
| Open MedASR | Open medical model | Yes, medical dictation/transcription | Fine-tunable health-domain model | Infra cost only | Most directly analogous open alternative for medical dictation, but still not a managed HIPAA-ready service by itself |
Latency and throughput
AWS does not publish latency targets or a latency SLO for Transcribe Medical. Real-time streaming operates through StartMedicalStreamTranscription, a bidirectional HTTP/2 or WebSocket stream for audio-in / text-out. The August 2021 Amazon Chime SDK live transcription integration is documented as a lower-latency meeting use case. Throughput and concurrency figures are not publicly disclosed.
Deployment and integrations
The service is available through AWS console workflows, API calls, AWS CLI, and AWS SDKs. Public API references and FAQs show the medical APIs alongside the broader Transcribe service family, with Boto3 examples for custom vocabulary creation and REST-style operation references for jobs and streams.
AWS's endpoint documentation lists Transcribe Medical endpoints in 12 commercial regions plus AWS GovCloud (US-West): US East (N. Virginia and Ohio), US West (N. California and Oregon), Canada (Central), Europe (Ireland, London, and Frankfurt), and Asia Pacific (Seoul, Singapore, Sydney, and Tokyo).
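A deployment script might guard against unsupported regions with a small lookup like the sketch below. The region codes are the standard AWS identifiers for the region names listed above. The set itself is a snapshot taken from this profile, so treat it as an assumption and check the AWS General Reference before relying on it. The function name `require_supported_region` is hypothetical.

```python
# Snapshot of the regions named in this profile; not an authoritative list.
COMMERCIAL_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-east-2": "US East (Ohio)",
    "us-west-1": "US West (N. California)",
    "us-west-2": "US West (Oregon)",
    "ca-central-1": "Canada (Central)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
    "eu-central-1": "Europe (Frankfurt)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
}
GOVCLOUD_REGIONS = {"us-gov-west-1": "AWS GovCloud (US-West)"}


def require_supported_region(region: str, allow_govcloud: bool = False) -> str:
    """Return the region code if it is in the snapshot, else raise ValueError."""
    if region in COMMERCIAL_REGIONS:
        return region
    if allow_govcloud and region in GOVCLOUD_REGIONS:
        return region
    raise ValueError(f"{region!r} is not in the documented Transcribe Medical region list")
```

GovCloud is opt-in in this sketch because it uses a separate AWS partition with its own accounts and credentials.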
Documented integration patterns include Amazon Comprehend Medical, Twilio Media Streams, Veritas telehealth review workflows, and Amazon Chime SDK, plus downstream AWS services such as HealthLake, S3, Athena, and Bedrock in customer-built pipelines.
Security and compliance
Transcribe Medical is described by AWS as HIPAA-eligible, available under AWS's Business Associate Addendum, and subject to the AWS shared responsibility model. AWS states that BAA customers must encrypt PHI at rest and in transit, and that customers remain responsible for correct service configuration and lawful use.
The medical FAQ is stricter than the general Transcribe FAQ. The general Transcribe FAQ says content may be stored and used to provide, maintain, improve, and develop Amazon Transcribe and related AI technologies unless customers opt out. AWS says Amazon Transcribe Medical does not use content processed by the service for any purpose other than to provide and maintain the service, and does not use that content to improve Amazon Transcribe Medical or other Amazon AI technologies. The product page describes the service as stateless: it stores neither inbound audio nor output text, and leaves storage choices to the customer.
| Consideration | AWS public position | Practical implication (per source) |
|---|---|---|
| HIPAA eligibility | Yes. | Useful for PHI workflows, but only with a BAA and compliant architecture around the service. |
| BAA and encryption duties | AWS says BAA customers must encrypt PHI at rest and in transit. | Security controls remain partly customer-owned. |
| Data retention stance | Product page says stateless; FAQ says medical content is not used to improve AWS AI. | Stronger privacy posture than standard Transcribe, at least in public documentation. |
| PHI identification | Available at no additional charge. | Helps redaction workflows, but is not a substitute for full de-identification review. |
| PHI de-identification | AWS explicitly warns PHI identification may not accurately identify PHI in all circumstances and does not satisfy HIPAA de-identification requirements. | Human review or separate de-identification controls are still required. |
| Custom vocabulary content | AWS says do not include PII or PHI in medical custom vocabularies. | Customers need governance for vocabulary curation. |
| Private networking | PrivateLink for real-time streaming is available. | Reduces exposure to the public internet and fits stricter network topologies. |
| Region choice | Multiple commercial regions plus GovCloud West are documented. | Supports residency and procurement choices, but end-to-end residency depends on all connected services. |
AWS documentation states that Transcribe Medical is not a substitute for professional medical advice, diagnosis, or treatment, and that users should apply confidence thresholds and human review where accuracy needs are high.
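The advice about confidence thresholds and human review could be applied along the lines of the sketch below, which flags low-confidence words for a reviewer. It assumes a transcript shaped like the general Amazon Transcribe batch output, with `results.items`, string-valued `confidence`, and `start_time` on spoken words. That shape should be verified against real Transcribe Medical output. The 0.85 threshold and the function name `flag_low_confidence` are illustrative choices, not AWS recommendations.

```python
from typing import Any, Dict, List


def flag_low_confidence(
    transcript: Dict[str, Any], threshold: float = 0.85
) -> List[Dict[str, Any]]:
    """Return spoken words whose top-alternative confidence is below threshold."""
    flagged: List[Dict[str, Any]] = []
    for item in transcript.get("results", {}).get("items", []):
        # Punctuation items carry no timing and are skipped.
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append(
                {
                    "word": best["content"],
                    "confidence": confidence,
                    "start_time": float(item["start_time"]),
                }
            )
    return flagged
```

Returning start times lets a reviewer jump to the matching audio position instead of re-listening to the whole recording. A suitable threshold depends on the clinical risk of the downstream use and is best set from a labeled sample of the organization's own audio.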
Pricing
AWS's static pricing page examples imply a medical transcription rate of about $0.075 per minute, with a 60-minute monthly free tier for the first 12 months. For comparison, the source reports Google's official medical Speech-to-Text pricing at $0.078 per minute after its own first 60 free minutes each month. PHI identification is available at no additional charge.
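The arithmetic behind these figures can be sketched as below. The rates and the 60-minute free allowance come from the paragraph above. The function `monthly_cost` is hypothetical, and it ignores billing granularity, volume tiers, taxes, and the 12-month limit on the AWS free tier.

```python
from decimal import Decimal

AWS_RATE = Decimal("0.075")     # USD per minute, implied by AWS pricing examples
GOOGLE_RATE = Decimal("0.078")  # USD per minute, as reported in this profile
FREE_MINUTES = Decimal("60")    # monthly free allowance cited for both vendors


def monthly_cost(minutes, rate: Decimal, free_minutes: Decimal = FREE_MINUTES) -> Decimal:
    """Estimate one month's charge in USD after subtracting free minutes."""
    billable = max(Decimal(str(minutes)) - free_minutes, Decimal("0"))
    return (billable * rate).quantize(Decimal("0.01"))
```

For 1,000 minutes in a month, 940 minutes are billable, giving about $70.50 at the implied AWS rate and $73.32 at the reported Google rate, a difference of $2.82. `Decimal` is used so the results do not pick up binary floating-point rounding noise.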
Development and ownership
Transcribe Medical is developed by Amazon Web Services. The source describes it as sitting at the intersection of productized AWS AI services (AWS AI / AWS Machine Learning) and the broader Amazon Science speech-research program. AWS has not published a full engineering roster for the service; the following people and organizations are tied to it in the public record.
| Publicly identified person or org | Role in the public record | Relevance |
|---|---|---|
| Vasi Philomin | GM for Machine Learning and AI at AWS; launch blog author | Public launch sponsor/executive owner across AWS language services in 2019 |
| Paul Zhao | Product Manager at AWS Machine Learning managing Amazon Transcribe | Direct product-facing owner named in Transcribe Medical blog materials |
| Katrin Kirchhoff | Senior Manager and Principal Scientist at AWS AI in 2020; later described as Director of Speech Processing for AWS; affiliated with AWS AI Labs in research literature | Key public research leader for AWS speech technologies relevant to Transcribe |
| Scott Seyfarth | Data Scientist at AWS AI working on improving Amazon Transcribe and Transcribe Medical | Directly tied to service improvement in public author bios |
| Ruoyu Huang | Software Development Engineer at Amazon Transcribe | Publicly named engineering contributor on Transcribe Medical customization work |
| AWS AI / AWS Machine Learning / Amazon Science speech teams | Product and research organizations behind AWS language and speech services | The most visible institutions behind the service |
| Cerner, Amgen, SoundLines/HealthChannels | Early public customers or quoted adopters | Evidence of early industry uptake in EHR, pharmacovigilance, and care-team workflows |
Adjacent research and patents
The source states these papers and patents should be treated as adjacent technical evidence, not as official reverse-engineering of the production Transcribe Medical service.
| Type | Source | Short summary |
|---|---|---|
| Paper | Robust prediction of punctuation and truecasing for medical ASR | AWS medical-ASR paper using pretrained masked language models and medical-domain adaptation for punctuation/truecasing; especially relevant to dictation usability |
| Paper | Listen, Know and Spell | Shows AWS AI interest in knowledge-graph infusion for OOV named entities in domains such as medical ASR |
| Blog plus paper pointer | Teaching speech recognizers new words without retraining | Explains contextual adapters and decoder biasing for difficult named entities; cites strong gains on medical terminology |
| Paper | Domain adaptation with external off-policy acoustic catalogs | Describes scalable post-training ASR adaptation using synthetic acoustic catalogs and KNN fusion; relevant to rare-domain adaptation |
| Paper | ILASR | Privacy-preserving incremental-learning framework for production ASR, relevant to how AWS could update speech models without relying on sensitive customer data |
| Paper | AG-LSEC | Improves speaker diarization by grounding lexical speaker correction in acoustics; relevant to medical conversation turn attribution |
| Paper | Context-aware Transformer transducer | Strong evidence that Amazon speech teams use advanced transducer architectures for rare-word/context-sensitive ASR |
| Patent | Contextual biasing for speech recognition | Amazon patent family on bias encoders and bias attention for rare/contextual phrases; highly relevant to specialized terminology support |
| Patent | Infusing knowledge graphs into automatic speech recognition | Patent on injecting domain knowledge such as medications, diseases, and drugs into ASR |
| Patent | Using recurrent neural network for partitioning of audio and speaker diarization | Amazon patent-family evidence around diarization plus ASR concurrency and segmentation |
Release history
AWS's public history for Transcribe Medical is feature-oriented rather than version-oriented. Customers can reconstruct major milestones from launch posts, docs, and "What's New" announcements, but AWS does not publicly expose a numbered model lineage, model cards for Transcribe Medical itself, or a release log with benchmark deltas. The timeline below is compiled from AWS launch posts and official "What's New" announcements.
| Date | Milestone |
|---|---|
| December 2019 | Launch as a HIPAA-eligible capability of Amazon Transcribe with real-time streaming, word timestamps, confidence scores, and punctuation/capitalization |
| April 2020 | Batch transcription of medical audio files added |
| April 2020 | Custom medical vocabularies added, with IPA pronunciation support, display forms, and batch plus streaming support |
| June 2020 | AWS PrivateLink support for real-time streaming announced |
| November 2020 | Streaming transcription support for cardiology, oncology, neurology, radiology, and urology specialties |
| December 2020 | Multi-channel support for both streaming and batch transcription |
| January 2021 | Automatic protected health information (PHI) identification added at no extra charge |
| August 2021 | Amazon Chime SDK live transcription integration, including specialty and conversation type selection |
Adoption evidence at and after launch: Cerner said it was developing a digital voice scribe on top of Transcribe Medical; Amgen cited use in pharmacovigilance call review; SoundLines/HealthChannels described using the API in care-team and analytics workflows. Healthcare Dive wrote that the 2019 launch bolstered Amazon's voice-to-text ambitions and highlighted its more specialized medical vocabulary focus.
Sources
| Source | What it adds |
|---|---|
| AWS announces Amazon Transcribe Medical | Official launch record: Dec. 2019 release date, HIPAA eligibility, real-time streaming, word timestamps, confidence scores, punctuation/capitalization, Comprehend Medical linkage |
| Introducing medical speech-to-text with Amazon Transcribe Medical | Launch rationale, workflow framing, customer quotes from Cerner, Amgen, and SoundLines/HealthChannels, plus Vasi Philomin role |
| Amazon Transcribe Medical now supports batch transcription | Confirms Apr. 2020 batch release and early batch capabilities including speaker/channel separation context |
| Amazon Transcribe Medical now supports custom vocabulary | Confirms Apr. 2020 vocabulary release, IPA pronunciation support, display forms, and batch plus streaming support |
| Announcing AWS PrivateLink support | Security/networking milestone for private access to streaming API |
| Streaming transcription support for new specialties | Public milestone for cardiology, oncology, neurology, radiology, and urology specialist support |
| Multi-channel support for streaming and batch | Confirms channel identification milestone for telehealth and pharmacovigilance scenarios |
| Automatic PHI identification | Adds PHI tagging and explicitly frames redaction workflows |
| Amazon Chime SDK live transcription support | Shows AWS ecosystem integration and lower-latency meeting use case |
| Amazon Transcribe Medical product page | Best current high-level feature and positioning summary, including today's specialty list and HealthScribe handoff |
Citation list
- AWS announces Amazon Transcribe Medical: https://aws.amazon.com/about-aws/whats-new/2019/12/aws-announces-amazon-transcribe-medical-medical-speech-recognition/
- Amazon Transcribe Pricing: https://aws.amazon.com/transcribe/pricing/
- Introducing medical speech-to-text with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/introducing-medical-speech-to-text-with-amazon-transcribe-medical/
- Performing medical transcription analysis with Amazon Transcribe Medical and Amazon Comprehend Medical: https://aws.amazon.com/blogs/machine-learning/performing-medical-transcription-analysis-with-amazon-transcribe-medical-and-amazon-comprehend-medical/
- Amazon Transcribe Medical developer guide: https://docs.aws.amazon.com/transcribe/latest/dg/transcribe-medical.html
- Amazon Transcribe Medical product page: https://aws.amazon.com/transcribe/medical/
- StartMedicalStreamTranscription API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html
- StartMedicalTranscriptionJob API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/API_StartMedicalTranscriptionJob.html
- How Amazon Transcribe Medical works: https://docs.aws.amazon.com/transcribe/latest/dg/how-it-works-med.html
- Amazon Transcribe FAQs: https://aws.amazon.com/transcribe/faqs/
- Amazon Transcribe Medical now supports custom vocabulary: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-custom-vocabulary/
- Alternative medical transcriptions: https://docs.aws.amazon.com/transcribe/latest/dg/alternative-med-transcriptions.html
- Conversation diarization (medical): https://docs.aws.amazon.com/transcribe/latest/dg/conversation-diarization-med.html
- Multi-channel streaming and batch support: https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-transcribe-medical-now-supports-both-streaming-and-batch-transcription-of-multi-channel-audio/
- Automatic PHI identification: https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-transcribe-medical-now-provides-automatic-protected-health-information-phi-identification/
- AWS PrivateLink support for real-time streaming: https://aws.amazon.com/about-aws/whats-new/2020/06/announcing-aws-privatelink-support-for-amazon-transcribe-medical-real-time-streaming/
- Amazon Chime SDK live transcription: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-chime-sdk-amazon-transcribe-amazon-transcribe-medical/
- Amazon Transcribe API reference: https://docs.aws.amazon.com/transcribe/latest/APIReference/Welcome.html
- Amazon Transcribe endpoints and quotas, AWS General Reference: https://docs.aws.amazon.com/general/latest/gr/transcribe.html
- Teaching speech recognizers new words without retraining: https://www.amazon.science/blog/teaching-speech-recognizers-new-words-without-retraining
- Medical custom vocabularies: https://docs.aws.amazon.com/transcribe/latest/dg/vocabulary-med.html
- Robust acoustic and semantic contextual biasing in neural transducers for speech recognition: https://www.amazon.science/publications/robust-acoustic-and-semantic-contextual-biasing-in-neural-transducers-for-speech-recognition
- Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition: https://www.amazon.science/publications/domain-adaptation-with-external-off-policy-acoustic-catalogs-for-scalable-contextual-end-to-end-automated-speech-recognition
- Robust prediction of punctuation and truecasing for medical ASR: https://www.amazon.science/publications/robust-prediction-of-punctuation-and-truecasing-for-medical-asr
- AG-LSEC: audio-grounded lexical speaker error correction: https://www.amazon.science/publications/ag-lsec-audio-grounded-lexical-speaker-error-correction
- ILASR: privacy-preserving incremental learning for automatic speech recognition at production scale: https://www.amazon.science/publications/ilasr-privacy-preserving-incremental-learning-for-automatic-speech-recognition-at-production-scale
- Enhancing speech-to-text accuracy of COVID-19-related terms with Amazon Transcribe Medical: https://aws.amazon.com/blogs/machine-learning/enhancing-speech-to-text-accuracy-of-covid-19-related-terms-with-amazon-transcribe-medical/
- The range of AWS's speech research is on display at Interspeech: https://www.amazon.science/blog/the-range-of-awss-speech-research-is-on-display-at-interspeech
- AWS HIPAA compliance: https://aws.amazon.com/compliance/hipaa-compliance/
- Healthcare Dive coverage: https://www.healthcaredive.com/news/amazons-new-medical-transcription-service-bolsters-voice-to-text-bid/568245/
- Google Cloud Speech-to-Text medical models: https://docs.cloud.google.com/speech-to-text/docs/v1/medical-models
- Google Cloud Speech-to-Text pricing: https://cloud.google.com/speech-to-text/pricing
- Dragon Medical One: https://www.microsoft.com/en-us/health-solutions/clinical-workflow/dragon-medical-one
- Azure Speech to text: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-to-text
- Azure Speech pricing: https://azure.microsoft.com/en-us/pricing/details/speech/
- OpenAI Whisper: https://openai.com/index/whisper/
- NVIDIA NeMo Parakeet ASR models: https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/
- Google MedASR: https://developers.google.com/health-ai-developer-foundations/medasr
- JAMIA Open study: https://academic.oup.com/jamiaopen/article/7/4/ooae130/7920671
- Batch transcription announcement: https://aws.amazon.com/about-aws/whats-new/2020/04/amazon-transcribe-medical-now-supports-batch-transcription-of-medical-audio-files/
- Streaming specialty support announcement: https://aws.amazon.com/about-aws/whats-new/2020/11/amazon-transcribe-medical-streaming-transcription-support-medical-specialties/
- Listen, Know and Spell: knowledge-infused subword modeling for improving ASR performance of OOV named entities: https://assets.amazon.science/0c/47/311aae264493b8beefd696f7a295/listen-know-and-spell-knowledge-infused-subword-modeling-for-improving-asr-performance-of-oov-named-entities.pdf
- Context-aware Transformer transducer for speech recognition: https://www.amazon.science/publications/context-aware-transformer-transducer-for-speech-recognition
- Contextual biasing for speech recognition (patent): https://patents.google.com/patent/WO2020226789A1/en
- Infusing knowledge graphs into automatic speech recognition (patent): https://patents.google.com/patent/US12400659B1/en
- Using recurrent neural network for partitioning of audio and speaker diarization (patent): https://patents.google.com/patent/US10902843B2/en