Medical transcription with AI
Quick answer
Use a large language model like
gpt-4o to transcribe medical audio by first converting speech to text with a model like whisper-1, then optionally refining the transcription with gpt-4o for medical terminology accuracy. This two-step approach uses Whisper for audio transcription and GPT-4o for contextual medical text correction and formatting.
Prerequisites
- Python 3.8+
- OpenAI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your OpenAI API key as an environment variable. This gives access to both the speech-to-text and chat completion endpoints.
```shell
pip install "openai>=1.0"
```
output
```
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
```
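Setting the API key in a POSIX shell might look like this (the key value shown is a placeholder, not a real key):

```shell
# Export the API key for the current shell session (placeholder value)
export OPENAI_API_KEY="sk-your-key-here"
```

Add the line to your shell profile (e.g. `~/.bashrc`) to make it persist across sessions.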
Step by step
This example shows how to transcribe a medical audio file using whisper-1 for speech-to-text, then refine the transcription with gpt-4o to improve medical terminology and formatting.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Transcribe audio with Whisper
with open("medical_audio.mp3", "rb") as audio_file:
    transcript_response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
transcript_text = transcript_response.text
print("Raw transcription:", transcript_text)

# Step 2: Refine transcription with GPT-4o
prompt = (
    "You are a medical transcription assistant. "
    "Please correct and format the following transcription "
    "for medical accuracy and clarity:\n" + transcript_text
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
refined_transcription = response.choices[0].message.content
print("\nRefined transcription:", refined_transcription)
```
output
```
Raw transcription: Patient reports mild chest pain and shortness of breath.

Refined transcription: Patient reports mild chest pain and shortness of breath. No signs of acute distress. Recommend ECG and further cardiac evaluation.
```
Common variations
- Use asynchronous calls with asyncio for non-blocking transcription.
- Stream transcription results for long audio files.
- Use a different model like gpt-4o-mini for cost-effective refinement.
- Integrate with medical vocabularies or ontologies for enhanced accuracy.
For example, an async variation using the SDK's AsyncOpenAI client (openai>=1.0 supports async transcription directly, so no workaround is needed):

```python
import asyncio
import os

from openai import AsyncOpenAI


async def async_transcribe_and_refine():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # Step 1: Transcribe asynchronously with Whisper
    with open("medical_audio.mp3", "rb") as audio_file:
        transcript_response = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    transcript_text = transcript_response.text
    # Step 2: Refine with the cheaper gpt-4o-mini model
    prompt = (
        "You are a medical transcription assistant. "
        "Please correct and format the following transcription "
        "for medical accuracy and clarity:\n" + transcript_text
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print("Refined transcription (async):", response.choices[0].message.content)


asyncio.run(async_transcribe_and_refine())
```
output
```
Refined transcription (async): Patient reports mild chest pain and shortness of breath. No acute distress observed. Recommend ECG and cardiac evaluation.
```
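The vocabulary-integration variation above can be sketched by listing expected domain terms in the refinement prompt. The helper name `build_refinement_prompt` and the sample terms are illustrative assumptions, not part of any library:

```python
def build_refinement_prompt(transcript: str, vocabulary: list[str]) -> str:
    """Build a refinement prompt that lists known domain terms.

    Listing expected terms nudges the model toward the correct
    spellings when the raw transcript garbles them.
    """
    terms = ", ".join(vocabulary)
    return (
        "You are a medical transcription assistant. "
        f"Expect terminology such as: {terms}. "
        "Correct and format the following transcription for "
        "medical accuracy and clarity:\n" + transcript
    )


# Illustrative garbled transcript and vocabulary list
prompt = build_refinement_prompt(
    "patient given met formin for type two diabetes",
    ["metformin", "type 2 diabetes mellitus", "HbA1c"],
)
print(prompt)
```

A shorter version of the same vocabulary string can also be passed as the `prompt` parameter of `client.audio.transcriptions.create`, which Whisper uses as a spelling and style hint during transcription itself.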
Troubleshooting
- If transcription quality is poor, ensure audio is clear and in a supported format (mp3, wav, m4a).
- If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- For long audio files, split into smaller chunks before transcription to avoid timeouts.
- If medical terms are mistranscribed, use the refinement step with gpt-4o or add domain-specific prompts.
Key Takeaways
- Use whisper-1 for accurate speech-to-text transcription of medical audio.
- Refine raw transcriptions with gpt-4o to improve medical terminology and formatting.
- Set up environment variables and install the openai package for seamless API access.
- Handle long audio by chunking, and consider async calls for scalable transcription pipelines.
- Troubleshoot by checking audio quality and API keys, and use prompt engineering for domain accuracy.