How-to · Beginner · 3 min read

Medical transcription with AI

Quick answer
Transcribe medical audio in two steps: first convert speech to text with whisper-1, then optionally refine the raw transcript with a large language model like gpt-4o to fix medical terminology and formatting. Whisper handles the audio-to-text conversion; GPT-4o supplies contextual correction and formatting of medical text.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with available credit (the audio and chat endpoints are billed per use)
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirection)

Setup

Install the openai Python package and set your OpenAI API key as an environment variable. This enables access to both speech-to-text and text-based LLM models.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
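
Next, set the API key in the shell that will run the scripts. The key value below is a placeholder; substitute your real key:

```shell
export OPENAI_API_KEY="sk-your-key-here"
```

The `OpenAI` client also picks this variable up automatically if you omit the `api_key` argument.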

Step by step

This example shows how to transcribe a medical audio file using whisper-1 for speech-to-text, then refine the transcription with gpt-4o to improve medical terminology and formatting.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Transcribe audio with Whisper
with open("medical_audio.mp3", "rb") as audio_file:
    transcript_response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
transcript_text = transcript_response.text
print("Raw transcription:", transcript_text)

# Step 2: Refine transcription with GPT-4o
prompt = (
    "You are a medical transcription assistant. "
    "Correct and format the following transcription for medical accuracy and clarity. "
    "Do not add findings, diagnoses, or recommendations that are not in the original text:\n"
    + transcript_text
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
refined_transcription = response.choices[0].message.content
print("\nRefined transcription:", refined_transcription)
output
Raw transcription: patient reports uh mild chest pain and and shortness of breath

Refined transcription: Patient reports mild chest pain and shortness of breath.

Common variations

  • Use asynchronous calls with asyncio for non-blocking transcription.
  • Stream transcription results for long audio files.
  • Use different models like gpt-4o-mini for cost-effective refinement.
  • Integrate with medical vocabularies or ontologies for enhanced accuracy.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_transcribe_and_refine():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # The async client exposes the same audio and chat endpoints as awaitables
    with open("medical_audio.mp3", "rb") as audio_file:
        transcript_response = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    transcript_text = transcript_response.text

    prompt = (
        "You are a medical transcription assistant. "
        "Correct and format the following transcription for medical accuracy and clarity. "
        "Do not add findings, diagnoses, or recommendations that are not in the original text:\n"
        + transcript_text
    )

    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    print("Refined transcription (async):", response.choices[0].message.content)

asyncio.run(async_transcribe_and_refine())
output
Refined transcription (async): Patient reports mild chest pain and shortness of breath.
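
The vocabulary-integration variation above can be sketched by embedding a small glossary directly in the refinement prompt. The glossary entries and prompt wording here are illustrative assumptions, not a standard; in practice the glossary could be loaded from a domain vocabulary such as SNOMED CT:

```python
# Hypothetical glossary mapping dictation shorthand to full clinical terms
MEDICAL_GLOSSARY = {
    "sob": "shortness of breath",
    "afib": "atrial fibrillation",
    "htn": "hypertension",
}

def build_refinement_prompt(transcript: str, glossary: dict) -> str:
    """Embed the glossary in the refinement prompt so abbreviations expand consistently."""
    term_lines = "\n".join(f"- {abbr}: {full}" for abbr, full in glossary.items())
    return (
        "You are a medical transcription assistant. "
        "Using the glossary, expand abbreviations, then correct and format "
        "the transcription. Do not add content not present in the text.\n\n"
        f"Glossary:\n{term_lines}\n\n"
        f"Transcription:\n{transcript}"
    )

prompt = build_refinement_prompt("pt reports sob and possible afib", MEDICAL_GLOSSARY)
print(prompt)
```

Pass the resulting prompt to chat.completions.create as in the main example; for larger ontologies, retrieve only the terms relevant to each transcript rather than embedding everything.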

Troubleshooting

  • If transcription quality is poor, ensure audio is clear and in supported formats (mp3, wav, m4a).
  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For long audio files, split into smaller chunks before transcription; the audio API also rejects files larger than 25 MB.
  • If medical terms are mistranscribed, use the refinement step with gpt-4o or add domain-specific prompts.
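
The chunking tip above can be sketched with the standard-library wave module. This handles uncompressed WAV only (mp3 would first need decoding, e.g. with ffmpeg or pydub); the output file naming and the 10-minute default are assumptions:

```python
import wave
from typing import List

def chunk_wav(path: str, chunk_seconds: int = 600) -> List[str]:
    """Split a WAV file into fixed-length chunks and return the chunk file paths."""
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        index = 0
        while index * frames_per_chunk < params.nframes:
            frames = src.readframes(frames_per_chunk)
            out_path = f"{path.rsplit('.', 1)[0]}_part{index}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # nframes is corrected automatically on close
                dst.writeframes(frames)
            chunk_paths.append(out_path)
            index += 1
    return chunk_paths
```

Each chunk file can then be sent to whisper-1 separately and the partial transcripts concatenated in order.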

Key Takeaways

  • Use whisper-1 for accurate speech-to-text transcription of medical audio.
  • Refine raw transcriptions with gpt-4o to improve medical terminology and formatting.
  • Set up environment variables and install the openai package for seamless API access.
  • Handle long audio by chunking and consider async calls for scalable transcription pipelines.
  • Troubleshoot by checking audio quality, API keys, and leveraging prompt engineering for domain accuracy.
Verified 2026-04 · gpt-4o, gpt-4o-mini, whisper-1