Medical transcription with AI
Quick answer
Use a large language model like
gpt-4o to transcribe medical audio by first converting speech to text with a model like whisper-1, then optionally refining the transcription with gpt-4o for medical terminology accuracy. This two-step approach uses Whisper for audio transcription and GPT-4o for contextual medical text correction and formatting.
Prerequisites
- Python 3.8+
- OpenAI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your OpenAI API key as an environment variable. This gives access to both the speech-to-text and chat completion endpoints.
```shell
pip install "openai>=1.0"
```
output
```
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
```
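Setting the API key in a POSIX shell might look like this (the key value shown is a placeholder, not a real key):

```shell
# Export the API key for the current shell session (placeholder value)
export OPENAI_API_KEY="sk-your-key-here"
```

Add the line to your shell profile (e.g. `~/.bashrc`) to make it persist across sessions.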
Step by step
This example shows how to transcribe a medical audio file using whisper-1 for speech-to-text, then refine the transcription with gpt-4o to improve medical terminology and formatting.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Transcribe audio with Whisper
with open("medical_audio.mp3", "rb") as audio_file:
    transcript_response = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
transcript_text = transcript_response.text
print("Raw transcription:", transcript_text)

# Step 2: Refine transcription with GPT-4o
prompt = (
    "You are a medical transcription assistant. "
    "Please correct and format the following transcription "
    "for medical accuracy and clarity:\n" + transcript_text
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
refined_transcription = response.choices[0].message.content
print("\nRefined transcription:", refined_transcription)
```
output
```
Raw transcription: Patient reports mild chest pain and shortness of breath.

Refined transcription: Patient reports mild chest pain and shortness of breath. No signs of acute distress. Recommend ECG and further cardiac evaluation.
```
Common variations
- Use asynchronous calls with asyncio for non-blocking transcription.
- Stream transcription results for long audio files.
- Use a different model like gpt-4o-mini for cost-effective refinement.
- Integrate with medical vocabularies or ontologies for enhanced accuracy.
For example, an async variation using the SDK's AsyncOpenAI client (openai>=1.0 supports async transcription directly, so no workaround is needed):

```python
import asyncio
import os

from openai import AsyncOpenAI


async def async_transcribe_and_refine():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # Step 1: Transcribe asynchronously with Whisper
    with open("medical_audio.mp3", "rb") as audio_file:
        transcript_response = await client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    transcript_text = transcript_response.text
    # Step 2: Refine with the cheaper gpt-4o-mini model
    prompt = (
        "You are a medical transcription assistant. "
        "Please correct and format the following transcription "
        "for medical accuracy and clarity:\n" + transcript_text
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print("Refined transcription (async):", response.choices[0].message.content)


asyncio.run(async_transcribe_and_refine())
```
output
```
Refined transcription (async): Patient reports mild chest pain and shortness of breath. No acute distress observed. Recommend ECG and cardiac evaluation.
```
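The vocabulary-integration variation above can be sketched by listing expected domain terms in the refinement prompt. The helper name `build_refinement_prompt` and the sample terms are illustrative assumptions, not part of any library:

```python
def build_refinement_prompt(transcript: str, vocabulary: list[str]) -> str:
    """Build a refinement prompt that lists known domain terms.

    Listing expected terms nudges the model toward the correct
    spellings when the raw transcript garbles them.
    """
    terms = ", ".join(vocabulary)
    return (
        "You are a medical transcription assistant. "
        f"Expect terminology such as: {terms}. "
        "Correct and format the following transcription for "
        "medical accuracy and clarity:\n" + transcript
    )


# Illustrative garbled transcript and vocabulary list
prompt = build_refinement_prompt(
    "patient given met formin for type two diabetes",
    ["metformin", "type 2 diabetes mellitus", "HbA1c"],
)
print(prompt)
```

A shorter version of the same vocabulary string can also be passed as the `prompt` parameter of `client.audio.transcriptions.create`, which Whisper uses as a spelling and style hint during transcription itself.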
Troubleshooting
- If transcription quality is poor, ensure audio is clear and in a supported format (mp3, wav, m4a).
- If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- For long audio files, split into smaller chunks before transcription to avoid timeouts.
- If medical terms are mistranscribed, use the refinement step with gpt-4o or add domain-specific prompts.
Key Takeaways
- Use whisper-1 for accurate speech-to-text transcription of medical audio.
- Refine raw transcriptions with gpt-4o to improve medical terminology and formatting.
- Set up environment variables and install the openai package for seamless API access.
- Handle long audio by chunking, and consider async calls for scalable transcription pipelines.
- Troubleshoot by checking audio quality and API keys, and use prompt engineering for domain accuracy.