How to intermediate · 4 min read

ICD code extraction with AI

Quick answer
Use a large language model like gpt-4o to extract ICD codes by prompting it with medical text and requesting structured output. The model can identify and return ICD-10 codes from clinical notes or documents using few-shot or zero-shot prompting with client.chat.completions.create.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable for secure access.

bash
pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
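
Before making any API calls, it can help to confirm that the environment variable is actually visible to Python. The following is a minimal sketch; `require_api_key` is a hypothetical helper name, not part of the OpenAI SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; the SDK does not provide it.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key early, instead of as a `KeyError` or an authentication error later.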

Step by step

This example sends a clinical note to gpt-4o and asks it to return the ICD-10 codes as a JSON list. The output shown is illustrative; the exact codes and formatting can vary between runs.

python
import os
from openai import OpenAI

# Read the API key from the environment rather than hard-coding it.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

clinical_note = """
Patient presents with chest pain and shortness of breath. Diagnosed with acute myocardial infarction and type 2 diabetes mellitus.
"""

# Ask for a JSON list and show the expected output shape.
prompt = f"Extract ICD-10 codes from the following clinical note and return a JSON list of codes:\n\n{clinical_note}\n\nExample output: [\"I21.3\", \"E11.9\"]"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

# The reply is a plain string; parse it (for example with json.loads) before using the codes.
icd_codes = response.choices[0].message.content
print("Extracted ICD codes:", icd_codes)
output
Extracted ICD codes: ["I21.3", "E11.9"]
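
The model's reply is a string, so it still has to be parsed before other code can use it. The sketch below shows one possible way to do that, assuming the reply is a JSON list that may be wrapped in a Markdown code fence. `parse_icd_codes` and `ICD10_SHAPE` are hypothetical names, and the regular expression checks only the general shape of an ICD-10 code, not whether the code exists in the official code set.

```python
import json
import re
from typing import List

# Three backticks, built indirectly so this snippet can sit inside a fenced block.
FENCE = "`" * 3

# Rough shape of an ICD-10 code: letter, digit, alphanumeric, then an optional
# dot followed by 1-4 alphanumerics. Format check only (assumption: this is
# good enough as a first filter; it does not prove the code is valid).
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def parse_icd_codes(raw: str) -> List[str]:
    """Parse a model reply into a list of ICD-10-shaped strings."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        text = text[len(FENCE):]
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    codes = json.loads(text)
    if not isinstance(codes, list):
        raise ValueError("expected a JSON list of codes")
    cleaned = []
    for code in codes:
        # Skip non-string items and anything that does not look like a code.
        if isinstance(code, str) and ICD10_SHAPE.match(code.strip().upper()):
            cleaned.append(code.strip().upper())
    return cleaned
```

With the example above, `parse_icd_codes(icd_codes)` would turn the printed string into a Python list. Codes that pass this filter should still be checked against an authoritative ICD-10 code list before they are stored or billed.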

Common variations

You can use asynchronous calls with asyncio or switch to other models like gpt-4o-mini for faster, cheaper extraction. Streaming output is less common for structured extraction but possible.

python
import os
import asyncio
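from openai import AsyncOpenAI

async def extract_icd_async(note: str):
    # AsyncOpenAI is the awaitable client; calls on the synchronous OpenAI client cannot be awaited.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])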
    prompt = f"Extract ICD-10 codes from the clinical note:\n\n{note}\n\nReturn JSON list."
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

clinical_note = "Patient diagnosed with hypertension and chronic kidney disease."

result = asyncio.run(extract_icd_async(clinical_note))
print("Async extracted ICD codes:", result)
output
Async extracted ICD codes: ["I10", "N18.9"]
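
Async calls pay off mainly when many notes are processed at once. The following sketch shows one way to run an extraction coroutine over a batch with a concurrency cap. `extract_many` and `max_concurrency` are hypothetical names, and the extractor is passed in as a parameter so that any coroutine taking a note and returning a string can be used.

```python
import asyncio
from typing import Awaitable, Callable, List


async def extract_many(
    notes: List[str],
    extract_fn: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Run extract_fn over notes concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(note: str) -> str:
        # The semaphore limits how many requests are in flight at once.
        async with semaphore:
            return await extract_fn(note)

    # asyncio.gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(bounded(n) for n in notes)))
```

For example, `asyncio.run(extract_many(notes, extract_icd_async))` would process a list of notes with the function defined above. The right concurrency limit depends on the rate limits of your account, so treat the default of 5 as a placeholder.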

Troubleshooting

  • If the model returns text instead of JSON, explicitly instruct it to respond only with JSON.
  • For ambiguous or incomplete notes, provide few-shot examples in the prompt to improve accuracy.
  • Check your API key and environment variable if you get authentication errors.
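
The first two tips can be combined by building the message list with a JSON-only system instruction and a few worked examples. This is a minimal sketch; `build_messages`, `SYSTEM_PROMPT`, and the example pairs are hypothetical and should be replaced with examples reviewed by a qualified coder.

```python
from typing import Dict, List

# Hypothetical few-shot pairs for illustration only.
FEW_SHOT_EXAMPLES = [
    ("Patient diagnosed with essential hypertension.", '["I10"]'),
    ("Type 2 diabetes mellitus without complications.", '["E11.9"]'),
]

SYSTEM_PROMPT = (
    "You extract ICD-10 codes from clinical notes. "
    "Respond only with a JSON list of code strings and no other text."
)


def build_messages(note: str) -> List[Dict[str, str]]:
    """Build a chat message list: system instruction, few-shot turns, then the note."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_note, example_codes in FEW_SHOT_EXAMPLES:
        # Each example is a user turn followed by the ideal assistant reply.
        messages.append({"role": "user", "content": example_note})
        messages.append({"role": "assistant", "content": example_codes})
    messages.append({"role": "user", "content": note})
    return messages
```

The result can be passed as `messages=build_messages(clinical_note)` to `client.chat.completions.create` in place of the single user message.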

Key Takeaways

  • Use gpt-4o with clear, explicit prompts to extract ICD-10 codes from clinical text.
  • Always instruct the model to output structured JSON for easy parsing and integration.
  • Async calls improve throughput when processing many notes, and smaller models like gpt-4o-mini reduce cost and latency.
  • Few-shot prompting improves extraction quality on complex or ambiguous medical notes.
  • Validate API keys and environment setup to avoid common authentication issues.
Verified 2026-04 · gpt-4o, gpt-4o-mini