How to extract medical information from records with AI
Quick answer
Use a large language model such as gpt-4o to process medical records by prompting it to extract structured data such as diagnoses, medications, and patient details. Combine OpenAI API calls with prompt engineering, and optionally add few-shot examples or fine-tuning for higher accuracy.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
This example shows how to send a medical record text to gpt-4o and extract key medical information like patient name, age, diagnosis, and medications using prompt engineering.
import os
from openai import OpenAI

# Read the API key from the environment rather than hard-coding it.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

medical_record = '''Patient Name: John Doe\nAge: 45\nChief Complaint: Persistent cough and fever\nDiagnosis: Acute bronchitis\nMedications: Amoxicillin 500mg, Paracetamol 650mg\n'''

# List the fields to extract and ask for JSON so the reply is easy to parse.
prompt = f"Extract the following information from the medical record:\n- Patient Name\n- Age\n- Diagnosis\n- Medications\n\nMedical record:\n{medical_record}\n\nProvide the output as JSON."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print("Extracted medical info:", response.choices[0].message.content)
output
Extracted medical info: {
  "Patient Name": "John Doe",
  "Age": 45,
  "Diagnosis": "Acute bronchitis",
  "Medications": ["Amoxicillin 500mg", "Paracetamol 650mg"]
}
Common variations
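The reply printed in the step-by-step example is a plain string, and models sometimes wrap JSON in a Markdown code fence, so downstream code usually has to parse and check it before use. The following is a minimal sketch of that step, not part of the original example: parse_extraction and REQUIRED_FIELDS are hypothetical names, and the field list simply mirrors the keys requested in the prompt.

```python
import json

# Hypothetical field list, mirroring the keys requested in the prompt.
REQUIRED_FIELDS = ("Patient Name", "Age", "Diagnosis", "Medications")

def parse_extraction(raw: str) -> dict:
    """Parse the model's reply into a dict and check the expected keys exist."""
    text = raw.strip()
    fence = "`" * 3  # Markdown code-fence marker
    if text.startswith(fence):
        # Drop the opening fence line (with any language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(fence, 1)[0]
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Calling parse_extraction(response.choices[0].message.content) would then give a dict you can index by field name, or raise an error you can log and retry on.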
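You can use asynchronous calls with asyncio to process records in batches, try other models such as claude-3-5-sonnet-20241022 (an Anthropic model, served through Anthropic's API rather than OpenAI's) and compare extraction quality on your own records, or apply few-shot prompting with worked examples to improve extraction accuracy. The async example below uses the smaller gpt-4o-mini model.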
import os
import asyncio
from openai import AsyncOpenAI

# AsyncOpenAI is the awaitable client; the synchronous OpenAI client cannot be awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def extract_medical_info_async(record: str):
    prompt = f"Extract patient name, age, diagnosis, and medications from the medical record:\n{record}\nReturn JSON."
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main():
    record = "Patient Name: Jane Smith\nAge: 60\nDiagnosis: Hypertension\nMedications: Lisinopril 10mg"
    result = await extract_medical_info_async(record)
    print("Async extracted info:", result)

asyncio.run(main())
output
Async extracted info: {
  "Patient Name": "Jane Smith",
  "Age": 60,
  "Diagnosis": "Hypertension",
  "Medications": ["Lisinopril 10mg"]
}
Troubleshooting
- If the model returns incomplete or ambiguous data, improve prompt clarity or add few-shot examples.
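- For sensitive medical data, ensure HIPAA compliance: de-identify records where possible, and send protected health information only to services covered by a business associate agreement (BAA).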
- If you hit rate limits, implement exponential backoff or batch requests.
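The backoff advice in the last bullet can be sketched as follows. This is an illustrative helper, not code from the openai package: call_with_backoff, its parameters, and the delay values are assumptions. With openai>=1.0 you would typically pass retry_on=openai.RateLimitError; the sketch keeps the exception type as a parameter so it runs without the package installed.

```python
import random
import time

def call_with_backoff(fn, retry_on=Exception, max_retries=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

For example, call_with_backoff(lambda: client.chat.completions.create(model="gpt-4o", messages=messages), retry_on=openai.RateLimitError) would retry only on rate-limit errors. The openai client also has its own max_retries option, so a wrapper like this is mainly useful when you want explicit control over the timing.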
Key Takeaways
- Use prompt engineering to guide gpt-4o in extracting structured medical data from unstructured records.
- Asynchronous API calls enable scalable, efficient processing of multiple records.
- Fine-tuning or few-shot learning can improve accuracy for domain-specific medical extraction tasks.
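Few-shot prompting, mentioned in the last takeaway, amounts to placing worked examples ahead of the real record in the messages list. A minimal sketch, assuming the chat message format used in the examples above; build_few_shot_messages, the system instruction, and the sample record are invented for illustration and are not real patient data.

```python
import json

# Hypothetical worked example: (record text, expected extraction).
FEW_SHOT_EXAMPLES = [
    (
        "Patient Name: Alex Roe\nAge: 52\nDiagnosis: Type 2 diabetes\nMedications: Metformin 500mg",
        {"Patient Name": "Alex Roe", "Age": 52, "Diagnosis": "Type 2 diabetes",
         "Medications": ["Metformin 500mg"]},
    ),
]

SYSTEM_PROMPT = (
    "Extract Patient Name, Age, Diagnosis, and Medications from the medical record. "
    "Reply with a JSON object only."
)

def build_few_shot_messages(record: str, examples=FEW_SHOT_EXAMPLES) -> list:
    """Build a chat `messages` list: instruction, worked examples, then the new record."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_record, expected in examples:
        # Each example is a user turn (the record) followed by the ideal assistant reply.
        messages.append({"role": "user", "content": example_record})
        messages.append({"role": "assistant", "content": json.dumps(expected)})
    # The record to extract goes last, so the model answers in the demonstrated format.
    messages.append({"role": "user", "content": record})
    return messages
```

The returned list can be passed as the messages argument to client.chat.completions.create in place of the single user message used earlier.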