AI for clinical trial matching
Quick answer
Use an LLM such as gpt-4o together with embeddings of patient and trial descriptions to automate clinical trial matching. Embedding similarity shortlists candidate trials, and the LLM compares the patient record against the trial's eligibility criteria and explains the match.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quote the specifier so the shell does not treat > as a redirect)
- pip install faiss-cpu
- pip install numpy
Setup
Install necessary Python packages and set your OpenAI API key as an environment variable.
- Install the dependencies: pip install openai faiss-cpu numpy
- Set the OPENAI_API_KEY environment variable for authentication.
pip install openai faiss-cpu numpy
Output:
Collecting openai
Collecting faiss-cpu
Collecting numpy
Successfully installed openai faiss-cpu numpy
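Because the example reads os.environ["OPENAI_API_KEY"] directly, a missing key surfaces as a bare KeyError. A small preflight check can report missing prerequisites up front instead. This is a minimal sketch: check_prerequisites is a hypothetical helper, not part of the openai package, and it assumes the packages are looked up by their pip distribution names.

```python
import os
from importlib import metadata


def check_prerequisites(environ=os.environ, get_version=metadata.version):
    """Return a list of problems; an empty list means the example should be able to start."""
    problems = []
    # The example reads os.environ["OPENAI_API_KEY"], which raises KeyError if unset.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for package in ("openai", "faiss-cpu", "numpy"):
        try:
            version = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        # The example uses the v1 client (from openai import OpenAI), so require openai>=1.0.
        if package == "openai" and int(version.split(".")[0]) < 1:
            problems.append(f"openai {version} found; openai>=1.0 is required")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Missing prerequisite:", issue)
    if not issues:
        print("All prerequisites found.")
```

The get_version parameter only exists so the check can be exercised without touching the real environment; in normal use, call check_prerequisites() with no arguments.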
Step by step
This example embeds the patient and clinical trial descriptions, retrieves the closest trial with a FAISS index, and then asks gpt-4o to explain the match.
import os
import numpy as np
from openai import OpenAI
import faiss
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample patient and trial descriptions
patient_text = "Patient: 55-year-old male with stage II lung cancer, non-smoker, no prior chemo."
trial_texts = [
"Trial A: Stage II lung cancer patients, no prior chemotherapy, age 50-65.",
"Trial B: Stage III lung cancer, smokers only.",
"Trial C: Any stage lung cancer, prior chemo allowed."
]
# Function to get embeddings
def get_embedding(text):
    # One API call per text; returns a float32 vector, the dtype FAISS expects
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding, dtype=np.float32)
# Embed patient and trials
patient_emb = get_embedding(patient_text)
trial_embs = np.array([get_embedding(t) for t in trial_texts])
# Build FAISS index for trial embeddings
index = faiss.IndexFlatL2(trial_embs.shape[1])
index.add(trial_embs)
# Search for closest trial embeddings
k = 1 # top 1 match
D, I = index.search(np.array([patient_emb]), k)
# Use LLM to generate detailed matching explanation
matched_trial = trial_texts[I[0][0]]
prompt = f"Given the patient description:\n{patient_text}\n\nAnd the clinical trial criteria:\n{matched_trial}\n\nExplain why this trial is a good match for the patient."
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": prompt}]
)
print("Matched trial:", matched_trial)
print("LLM explanation:", response.choices[0].message.content)
Output:
Matched trial: Trial A: Stage II lung cancer patients, no prior chemotherapy, age 50-65.
LLM explanation: This clinical trial is a good match for the patient because the patient is a 55-year-old male diagnosed with stage II lung cancer, which fits the trial's stage and age criteria. Additionally, the patient has no prior chemotherapy, aligning with the trial's requirement. The patient's non-smoking status also matches the trial's eligibility, making this trial suitable.
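The example keeps only the single nearest trial (k = 1). To inspect a ranked shortlist instead, raise k. The sketch below reproduces in plain NumPy what IndexFlatL2 computes (exact squared Euclidean distances, smallest first) so the ranking is easy to check by hand. rank_trials is a hypothetical helper, and the 2-D vectors are toy stand-ins for real embeddings.

```python
import numpy as np


def rank_trials(patient_emb, trial_embs, k):
    """Exact nearest-neighbour search by squared L2 distance, like faiss.IndexFlatL2."""
    patient_emb = np.asarray(patient_emb, dtype=np.float32)
    trial_embs = np.asarray(trial_embs, dtype=np.float32)
    # IndexFlatL2 reports squared Euclidean distances, so no square root is taken here.
    dists = np.sum((trial_embs - patient_emb) ** 2, axis=1)
    k = min(k, len(trial_embs))  # never ask for more results than there are trials
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


if __name__ == "__main__":
    # Toy vectors standing in for the patient and trial embeddings.
    trials = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
    patient = np.array([0.8, 0.6])
    idx, d = rank_trials(patient, trials, k=3)
    for rank, (i, dist) in enumerate(zip(idx, d), start=1):
        print(f"{rank}. trial {i} (squared L2 = {dist:.2f})")
```

With the FAISS index from the example, the equivalent call is D, I = index.search(np.array([patient_emb]), 3), after which you loop over I[0] rather than using only I[0][0].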
Common variations
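- Use asynchronous calls with asyncio and the AsyncOpenAI client (await client.chat.completions.create(...)) for higher throughput when matching many patients or trials.
- Switch to a larger embedding model such as text-embedding-3-large, which may improve retrieval quality at a higher cost per request.
- Incorporate additional patient data fields and structured JSON inputs for more precise matching.
- Use streaming responses (stream=True in client.chat.completions.create) to display explanations as they are generated.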
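Structured inputs make some criteria checkable without the LLM at all. As a sketch, assuming the patient and trial texts have already been converted into dictionaries with hypothetical fields (age, stage, prior_chemo, smoker, and so on), a deterministic filter can drop clearly ineligible trials before the embedding search or the LLM call. The field names and criteria encoding are illustrative, not a standard schema.

```python
from typing import Any, Dict, List

# Hypothetical structured versions of the example texts; field names are illustrative.
patient = {"age": 55, "sex": "male", "stage": "II", "prior_chemo": False, "smoker": False}

trials = [
    {"id": "Trial A", "stages": ["II"], "min_age": 50, "max_age": 65, "prior_chemo_allowed": False},
    {"id": "Trial B", "stages": ["III"], "smokers_only": True},
    {"id": "Trial C", "stages": None, "prior_chemo_allowed": True},  # None means any stage
]


def failed_criteria(patient: Dict[str, Any], trial: Dict[str, Any]) -> List[str]:
    """Return the criteria this patient fails; an empty list means no hard exclusion."""
    reasons = []
    stages = trial.get("stages")
    if stages is not None and patient["stage"] not in stages:
        reasons.append(f"stage {patient['stage']} not in {stages}")
    if "min_age" in trial and patient["age"] < trial["min_age"]:
        reasons.append("below minimum age")
    if "max_age" in trial and patient["age"] > trial["max_age"]:
        reasons.append("above maximum age")
    # "Prior chemo allowed" permits it; only an explicit False excludes treated patients.
    if trial.get("prior_chemo_allowed") is False and patient["prior_chemo"]:
        reasons.append("prior chemotherapy not allowed")
    if trial.get("smokers_only") and not patient["smoker"]:
        reasons.append("smokers only")
    return reasons


for trial in trials:
    reasons = failed_criteria(patient, trial)
    print(trial["id"], "eligible" if not reasons else f"excluded: {'; '.join(reasons)}")
```

Hard filters like these suit criteria that map cleanly onto fields; free-text criteria can still go to the embedding search and the LLM.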
Troubleshooting
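- If embedding calls fail, verify your API key. If they are slow, send several texts in one request, since the input parameter of client.embeddings.create accepts a list. If retrieval is inaccurate, try a different embedding model.
- If the FAISS search returns irrelevant matches, check that the patient and trial embeddings come from the same model. OpenAI embeddings are normalized to unit length, so L2 distance and cosine similarity give the same ranking; if you switch to faiss.IndexFlatIP or to vectors that are not normalized, call faiss.normalize_L2 on them first.
- For API rate limits, retry with exponential backoff (the openai client already retries some failed requests, configurable with max_retries) or batch requests.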
- If LLM explanations are off-topic, refine the prompt with clearer instructions or add system messages.
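One way to refine the prompt is to move the instructions into a system message and ask the model to judge each criterion instead of explaining why the trial is a good match. The original prompt presupposes a match, which invites the model to justify it even where the trial text is silent; the sample output above cites smoking status, which Trial A does not mention. The sketch below builds such a message list. SYSTEM_PROMPT and build_messages are hypothetical names, and the wording is one possible instruction set, not a validated screening protocol.

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You screen patients for clinical trial eligibility. "
    "For each criterion in the trial text, state whether the patient meets it, "
    "does not meet it, or whether the patient description does not say. "
    "Do not assume facts that are not stated. "
    "Finish with one line: MATCH, NO MATCH, or NEEDS REVIEW."
)


def build_messages(patient_text: str, trial_text: str) -> List[Dict[str, str]]:
    """Build a chat message list that puts the screening rules in a system message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Patient:\n{patient_text}\n\nTrial criteria:\n{trial_text}"},
    ]


if __name__ == "__main__":
    msgs = build_messages(
        "Patient: 55-year-old male with stage II lung cancer, non-smoker, no prior chemo.",
        "Trial A: Stage II lung cancer patients, no prior chemotherapy, age 50-65.",
    )
    for m in msgs:
        print(f"[{m['role']}]\n{m['content']}\n")
```

In the main example, the list would replace the single user message: client.chat.completions.create(model="gpt-4o", messages=build_messages(patient_text, matched_trial)).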
Key Takeaways
- Combine patient and trial data embeddings with FAISS for efficient similarity search.
- Use gpt-4o to generate human-readable explanations for trial matches.
- Set up environment and dependencies carefully to avoid API and embedding errors.