How to Intermediate · 3 min read

AI for clinical trial matching

Quick answer

Combine embeddings of patient and trial text with an LLM such as gpt-4o to automate a first pass at clinical trial matching. Embedding similarity shortlists candidate trials, and the LLM explains how the patient's record lines up with each trial's eligibility criteria.

PREREQUISITES

Python 3.8+
OpenAI API key with access to gpt-4o and text-embedding-3-small
pip install "openai>=1.0"
pip install faiss-cpu
pip install numpy

Setup

Install necessary Python packages and set your OpenAI API key as an environment variable.

Use pip install openai faiss-cpu numpy to install dependencies.
Set OPENAI_API_KEY in your environment for authentication.

bash

pip install openai faiss-cpu numpy

output

Collecting openai
Collecting faiss-cpu
Collecting numpy
Successfully installed openai faiss-cpu numpy
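
Before running the example, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch; require_api_key is a hypothetical helper name introduced here for illustration and is not part of the openai package.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; `env` is injectable so the
    check can be exercised without touching the real environment.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return key


# Usage: call require_api_key() once at startup, before creating the client.
```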

Step by step

This example embeds a patient description and three trial descriptions, uses FAISS to find the trial closest to the patient, and then asks gpt-4o to explain the match.

python

import os
import numpy as np
from openai import OpenAI
import faiss

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample patient and trial descriptions
patient_text = "Patient: 55-year-old male with stage II lung cancer, non-smoker, no prior chemo."
trial_texts = [
    "Trial A: Stage II lung cancer patients, no prior chemotherapy, age 50-65.",
    "Trial B: Stage III lung cancer, smokers only.",
    "Trial C: Any stage lung cancer, prior chemo allowed."
]

# Function to get embeddings
def get_embedding(text):
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding, dtype=np.float32)

# Embed patient and trials
patient_emb = get_embedding(patient_text)
trial_embs = np.array([get_embedding(t) for t in trial_texts])

# Build FAISS index for trial embeddings
index = faiss.IndexFlatL2(trial_embs.shape[1])
index.add(trial_embs)

# Search for closest trial embeddings
k = 1  # top 1 match
D, I = index.search(np.array([patient_emb]), k)

# Use LLM to generate detailed matching explanation
matched_trial = trial_texts[I[0][0]]
prompt = f"Given the patient description:\n{patient_text}\n\nAnd the clinical trial criteria:\n{matched_trial}\n\nExplain why this trial is a good match for the patient."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Matched trial:", matched_trial)
print("LLM explanation:", response.choices[0].message.content)

output

Matched trial: Trial A: Stage II lung cancer patients, no prior chemotherapy, age 50-65.
LLM explanation: This clinical trial is a good match for the patient because the patient is a 55-year-old male diagnosed with stage II lung cancer, which fits the trial's stage and age criteria. Additionally, the patient has no prior chemotherapy, aligning with the trial's requirement. The patient's non-smoking status also matches the trial's eligibility, making this trial suitable.
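
The example above sends one embeddings request per trial. The embeddings endpoint also accepts a list of inputs, so the trial texts could be embedded in a single request. A minimal sketch follows, assuming a hypothetical helper named embed_batch (not part of the openai package) that takes the client as an argument:

```python
def embed_batch(client, texts, model="text-embedding-3-small"):
    """Embed several texts with a single embeddings request.

    `client` is expected to be an openai.OpenAI instance, or any object
    exposing the same embeddings.create interface. Hypothetical helper
    for illustration.
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of the input it belongs to;
    # sorting by that index keeps the vectors aligned with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the client from the example, np.array(embed_batch(client, trial_texts), dtype=np.float32) would produce the same kind of array as the per-trial loop, ready to add to the FAISS index.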

Common variations

Use the AsyncOpenAI client with asyncio and await client.chat.completions.create to run several requests concurrently for higher throughput.
Switch to a larger embedding model such as text-embedding-3-large, which may improve retrieval quality at a higher cost per request.
Incorporate additional patient data fields and structured JSON inputs for more precise matching.
Use streaming responses for real-time explanation generation.
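
As one way to approach the asynchronous variation, the sketch below requests an explanation for several candidate trials concurrently. It assumes the caller passes in an openai.AsyncOpenAI instance; explain_matches and its prompt wording are hypothetical and shown only for illustration.

```python
import asyncio


async def explain_matches(client, patient_text, trial_texts, model="gpt-4o"):
    """Request one explanation per trial concurrently.

    `client` is expected to be an openai.AsyncOpenAI instance.
    Hypothetical helper for illustration.
    """

    async def explain(trial_text):
        prompt = (
            f"Given the patient description:\n{patient_text}\n\n"
            f"And the clinical trial criteria:\n{trial_text}\n\n"
            "Explain whether the patient meets each criterion."
        )
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # gather() preserves input order, so results line up with trial_texts.
    return await asyncio.gather(*(explain(t) for t in trial_texts))


# Usage (needs OPENAI_API_KEY and network access):
#   from openai import AsyncOpenAI
#   explanations = asyncio.run(
#       explain_matches(AsyncOpenAI(), patient_text, trial_texts)
#   )
```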

Troubleshooting

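If embedding requests fail, verify your API key. If they are slow, send several texts per request; if the matches look wrong, try a different embedding model.
If the FAISS search returns poorly matched trials, check that all embeddings are float32, come from the same embedding model, and are normalized consistently.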
For API rate limits, implement exponential backoff or batch requests.
If LLM explanations are off-topic, refine the prompt with clearer instructions or add system messages.
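
For the rate-limit item above, exponential backoff can be sketched as a small wrapper around any API call. The helper name call_with_backoff and its parameters are assumptions made for illustration; the openai client also accepts a max_retries option, which may be enough on its own.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on the given exception types.

    The wait before retry number n is base_delay * 2**n plus random
    jitter of up to base_delay seconds. Hypothetical helper for
    illustration; `sleep` is injectable so the logic can be tested
    without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Usage (assumes the `client` from the main example and `import openai`):
#   call_with_backoff(
#       lambda: client.embeddings.create(
#           model="text-embedding-3-small", input="some text"),
#       retry_on=(openai.RateLimitError,),
#   )
```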

Key Takeaways

Combine patient and trial data embeddings with FAISS for efficient similarity search.
Use gpt-4o to generate human-readable explanations for trial matches.
Set up environment and dependencies carefully to avoid API and embedding errors.

Verified 2026-04 · gpt-4o, text-embedding-3-small
