
AI for drug discovery

Quick answer
Use large language models (LLMs) and machine learning to analyze biomedical data, generate molecular structures, and predict drug-target interactions. Integrate OpenAI GPT-4o or specialized models with chemical databases to accelerate drug candidate identification and optimization.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with access to the gpt-4o model
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat >= as a redirect)
  • Basic knowledge of molecular biology and chemistry

Setup

Install the openai Python SDK and set your API key as an environment variable to access the gpt-4o model for drug discovery tasks.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

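This example shows how to prompt gpt-4o to propose drug-like molecules for a target protein and to describe their expected binding affinity and drug-likeness. 'Kinase XYZ' is a placeholder target name; the descriptions in the reply are model-generated text, not computed or measured values.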

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "You are a drug discovery assistant. Given the target protein 'Kinase XYZ', "
    "generate 3 novel small molecule drug candidates with SMILES notation and "
    "briefly describe their potential binding affinity and drug-likeness."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Drug candidates and analysis:\n", response.choices[0].message.content)
output
Drug candidates and analysis:
1. SMILES: CC1=CC(=O)N(C)C=C1C2=CC=CC=C2
   Description: Potential kinase inhibitor with good binding affinity due to aromatic rings and hydrogen bond acceptors.
2. SMILES: CCOC(=O)C1=CC=CC=C1N
   Description: Drug-like molecule with moderate polarity, likely to have good bioavailability.
3. SMILES: CCN(CC)C(=O)C1=CN=CN1
   Description: Small molecule with heterocyclic ring, predicted to bind ATP-binding site effectively.
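
Before passing a reply like the one above to other tools, it helps to pull the SMILES strings out of the text and discard anything obviously malformed. The following is a minimal sketch, assuming the reply uses the "SMILES: ..." line format shown in the sample output; the helper names looks_like_smiles and extract_smiles are hypothetical. The check is only a coarse filter (allowed characters and balanced parentheses), not chemical validation, which would need a cheminformatics toolkit such as RDKit.

```python
import re
import string

# Assumes each candidate appears on a line like "1. SMILES: CCO" (format from the sample output)
SMILES_LINE = re.compile(r"SMILES:\s*(\S+)")

# Characters that may occur in a SMILES string; a coarse filter, not a validator
ALLOWED = set(string.ascii_letters + string.digits + "()[]=#+-@/\\%.:*$")


def looks_like_smiles(token):
    """Return True if the token uses only SMILES characters and has balanced parentheses."""
    if not token or not set(token) <= ALLOWED:
        return False
    depth = 0
    for ch in token:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis with no matching opener
                return False
    return depth == 0


def extract_smiles(text):
    """Collect every SMILES token in the reply that passes the coarse filter."""
    return [m.group(1) for m in SMILES_LINE.finditer(text) if looks_like_smiles(m.group(1))]
```

Calling extract_smiles(response.choices[0].message.content) on the reply from the example above would return a list of strings that can then be checked with a proper chemistry library.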

Common variations

You can use asynchronous calls to run several requests concurrently, or switch to specialized models trained on chemical data. Streaming lets you display a long reply as it is generated instead of waiting for the complete response.

python
import asyncio
import os
from openai import AsyncOpenAI

# The async client is required for `await` and `async for` to work
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def generate_drug_candidates():
    prompt = (
        "Generate 3 novel drug candidates for target 'Kinase XYZ' with SMILES and properties."
    )
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    # Print each text fragment as it arrives; skip chunks that carry no choices
    async for chunk in stream:
        if chunk.choices:
            print(chunk.choices[0].delta.content or '', end='', flush=True)

asyncio.run(generate_drug_candidates())
output
1. SMILES: CC1=CC(=O)N(C)C=C1C2=CC=CC=C2
   Description: Potential kinase inhibitor with good binding affinity.
2. SMILES: CCOC(=O)C1=CC=CC=C1N
   Description: Drug-like molecule with moderate polarity.
3. SMILES: CCN(CC)C(=O)C1=CN=CN1
   Description: Small molecule with heterocyclic ring.
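
When asynchronous calls are used to query several targets at once, it is usually worth capping how many requests are in flight so you stay within rate limits. The sketch below shows one way to do that with asyncio.Semaphore and asyncio.gather. The names run_bounded and fake_generate are hypothetical, and fake_generate is a stand-in that only sleeps; in practice it would be replaced by a coroutine that calls the API.

```python
import asyncio


async def run_bounded(worker, items, limit=3):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Wait for a free slot before starting the call
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_generate(target):
    """Hypothetical stand-in for a real API call; it only simulates latency."""
    await asyncio.sleep(0.01)
    return f"candidates for {target}"


if __name__ == "__main__":
    # Target names are placeholders
    targets = ["Kinase XYZ", "Protease ABC"]
    print(asyncio.run(run_bounded(fake_generate, targets, limit=2)))
```

The limit value is an assumption to tune against your own account's rate limits.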

Troubleshooting

  • If you receive incomplete molecule data, increase max_tokens in the API call.
  • For unexpected outputs, refine your prompt with clearer instructions or add system messages.
  • Ensure your API key is valid and that the OPENAI_API_KEY environment variable is set.
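
The three checks above can be sketched in code as follows. This is an illustrative sketch, not a prescribed setup: build_request and api_key_is_set are hypothetical helpers, the system message wording and the default max_tokens value of 800 are assumptions, and the returned dictionary is meant to be passed to client.chat.completions.create as keyword arguments.

```python
import os


def api_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def build_request(target, max_tokens=800):
    """Assemble keyword arguments for a chat completion with a system message."""
    return {
        "model": "gpt-4o",
        # Raise this if replies are cut off before all molecules are listed
        "max_tokens": max_tokens,
        "messages": [
            # A system message pins down the role and the expected output format
            {
                "role": "system",
                "content": (
                    "You are a drug discovery assistant. For each candidate, "
                    "give one line starting with 'SMILES:' and one line "
                    "starting with 'Description:'."
                ),
            },
            {
                "role": "user",
                "content": f"Propose 3 small molecule candidates for the target '{target}'.",
            },
        ],
    }
```

With a client created as in the earlier example, the call would look like client.chat.completions.create(**build_request("Kinase XYZ", max_tokens=1200)).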

Key Takeaways

  • Use gpt-4o to generate and analyze drug candidates with molecular SMILES notation.
  • Integrate biomedical domain knowledge in prompts for better AI-driven drug discovery results.
  • Use async calls to run requests concurrently and streaming to display long replies as they are generated.
Verified 2026-04 · gpt-4o