How-to · Intermediate · 3 min read

How to add LLM tests to CI/CD pipeline

Quick answer
Add LLM tests to your CI/CD pipeline by writing automated Python scripts that call the LLM via the OpenAI SDK, verify outputs against expected results, and run these tests in your CI environment. Use assertions on response.choices[0].message.content to validate model responses and catch regressions automatically.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable to securely authenticate your requests.

bash
pip install "openai>=1.0"

export OPENAI_API_KEY="your_api_key_here"  # Linux/macOS
setx OPENAI_API_KEY "your_api_key_here"  # Windows (takes effect in new shells)
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

# export sets the variable silently; setx prints a confirmation message

Step by step

Create a Python test script that calls the LLM using the OpenAI SDK, sends a prompt, and asserts the response matches expected output. Integrate this script into your CI/CD pipeline to run on every commit or pull request.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def test_llm_response():
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "paris" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    test_llm_response()
    print("LLM test passed successfully.")
output
LLM test passed successfully.

Common variations

You can extend tests to async calls, swap in other models, or test streaming responses. For async, use the AsyncOpenAI client and await client.chat.completions.create(...) inside an asyncio coroutine. For streaming, pass stream=True and iterate over the response chunks.

python
import os
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def test_llm_async():
    messages = [{"role": "user", "content": "Say hello in French."}]
    response = await client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "bonjour" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    asyncio.run(test_llm_async())
    print("Async LLM test passed.")
output
Async LLM test passed.

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If assertions fail, check if the model output format or content changed and update your expected results accordingly.
  • For rate limits, consider adding retries or running tests less frequently.
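
For the rate-limit case, a generic exponential-backoff wrapper is one option. This is a sketch, not an SDK feature; note that recent versions of the OpenAI Python SDK also accept a max_retries option on the client, which may be enough on its own:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff (1s, 2s, 4s, ...) on errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error to the test runner
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage: wrap the API call, e.g. with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages)). In a stricter setup you would catch only rate-limit errors rather than every Exception.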

Key Takeaways

  • Use the official OpenAI SDK with environment-based API keys for secure LLM testing.
  • Automate LLM response validation with assertions in Python test scripts integrated into CI/CD.
  • Support async and streaming calls to cover different LLM interaction patterns in tests.
Verified 2026-04 · gpt-4o-mini