How-to · Intermediate · 3 min read

How to add LLM tests to CI/CD pipeline

Quick answer
Add LLM tests to your CI/CD pipeline by writing automated Python scripts that call the LLM via the OpenAI SDK, verify outputs against expected results, and run these tests in your CI environment. Use assertions on response.choices[0].message.content to validate model responses and catch regressions automatically.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable to securely authenticate your requests.

bash
pip install "openai>=1.0"

export OPENAI_API_KEY="your_api_key_here"  # Linux/macOS
setx OPENAI_API_KEY "your_api_key_here"  # Windows (takes effect in new shells)
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

# export sets the variable silently; setx prints a confirmation message

Step by step

Create a Python test script that calls the LLM using the OpenAI SDK, sends a prompt, and asserts the response matches expected output. Integrate this script into your CI/CD pipeline to run on every commit or pull request.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def test_llm_response():
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "paris" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    test_llm_response()
    print("LLM test passed successfully.")
output
LLM test passed successfully.

Common variations

You can extend tests to async calls, swap in other models, or test streaming responses. For async, use the AsyncOpenAI client and await client.chat.completions.create(...) inside an asyncio coroutine. For streaming, pass stream=True and iterate over the response chunks.

python
import os
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def test_llm_async():
    messages = [{"role": "user", "content": "Say hello in French."}]
    response = await client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "bonjour" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    asyncio.run(test_llm_async())
    print("Async LLM test passed.")
output
Async LLM test passed.

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • If assertions fail, check if the model output format or content changed and update your expected results accordingly.
  • For rate limits, consider adding retries or running tests less frequently.
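
For the rate-limit case, a generic exponential-backoff wrapper is one option. This is a sketch, not an SDK feature; note that recent versions of the OpenAI Python SDK also accept a max_retries option on the client, which may be enough on its own:

```python
import time

def with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff (1s, 2s, 4s, ...) on errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error to the test runner
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage: wrap the API call, e.g. with_retries(lambda: client.chat.completions.create(model="gpt-4o-mini", messages=messages)). In a stricter setup you would catch only rate-limit errors rather than every Exception.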

Key Takeaways

  • Use the official OpenAI SDK with environment-based API keys for secure LLM testing.
  • Automate LLM response validation with assertions in Python test scripts integrated into CI/CD.
  • Support async and streaming calls to cover different LLM interaction patterns in tests.
Verified 2026-04 · gpt-4o-mini