How to add LLM tests to CI/CD pipeline
Quick answer
Add LLM tests to your CI/CD pipeline by writing automated Python scripts that call the LLM through the OpenAI SDK, verify outputs against expected results, and run these tests in your CI environment. Use assertions on response.choices[0].message.content to validate model responses and catch regressions automatically.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable to securely authenticate your requests.
pip install openai>=1.0
export OPENAI_API_KEY="your_api_key_here" # Linux/macOS
setx OPENAI_API_KEY "your_api_key_here" # Windows (Command Prompt; takes effect in new sessions)
Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
# No output from the environment variable commands
Step by step
Create a Python test script that calls the LLM using the OpenAI SDK, sends a prompt, and asserts the response matches expected output. Integrate this script into your CI/CD pipeline to run on every commit or pull request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def test_llm_response():
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "paris" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    test_llm_response()
    print("LLM test passed successfully.")
Output:
LLM test passed successfully.
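To run the script on every commit or pull request, wire it into your CI system. As one illustration (GitHub Actions is an assumption here, as are the file names llm-tests.yml and test_llm.py), store the API key as a repository secret and expose it to the test step:

```yaml
# .github/workflows/llm-tests.yml
name: LLM tests
on: [push, pull_request]
jobs:
  llm-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install "openai>=1.0" pytest
      - run: pytest test_llm.py
        env:
          # Set this secret in the repository settings; never commit the key.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Because the test function is named test_llm_response, pytest discovers it automatically; any assertion failure fails the pipeline.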
Common variations
You can extend tests to async calls, swap in other models, or test streaming responses. For async, use the AsyncOpenAI client and await client.chat.completions.create(...) inside an asyncio event loop. For streaming, pass stream=True and iterate over the response chunks.
import os
import asyncio
from openai import AsyncOpenAI

# The async client is required to await API calls; the sync OpenAI client
# would raise a TypeError if awaited.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def test_llm_async():
    messages = [{"role": "user", "content": "Say hello in French."}]
    response = await client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content.strip().lower()
    assert "bonjour" in answer, f"Unexpected answer: {answer}"

if __name__ == "__main__":
    asyncio.run(test_llm_async())
    print("Async LLM test passed.")
Output:
Async LLM test passed.
Troubleshooting
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If assertions fail, check whether the model's output format or content changed and update your expected results accordingly.
- For rate limits, consider adding retries or running tests less frequently.
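One way to add retries is a small backoff wrapper; with_retries below is a hypothetical helper, demonstrated with a deliberately flaky function so it runs offline. (The openai v1 SDK also retries some failures itself; see its max_retries client option.)

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on exception, retry with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a flaky callable that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_retries(flaky, attempts=3, base_delay=0)  # base_delay=0 keeps the demo instant
print(result)  # ok
```

In a real test you would wrap the API call, e.g. with_retries(lambda: client.chat.completions.create(...)).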
Key Takeaways
- Use the official OpenAI SDK with environment-based API keys for secure LLM testing.
- Automate LLM response validation with assertions in Python test scripts integrated into CI/CD.
- Support async and streaming calls to cover different LLM interaction patterns in tests.