How to use pytest for LLM testing
Quick answer
Use pytest to automate testing of LLM outputs by calling the model through its API client and asserting on the responses. Write test functions that invoke the LLM with prompts and verify that the returned message.content matches your criteria.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" pytest (quote the version spec so the shell does not treat > as a redirect)
Setup
Install pytest and the OpenAI SDK. Set your API key as an environment variable to authenticate requests.
- Install packages: pip install openai pytest
- Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

Output of pip install openai pytest:

```
Collecting openai
Collecting pytest
Successfully installed openai-1.x pytest-7.x
```
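If the key is missing, the OpenAI client raises an authentication error only once a test actually calls the API. A small guard can fail fast with a clearer message instead; the helper name require_api_key below is illustrative, not part of any SDK:

```python
import os


def require_api_key() -> str:
    """Return the OpenAI API key, or fail immediately with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running pytest."
        )
    return key
```

Calling this once at the top of a test module (or inside a pytest fixture) turns a confusing mid-run authentication failure into an obvious setup error.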
Step by step
Create a test file with functions that call the LLM using the OpenAI SDK and assert expected outputs. Use pytest to run these tests automatically.
```python
import os
from openai import OpenAI

def test_llm_response_contains_keyword():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello"}]
    )
    text = response.choices[0].message.content
    assert "hello" in text.lower()

if __name__ == "__main__":
    import pytest
    pytest.main([__file__])
```

Output:

```
============================= test session starts ==============================
collected 1 item

test_llm.py .                                                            [100%]

============================== 1 passed in 3.45s ===============================
```
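Every run of a test like this makes a real (billable, nondeterministic) API call. For checking the assertion logic itself, one common pattern is to stub the client with an object that mimics the openai>=1.0 response shape. The FakeChatClient and run_keyword_check names below are illustrative, a sketch rather than part of the SDK:

```python
from types import SimpleNamespace


class FakeChatClient:
    """Stand-in mimicking the response shape of the openai>=1.0 chat API."""

    def __init__(self, reply: str):
        self._reply = reply
        # Mirror client.chat.completions.create(...)
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model, messages):
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def run_keyword_check(client, prompt: str, keyword: str) -> bool:
    """The same call pattern as the real test, but client-agnostic."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return keyword in response.choices[0].message.content.lower()
```

Because run_keyword_check only depends on the response shape, the same function works with a real OpenAI client in integration tests and with FakeChatClient in fast, free unit tests.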
Common variations
You can test asynchronously, use different models, or test streaming outputs.
- Async tests with pytest-asyncio
- Test with Anthropic or other SDKs by adapting client calls
- Validate structured outputs or tool calls
```python
import os
import pytest
from openai import AsyncOpenAI  # the async client in openai>=1.0

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_llm_async_response():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # The async client exposes the same create() method, awaited;
    # there is no acreate() in openai>=1.0.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is 2+2?"}]
    )
    text = response.choices[0].message.content
    assert "4" in text
```

Output:

```
============================= test session starts ==============================
collected 1 item

test_async_llm.py .                                                      [100%]

============================== 1 passed in 2.12s ===============================
```
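For validating structured outputs, models often wrap JSON in prose or code fences, so a tolerant parser plus a field check keeps assertions robust. A minimal sketch using only the standard library; the helper names extract_json and assert_has_fields are illustrative:

```python
import json


def extract_json(text: str) -> dict:
    """Parse a model reply as JSON, tolerating surrounding prose or fences."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


def assert_has_fields(payload: dict, fields) -> None:
    """Assert that every expected key is present in the parsed output."""
    missing = [f for f in fields if f not in payload]
    assert not missing, f"missing fields: {missing}"
```

A test would call extract_json on message.content and then assert on the parsed dict instead of on raw text.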
Troubleshooting
If tests fail due to authentication errors, verify your OPENAI_API_KEY environment variable is set correctly. For rate limits, add retries or reduce request frequency. If output varies, use regex or partial matching in assertions.
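The retry and flexible-matching advice above can be sketched with the standard library alone; the helpers call_with_retries and assert_matches_any are illustrative names, not pytest or OpenAI APIs:

```python
import re
import time


def call_with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff between attempts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts; surface the real error
            time.sleep(base_delay * (2 ** i))


def assert_matches_any(text: str, patterns) -> None:
    """Pass if the output matches at least one regex, absorbing phrasing variance."""
    assert any(re.search(p, text, re.IGNORECASE) for p in patterns), \
        f"no pattern matched: {text!r}"
```

In a test, the API call goes inside the lambda passed to call_with_retries, and the assertion lists every acceptable phrasing, e.g. assert_matches_any(text, [r"\b4\b", r"four"]).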
Key Takeaways
- Use pytest to automate LLM output validation by asserting expected content.
- Always load API keys from environment variables to keep credentials secure.
- Write both synchronous and asynchronous test functions depending on your SDK usage.
- Use flexible assertions like substring or regex to handle LLM output variability.
- Handle API errors and rate limits gracefully in your test suite.