How-to · Beginner · 3 min read

How to use pytest for LLM testing

Quick answer
Use pytest to automate testing of LLM outputs by calling the model via its API client and asserting expected responses. Write test functions that invoke the LLM with prompts and verify the returned message.content matches your criteria.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" pytest (quote the version specifier so the shell does not interpret >=)

Setup

Install pytest and the OpenAI SDK. Set your API key as an environment variable to authenticate requests.

  • Install packages: pip install openai pytest
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai pytest
output
Collecting openai
Collecting pytest
Successfully installed openai-1.x pytest-7.x
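Before any test runs, it helps to fail fast if the key is missing. A minimal sketch; require_api_key is a hypothetical helper, not part of either SDK:

```python
import os

def require_api_key(env=os.environ):
    """Return the API key, or raise a clear error when it is not set."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the tests")
    return key
```

Calling this once at the top of a test module (or in a pytest fixture) turns a cryptic authentication failure into an actionable message.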

Step by step

Create a test file with functions that call the LLM using the OpenAI SDK and assert expected outputs. Use pytest to run these tests automatically.

python
import os
from openai import OpenAI

def test_llm_response_contains_keyword():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello"}]
    )
    text = response.choices[0].message.content
    assert "hello" in text.lower()

if __name__ == "__main__":
    import pytest
    pytest.main([__file__])
output
============================= test session starts ==============================
collected 1 item

test_llm.py .                                                             [100%]

============================== 1 passed in 3.45s ===============================
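Live API calls make a suite slow and nondeterministic, so a common pattern is to separate the extraction/assertion logic from the network call and exercise it against a stubbed response. A sketch; the Fake* classes and first_text are hypothetical stand-ins, not part of the OpenAI SDK:

```python
from dataclasses import dataclass

@dataclass
class FakeMessage:
    content: str

@dataclass
class FakeChoice:
    message: FakeMessage

@dataclass
class FakeResponse:
    choices: list

def first_text(response) -> str:
    """Pull the assistant text out of a chat.completions-shaped response."""
    return response.choices[0].message.content

def test_keyword_check_logic():
    # Runs offline: no API key, no network, deterministic result.
    response = FakeResponse(choices=[FakeChoice(FakeMessage("Hello, world!"))])
    assert "hello" in first_text(response).lower()
```

Keeping the live call behind a helper like first_text means most of your assertions can run offline, with only a handful of tests hitting the real API.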

Common variations

You can test asynchronously, use different models, or test streaming outputs.

  • Async tests with pytest-asyncio (install it separately: pip install pytest-asyncio)
  • Test with Anthropic or other SDKs by adapting client calls
  • Validate structured outputs or tool calls
python
import os
import pytest
from openai import AsyncOpenAI

@pytest.mark.asyncio
async def test_llm_async_response():
    # The async client is AsyncOpenAI; the method name is still create().
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is 2+2?"}]
    )
    text = response.choices[0].message.content
    assert "4" in text
output
============================= test session starts ==============================
collected 1 item

test_async_llm.py .                                                       [100%]

============================== 1 passed in 2.12s ===============================
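The third variation, validating structured outputs, usually means parsing the model's reply as JSON and asserting on its shape rather than its exact wording. A sketch, where the hard-coded raw string stands in for response.choices[0].message.content from a live call that prompted the model to reply in JSON:

```python
import json

def test_structured_output_shape():
    # Stand-in for a model reply to a prompt such as:
    # 'Answer as JSON: {"answer": <int>, "confidence": <float>}'
    raw = '{"answer": 4, "confidence": 0.9}'
    data = json.loads(raw)
    # Assert on structure and types, not exact text.
    assert set(data) == {"answer", "confidence"}
    assert isinstance(data["answer"], int)
    assert 0.0 <= data["confidence"] <= 1.0
```

Asserting on parsed structure tolerates harmless variation (key order, whitespace) while still catching malformed or incomplete replies.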

Troubleshooting

If tests fail with authentication errors, check that the OPENAI_API_KEY environment variable is set in the same shell that runs pytest. For rate limits, add retries with backoff or reduce request frequency. Because LLM output varies between runs, prefer substring or regex assertions over exact matches, and consider setting temperature=0 to reduce variability.
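Both the retry and flexible-matching suggestions fit in a small helpers module. A sketch; with_retries and assert_matches are hypothetical helpers, and you should narrow the caught exception to your SDK's rate-limit error (e.g. openai.RateLimitError):

```python
import re
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a flaky zero-argument callable with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to e.g. openai.RateLimitError
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def assert_matches(text, pattern):
    """Tolerant, case-insensitive assertion for variable LLM output."""
    assert re.search(pattern, text, re.IGNORECASE), (
        f"{pattern!r} not found in {text!r}"
    )
```

In a test you would wrap the API call, e.g. response = with_retries(lambda: client.chat.completions.create(...)), then check the text with assert_matches.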

Key Takeaways

  • Use pytest to automate LLM output validation by asserting expected content.
  • Always load API keys from environment variables to keep credentials secure.
  • Write both synchronous and asynchronous test functions depending on your SDK usage.
  • Use flexible assertions like substring or regex to handle LLM output variability.
  • Handle API errors and rate limits gracefully in your test suite.
Verified 2026-04 · gpt-4o-mini