How-to · Beginner · 3 min read

How to use pytest for LLM testing

Quick answer
Use pytest to automate testing of LLM outputs by calling the model via its API client and asserting expected responses. Write test functions that invoke the LLM with prompts and verify the returned message.content matches your criteria.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" pytest (quote the version specifier so the shell does not interpret >=)

Setup

Install pytest and the OpenAI SDK. Set your API key as an environment variable to authenticate requests.

  • Install packages: pip install openai pytest
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai pytest
output
Collecting openai
Collecting pytest
Successfully installed openai-1.x pytest-7.x
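Before any test runs, it helps to fail fast if the key is missing. A minimal sketch; require_api_key is a hypothetical helper, not part of either SDK:

```python
import os

def require_api_key(env=os.environ):
    """Return the API key, or raise a clear error when it is not set."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the tests")
    return key
```

Calling this once at the top of a test module (or in a pytest fixture) turns a cryptic authentication failure into an actionable message.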

Step by step

Create a test file with functions that call the LLM using the OpenAI SDK and assert expected outputs. Use pytest to run these tests automatically.

python
import os
from openai import OpenAI

def test_llm_response_contains_keyword():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Say hello"}]
    )
    text = response.choices[0].message.content
    assert "hello" in text.lower()

if __name__ == "__main__":
    import pytest
    pytest.main([__file__])
output
============================= test session starts ==============================
collected 1 item

test_llm.py .                                                             [100%]

============================== 1 passed in 3.45s ===============================
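Live API calls make a suite slow and nondeterministic, so a common pattern is to separate the extraction/assertion logic from the network call and exercise it against a stubbed response. A sketch; the Fake* classes and first_text are hypothetical stand-ins, not part of the OpenAI SDK:

```python
from dataclasses import dataclass

@dataclass
class FakeMessage:
    content: str

@dataclass
class FakeChoice:
    message: FakeMessage

@dataclass
class FakeResponse:
    choices: list

def first_text(response) -> str:
    """Pull the assistant text out of a chat.completions-shaped response."""
    return response.choices[0].message.content

def test_keyword_check_logic():
    # Runs offline: no API key, no network, deterministic result.
    response = FakeResponse(choices=[FakeChoice(FakeMessage("Hello, world!"))])
    assert "hello" in first_text(response).lower()
```

Keeping the live call behind a helper like first_text means most of your assertions can run offline, with only a handful of tests hitting the real API.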

Common variations

You can test asynchronously, use different models, or test streaming outputs.

  • Async tests with pytest-asyncio (install it separately: pip install pytest-asyncio)
  • Test with Anthropic or other SDKs by adapting client calls
  • Validate structured outputs or tool calls
python
import os
import pytest
from openai import AsyncOpenAI

@pytest.mark.asyncio
async def test_llm_async_response():
    # The async client is AsyncOpenAI; the method name is still create().
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is 2+2?"}]
    )
    text = response.choices[0].message.content
    assert "4" in text
output
============================= test session starts ==============================
collected 1 item

test_async_llm.py .                                                       [100%]

============================== 1 passed in 2.12s ===============================
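The third variation, validating structured outputs, usually means parsing the model's reply as JSON and asserting on its shape rather than its exact wording. A sketch, where the hard-coded raw string stands in for response.choices[0].message.content from a live call that prompted the model to reply in JSON:

```python
import json

def test_structured_output_shape():
    # Stand-in for a model reply to a prompt such as:
    # 'Answer as JSON: {"answer": <int>, "confidence": <float>}'
    raw = '{"answer": 4, "confidence": 0.9}'
    data = json.loads(raw)
    # Assert on structure and types, not exact text.
    assert set(data) == {"answer", "confidence"}
    assert isinstance(data["answer"], int)
    assert 0.0 <= data["confidence"] <= 1.0
```

Asserting on parsed structure tolerates harmless variation (key order, whitespace) while still catching malformed or incomplete replies.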

Troubleshooting

If tests fail with authentication errors, check that the OPENAI_API_KEY environment variable is set in the same shell that runs pytest. For rate limits, add retries with backoff or reduce request frequency. Because LLM output varies between runs, prefer substring or regex assertions over exact matches, and consider setting temperature=0 to reduce variability.
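Both the retry and flexible-matching suggestions fit in a small helpers module. A sketch; with_retries and assert_matches are hypothetical helpers, and you should narrow the caught exception to your SDK's rate-limit error (e.g. openai.RateLimitError):

```python
import re
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a flaky zero-argument callable with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # narrow this to e.g. openai.RateLimitError
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def assert_matches(text, pattern):
    """Tolerant, case-insensitive assertion for variable LLM output."""
    assert re.search(pattern, text, re.IGNORECASE), (
        f"{pattern!r} not found in {text!r}"
    )
```

In a test you would wrap the API call, e.g. response = with_retries(lambda: client.chat.completions.create(...)), then check the text with assert_matches.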

Key Takeaways

  • Use pytest to automate LLM output validation by asserting expected content.
  • Always load API keys from environment variables to keep credentials secure.
  • Write both synchronous and asynchronous test functions depending on your SDK usage.
  • Use flexible assertions like substring or regex to handle LLM output variability.
  • Handle API errors and rate limits gracefully in your test suite.
Verified 2026-04 · gpt-4o-mini