How-to · Intermediate · 4 min read

How to write unit tests for LLM apps

Quick answer
Write unit tests for LLM apps by mocking the OpenAI client calls to avoid real API usage, then assert expected outputs or behaviors. Use Python testing frameworks like unittest or pytest with libraries such as unittest.mock to simulate responses and test your app logic.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (only needed for real calls; the mocked tests below run without one)
  • pip install openai>=1.0
  • pip install pytest (plus pytest-asyncio for the async example)

Setup

Install the openai Python SDK and a testing framework like pytest. Set your API key as an environment variable to keep it secure.

  • Install OpenAI SDK: pip install openai
  • Install pytest: pip install pytest
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai pytest
output
Collecting openai
Collecting pytest
Successfully installed openai pytest

Step by step

Create a Python module that calls the LLM using the OpenAI SDK v1 pattern. Then write a unit test that patches the OpenAI class where your module imports it, so client.chat.completions.create returns a canned response without making a real API call.

python
# llm_app.py
import os

from openai import OpenAI

def generate_response(prompt: str) -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# test_llm.py
from unittest.mock import MagicMock, patch

from llm_app import generate_response

def test_generate_response():
    mock_response = MagicMock()
    mock_response.choices = [MagicMock()]
    mock_response.choices[0].message.content = "Hello, this is a mocked response."

    # Patch OpenAI in llm_app's namespace, where it is looked up at call time.
    # Patching "openai.OpenAI.chat.completions.create" fails: chat is created
    # per instance in __init__, not on the class.
    with patch("llm_app.OpenAI") as mock_openai:
        mock_client = mock_openai.return_value
        mock_client.chat.completions.create.return_value = mock_response

        result = generate_response("Say hello")

        mock_client.chat.completions.create.assert_called_once_with(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Say hello"}]
        )
        assert result == "Hello, this is a mocked response."
output
============================= test session starts ==============================
collected 1 item

test_llm.py .                                                            [100%]

============================== 1 passed in 0.12s ===============================

Common variations

If your app uses the async client (AsyncOpenAI), write async unit tests and mock the awaited call with AsyncMock. For streaming responses, make the mock return an async iterator that yields chunks. You can also test against different models by parameterizing your test inputs.
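The parameterization idea can be sketched with pytest.mark.parametrize. This is a minimal example; build_request is a hypothetical helper (not part of the OpenAI SDK) that assembles the keyword arguments you would pass to client.chat.completions.create:

```python
import pytest

def build_request(model: str, prompt: str) -> dict:
    # Hypothetical helper: assembles the kwargs for a chat completion call.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

# pytest runs this test once per model in the list.
@pytest.mark.parametrize("model", ["gpt-4o", "gpt-4o-mini"])
def test_build_request_uses_model(model):
    request = build_request(model, "Say hello")
    assert request["model"] == model
    assert request["messages"] == [{"role": "user", "content": "Say hello"}]
```

Each parameter value appears as a separate test in the pytest report, so a failure pinpoints the exact model that broke.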

python
# llm_app_async.py
import os

from openai import AsyncOpenAI

async def generate_response_async(prompt: str) -> str:
    # Async calls require AsyncOpenAI; the sync OpenAI client is not awaitable.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# test_llm_async.py
import pytest
from unittest.mock import AsyncMock, MagicMock, patch

from llm_app_async import generate_response_async

@pytest.mark.asyncio
async def test_generate_response_async():
    mock_response = MagicMock()
    mock_response.choices = [MagicMock()]
    mock_response.choices[0].message.content = "Async mocked response."

    # Patch AsyncOpenAI where llm_app_async imports it.
    with patch("llm_app_async.AsyncOpenAI") as mock_openai:
        mock_client = mock_openai.return_value
        mock_client.chat.completions.create = AsyncMock(return_value=mock_response)

        result = await generate_response_async("Hello async")

        mock_client.chat.completions.create.assert_awaited_once_with(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello async"}]
        )
        assert result == "Async mocked response."
output
============================= test session starts ==============================
collected 1 item

test_llm_async.py .                                                      [100%]

============================== 1 passed in 0.15s ===============================
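Streaming can be tested the same way by making the mock return an async iterator of chunks. Here is a minimal sketch, assuming a hypothetical stream_response function that receives the client as a parameter (dependency injection avoids patching entirely); make_chunk and fake_stream are illustrative helpers shaped like the SDK's ChatCompletionChunk stream:

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def stream_response(client, prompt: str) -> str:
    # Hypothetical app code: concatenates streamed chunk deltas into one string.
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

def make_chunk(text: str) -> MagicMock:
    # Fake streaming chunk: exposes chunk.choices[0].delta.content like the SDK.
    chunk = MagicMock()
    chunk.choices[0].delta.content = text
    return chunk

async def fake_stream():
    # Async generator standing in for the SDK's AsyncStream.
    for text in ["Hel", "lo", "!"]:
        yield make_chunk(text)

async def main():
    mock_client = MagicMock()
    mock_client.chat.completions.create = AsyncMock(return_value=fake_stream())
    result = await stream_response(mock_client, "Say hello")
    assert result == "Hello!"
    print(result)  # prints "Hello!"

asyncio.run(main())
```

Passing the client in as a parameter also makes the function easier to reuse across tests, since no patch targets need to match your module layout.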

Troubleshooting

  • If your tests fail due to missing environment variables, ensure OPENAI_API_KEY is set in your shell or CI environment.
  • If mocking fails with attribute errors, verify you patch the correct import path matching your module structure.
  • For async tests, use pytest-asyncio and mark tests with @pytest.mark.asyncio.
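For the environment-variable issue, pytest's built-in monkeypatch fixture sets a variable for one test and restores the original value afterward. A minimal sketch; read_key is a hypothetical stand-in for app code that reads the key at call time:

```python
import os

def read_key() -> str:
    # Hypothetical stand-in for code that reads the key when called.
    return os.environ["OPENAI_API_KEY"]

def test_key_is_set(monkeypatch):
    # setenv applies only for this test; pytest undoes it on teardown.
    monkeypatch.setenv("OPENAI_API_KEY", "test-key")
    assert read_key() == "test-key"
```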

Key Takeaways

  • Always mock LLM API calls in unit tests to avoid real API usage and costs.
  • Use unittest.mock or pytest-mock to simulate responses and assert expected outputs.
  • Write async tests with pytest-asyncio when your app uses async LLM calls.
  • Patch the exact method path where the client is used to ensure mocks work correctly.
  • Set environment variables securely and verify them in your test environment.
Verified 2026-04 · gpt-4o, gpt-4o-mini