How-to · Beginner · 3 min read

GitHub Actions for LLM testing

Quick answer
Use GitHub Actions to automate testing of your LLM integrations by running Python scripts that call the OpenAI API or other AI SDKs. Configure workflows to trigger on pushes or pull requests, enabling continuous validation of your AI models.

PREREQUISITES

  • Python 3.8+
  • GitHub repository with LLM integration code
  • OpenAI API key (a small amount of credit is enough)
  • pip install "openai>=1.0" (quote the spec so the shell does not treat >= as a redirect)

Set up GitHub Actions

Create a .github/workflows directory in your repo and add a YAML workflow file to define the CI pipeline. The workflow below checks out the code, sets up Python, installs dependencies, and runs your LLM test script with the API key injected from GitHub Secrets.

yaml
name: LLM Testing

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test-llm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install openai
      - name: Run LLM test script
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python tests/test_llm.py
output
Run LLM test script
Hello from LLM test: Response: Hello, world!

Test passed.
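If you want to confirm the integration works on more than one interpreter, the job above can be extended with a version matrix. This is a sketch; the steps are otherwise identical to the workflow shown earlier.

```yaml
jobs:
  test-llm:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install openai
      - name: Run LLM test script
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python tests/test_llm.py
```

Each matrix entry runs as a separate job, so a failure on one Python version is reported independently.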

Step-by-step LLM test script

Write a Python script that calls the OpenAI SDK to verify your LLM integration. This example uses gpt-4o to generate a simple completion and asserts that the response contains the expected greeting. Because model output is nondeterministic, keep assertions loose (substring checks rather than exact matches).

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def test_llm_response():
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello"}]
    )
    text = response.choices[0].message.content
    print(f"Hello from LLM test: Response: {text}")
    assert "hello" in text.lower(), "LLM response does not contain 'hello'"

if __name__ == "__main__":
    test_llm_response()
    print("\nTest passed.")
output
Hello from LLM test: Response: Hello! How can I assist you today?

Test passed.
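Since model wording varies from run to run, it can help to factor the assertion into a small helper that accepts any of several keywords. This is one possible sketch (check_response is a hypothetical helper, not part of the OpenAI SDK), and because it operates on plain strings it can be unit-tested without an API call.

```python
def check_response(text: str, keywords: list[str]) -> bool:
    """Return True if any expected keyword appears in the response (case-insensitive)."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)

# Exercised against sample strings, no network required.
assert check_response("Hello! How can I assist you today?", ["hello", "hi"])
assert not check_response("Goodbye.", ["hello", "hi"])
```

In the test script, you would pass the model's reply and your accepted keywords: assert check_response(text, ["hello", "hi"]).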

Common variations

  • Use async Python with asyncio and await for non-blocking tests.
  • Test different models like claude-3-5-sonnet-20241022 by changing the model parameter.
  • Integrate streaming responses by enabling stream=True and processing chunks.

The example below combines the async and streaming variations:
python
import os
import asyncio
from openai import AsyncOpenAI

# The async client is required: `async for` over the stream (and awaiting
# the create call) does not work with the synchronous OpenAI client.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_test():
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello asynchronously"}],
        stream=True
    )
    collected = []
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        collected.append(delta)
    print()
    assert "hello" in "".join(collected).lower(), "Async LLM response missing 'hello'"

if __name__ == "__main__":
    asyncio.run(async_test())
output
Hello asynchronously!
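The chunk-joining logic itself can be exercised without a network call by simulating the streamed deltas. A minimal sketch (join_stream is a hypothetical helper; real chunks carry their text in chunk.choices[0].delta.content, which may be None):

```python
def join_stream(deltas):
    """Collect streamed text deltas (None for empty chunks) into one string."""
    return "".join(d or "" for d in deltas)

# Simulated deltas as they might arrive from a streaming completion.
simulated = ["Hel", "lo", None, " asynchronously", "!"]
text = join_stream(simulated)
print(text)  # Hello asynchronously!
assert "hello" in text.lower()
```

Keeping the assembly step in a plain function like this lets the assertion logic run in fast, offline unit tests, while the live API call is covered separately in CI.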

Troubleshooting tips

  • If you see AuthenticationError, verify your OPENAI_API_KEY is set correctly in GitHub Secrets.
  • Timeouts may require increasing the job's timeout-minutes or adding retry logic in your test script.
  • Check model availability and update model names if deprecated.
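For the timeout case, one option is a small retry wrapper with exponential backoff around whatever function issues the API call. A sketch, assuming nothing beyond the standard library (the flaky stand-in below just simulates two failures before succeeding):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Stand-in that fails twice before succeeding, to exercise the wrapper.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```

In the test script you would wrap the completion call, e.g. with_retries(lambda: client.chat.completions.create(...)).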

Key Takeaways

  • Use GitHub Actions to automate LLM integration tests on every code change.
  • Write simple Python scripts with the OpenAI SDK to validate model responses.
  • Store API keys securely in GitHub Secrets and reference them in workflows.
  • Leverage async and streaming features for advanced LLM testing scenarios.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022