How-to · Intermediate · 3 min read

How to verify AI-generated code

Quick answer
To verify AI-generated code, use automated testing frameworks like pytest to run unit and integration tests, apply static analysis tools such as pylint or mypy for code quality and type checking, and execute the code in isolated sandboxes like Docker or e2b-code-interpreter to safely observe behavior. Combining these methods gives strong evidence of correctness, security, and maintainability before the code reaches production.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)
  • pip install pytest pylint mypy e2b-code-interpreter

Setup verification tools

Install essential tools for code verification: pytest for testing, pylint and mypy for static analysis, and e2b-code-interpreter for sandboxed execution. Set environment variables for API keys if you plan to regenerate or validate code with AI assistance.

bash
pip install pytest pylint mypy e2b-code-interpreter

Step by step verification

Run tests on AI-generated code, analyze it statically, and execute safely in a sandbox.

python
import os
from e2b_code_interpreter import Sandbox

# Example AI-generated code snippet
code = '''
def add(a, b):
    return a + b
'''

# Write code to a file
with open('generated_code.py', 'w') as f:
    f.write(code)

# 1. Run pytest tests
with open('test_generated_code.py', 'w') as f:
    f.write('''
from generated_code import add

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
''')

import subprocess
print('Running pytest...')
pytest_result = subprocess.run(['pytest', 'test_generated_code.py', '--maxfail=1', '--disable-warnings'], capture_output=True, text=True)
print(pytest_result.stdout)

# 2. Static analysis with pylint
print('Running pylint...')
pylint_result = subprocess.run(['pylint', 'generated_code.py', '--disable=all', '--enable=E,F'], capture_output=True, text=True)
print(pylint_result.stdout or 'No errors found')

# 3. Type checking with mypy
print('Running mypy...')
mypy_result = subprocess.run(['mypy', 'generated_code.py'], capture_output=True, text=True)
print(mypy_result.stdout or 'No type errors')

# 4. Safe execution in an isolated sandbox (requires the E2B_API_KEY env var)
sandbox = Sandbox(api_key=os.environ['E2B_API_KEY'])
execution = sandbox.run_code(code + "\nprint('Add result:', add(2, 3))")
# Printed output lands in execution.logs.stdout, not execution.text
print('Sandbox output:', ''.join(execution.logs.stdout).strip())
sandbox.close()
output
Running pytest...
============================= test session starts ==============================
collected 1 item

test_generated_code.py .                                                  [100%]

============================== 1 passed in 0.02s ===============================

Running pylint...
No errors found

Running mypy...
Success: no issues found in 1 source file

Sandbox output: Add result: 5
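To treat these checks as a single pass/fail gate in a pipeline, a small helper can aggregate the subprocess return codes. A minimal sketch (the `all_checks_pass` name is illustrative, not part of any of these tools):

```python
import subprocess
import sys

def all_checks_pass(results):
    """Return True only if every completed process exited cleanly."""
    return all(r.returncode == 0 for r in results)

# Example: a trivially passing check and a deliberately broken snippet
ok = subprocess.run([sys.executable, '-c', 'pass'],
                    capture_output=True, text=True)
bad = subprocess.run([sys.executable, '-c', 'def broken('],
                     capture_output=True, text=True)

print(all_checks_pass([ok]))       # True
print(all_checks_pass([ok, bad]))  # False
```

Feeding it the `pytest_result`, `pylint_result`, and `mypy_result` objects from the script above blocks deployment the moment any single tool fails.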

Common variations

You can verify AI-generated code asynchronously using pytest-asyncio for async tests. For streaming code generation, integrate verification into your pipeline so partial outputs are tested as they arrive. Different LLMs such as gpt-4o or claude-sonnet-4-5 can be prompted to generate code with embedded test cases. Static analysis tools vary by language; for TypeScript use tsc and eslint.

python
# test_async_code.py (requires the pytest-asyncio plugin)
import pytest

@pytest.mark.asyncio
async def test_async_add():
    async def async_add(a, b):
        return a + b
    assert await async_add(1, 2) == 3

# Run with: pytest -v -s --maxfail=1 --disable-warnings test_async_code.py
output
============================= test session starts ==============================
collected 1 item

test_async_code.py::test_async_add PASSED                                [100%]

============================== 1 passed in 0.01s ===============================
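For the streaming variation mentioned above, one lightweight gate is checking whether a partial output is already syntactically complete before running heavier tools. A sketch using the standard-library `ast` module (the `is_complete` helper is a hypothetical name for illustration):

```python
import ast

def is_complete(snippet: str) -> bool:
    """Return True if the snippet parses as valid Python source."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

print(is_complete('def add(a, b):\n    return a + b'))  # True
print(is_complete('def add(a, b):'))                    # False
```

A truncated stream fails the parse check, so the pipeline can wait for more tokens instead of running pytest or pylint against an incomplete file.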

Troubleshooting verification

  • If pytest fails, review test assertions and AI code logic.
  • If pylint reports errors, fix syntax or style issues before deployment.
  • If sandbox execution hangs or errors, check resource limits or API key validity.
  • For type errors from mypy, add type annotations or adjust AI prompts to generate typed code.
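The prompt adjustment in the last bullet can be scripted. The sketch below composes chat messages that steer a model toward typed, testable output; the wording and the `build_typed_code_prompt` helper are illustrative, and the resulting list is what you would pass as `messages` to the openai client's `client.chat.completions.create`:

```python
def build_typed_code_prompt(task: str) -> list:
    """Compose chat messages that request verifiable, typed Python code."""
    system = (
        'Write Python with full type annotations and docstrings, '
        'and include pytest unit tests for every public function.'
    )
    return [
        {'role': 'system', 'content': system},
        {'role': 'user', 'content': task},
    ]

messages = build_typed_code_prompt('Write a function that adds two numbers.')
print(messages[0]['content'])
```

Code generated under a prompt like this tends to pass the mypy and pytest steps above with far fewer retries.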

Key Takeaways

  • Always run automated tests like pytest on AI-generated code before use.
  • Use static analysis tools such as pylint and mypy to catch errors early.
  • Execute code in isolated sandboxes like e2b-code-interpreter to ensure security.
  • Incorporate verification steps into your AI code generation pipeline for continuous quality.
  • Adjust AI prompts to generate testable and type-annotated code for easier verification.
Verified 2026-04 · gpt-4o, claude-sonnet-4-5