How-to · Intermediate · 3 min read

Fix AI code generation hallucinations

Quick answer
To fix AI code generation hallucinations, write precise prompts with explicit instructions and ask the model to explain or comment its code. Always verify generated code by running tests or static analysis tools, and prefer capable models such as gpt-4o or claude-sonnet-4-5 for higher accuracy.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install "openai>=1.0"

Step by step

This example shows how to prompt gpt-4o to generate code with explicit instructions to reduce hallucinations, then verify the output by running a simple test.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Write a Python function named 'add' that takes two integers and returns their sum. "
    "Include type hints and a docstring. Then provide a simple usage example as a comment."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

code = response.choices[0].message.content
print("Generated code:\n", code)

# Strip the markdown fences the model often wraps around code
code = code.strip()
if code.startswith("```"):
    code = code.split("\n", 1)[1].rsplit("```", 1)[0]

# Verify by executing the generated code in an isolated namespace.
# Note: exec() is not a sandbox -- only run code you have reviewed.
exec_globals = {}
exec(code, exec_globals)

# Test the function
assert exec_globals['add'](2, 3) == 5
print("Test passed: add(2, 3) == 5")
output
Generated code:
 def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b

# Example usage:
# result = add(2, 3)
# print(result)  # Output: 5
Test passed: add(2, 3) == 5
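Before executing generated code at all, a cheap first gate is a syntax check with the standard-library ast module. The helper below, `is_valid_python`, is a sketch of this idea (it is not part of the OpenAI SDK): it catches truncated or malformed snippets before they ever reach exec().

python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python; rejects truncated
    or malformed snippets before they reach exec()."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Usage: gate exec() on a successful parse
snippet = "def add(a: int, b: int) -> int:\n    return a + b\n"
print(is_valid_python(snippet))        # True
print(is_valid_python("def add(a,"))   # False (truncated snippet)

A failed parse is a strong signal the model cut off mid-generation or mixed prose into the code, so you can re-prompt instead of crashing at runtime.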

Common variations

You can use asynchronous calls or streaming to handle large code generation tasks. Switching to claude-sonnet-4-5 can improve accuracy for complex code. Always include explicit instructions and request explanations to reduce hallucinations.

python
import os
import asyncio
from openai import AsyncOpenAI  # the synchronous OpenAI client cannot be awaited

async def async_generate_code():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = (
        "Write a Python function 'factorial' with type hints and a docstring. "
        "Explain the logic in comments."
    )
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

if __name__ == "__main__":
    asyncio.run(async_generate_code())
output
def factorial(n: int) -> int:
    """Calculate the factorial of a non-negative integer n."""
    # Base case: factorial of 0 is 1
    if n == 0:
        return 1
    # Recursive case
    return n * factorial(n - 1)
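The model switch suggested above can be automated with a small fallback wrapper. `generate_with_fallback` below is a hypothetical helper, not an SDK feature: it takes any callable that maps a model name to a completion and tries each model in order, so a rate limit or outage on one model falls through to the next.

python
def generate_with_fallback(generate, models):
    """Try each model in order; return (model, result) from the
    first call that succeeds, else raise with the last error."""
    last_err = None
    for model in models:
        try:
            return model, generate(model)
        except Exception as err:  # e.g. rate limit, model unavailable
            last_err = err
    raise RuntimeError(f"all models failed: {last_err}")

# Usage sketch with a fake generator (a real one would call the API):
def fake_generate(model):
    if model == "gpt-4o-mini":
        raise RuntimeError("rate limited")
    return f"code from {model}"

print(generate_with_fallback(fake_generate, ["gpt-4o-mini", "gpt-4o"]))
# → ('gpt-4o', 'code from gpt-4o')

Keeping the API call behind a plain callable also makes the fallback logic trivial to unit-test without network access, as the fake generator shows.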

Troubleshooting

  • If generated code contains syntax errors, ask for complete, runnable code and validate it with a parser or linter before executing.
  • If hallucinations persist, switch to more reliable models like claude-sonnet-4-5 or gpt-4o.
  • Use static analysis tools (e.g., mypy, flake8) and unit tests to validate code correctness.
  • Limit generation length to avoid incomplete code snippets.
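The unit-test bullet above can be made systematic with a tiny table-driven checker. `run_checks` is a hypothetical helper for smoke-testing a generated function against known input/output cases before you trust it.

python
def run_checks(fn, cases):
    """Return the list of (args, got, expected) triples where the
    generated function disagrees with the expected output."""
    failures = []
    for args, expected in cases:
        got = fn(*args)
        if got != expected:
            failures.append((args, got, expected))
    return failures

# Usage: verify a generated add() against a small case table
add = lambda a, b: a + b  # stand-in for the exec'd generated code
print(run_checks(add, [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]))  # → []

An empty failure list is not a proof of correctness, but a non-empty one is immediate evidence the model hallucinated the logic.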

Key Takeaways

  • Use explicit, detailed prompts with type hints and comments to reduce hallucinations.
  • Verify generated code by running tests or static analysis before deployment.
  • Prefer reliable models like gpt-4o or claude-sonnet-4-5 for accurate code generation.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-sonnet-4-5