Cheat Sheet intermediate · 8 min read

Code Generation Cheat Sheet — LLM Patterns & Pitfalls

version 2026-Q1

Generate, test, and deploy code with LLMs safely

Mental model

Use structured prompts and validation to turn natural language into production code.

Like a junior developer taking spec tickets: good specs (detailed prompts) + clear requirements (constraints) = usable PRs. Bad specs → broken code every time.

Key Concepts

Few-shot prompting

Including 2-5 working code examples in the prompt to establish the pattern and style the model should follow.

Chain-of-thought for code

Asking the model to explain its logic step-by-step before writing code, reducing hallucination and improving correctness.

Constrained generation

Using structured formats (JSON schema, docstring templates) to force the model output into machine-readable, validatable forms.

Code context window

Including existing codebase files, function signatures, and type hints so the model understands dependencies and style.

Temperature tuning

Lower values (0.2–0.4) for more repeatable code; higher (0.6–0.8) for exploratory solutions. Avoid 1.0 and above for production code.

Syntax validation

Running generated code through a parser or linter before execution to catch immediate syntax errors without running.

Code Generation Patterns

01 Few-shot function generation

Writing utility functions matching existing patterns

python

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Outer string uses ''' so the """docstrings""" in the examples do not terminate it
prompt = '''You are a Python expert. Generate a function following this pattern.

Examples:
def calculate_tax(amount: float) -> float:
    """Calculate 8% sales tax."""
    return amount * 0.08

def apply_discount(price: float, rate: float) -> float:
    """Apply discount as percentage."""
    return price * (1 - rate / 100)

Now write a function that:
- Takes a list of numbers
- Returns the sum after removing outliers (>2 std devs)
- Include docstring and type hints'''

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,
    messages=[{"role": "user", "content": prompt}]
)

generated_code = response.choices[0].message.content
print(generated_code)

The model may ignore examples if the prompt is too long. Keep examples under 200 tokens. Never mix Python 2 and Python 3 syntax in examples.
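One way to enforce the size limit above is to assemble the prompt from a list of examples and skip any that would exceed a budget. This is a minimal sketch, not part of the original pattern: `build_few_shot_prompt` and `max_chars` are hypothetical names, and the character count is only a rough stand-in for tokens (about 4 characters per token is a common rule of thumb, so 800 characters approximates 200 tokens treated here as a combined budget).

```python
def build_few_shot_prompt(examples: list[str], task: str, max_chars: int = 800) -> str:
    """Assemble a few-shot prompt, skipping examples that exceed the budget.

    max_chars is a crude proxy for a token budget (assumption: ~4 chars/token).
    """
    kept: list[str] = []
    used = 0
    for example in examples:
        example = example.strip()
        if used + len(example) > max_chars:
            continue  # skip oversized examples instead of truncating them mid-function
        kept.append(example)
        used += len(example)
    return (
        "Generate a function following this pattern.\n\n"
        "Examples:\n"
        + "\n\n".join(kept)
        + "\n\nNow write a function that:\n"
        + task
    )
```

Skipping rather than truncating keeps every example syntactically complete, which matters when the model is expected to copy the pattern.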

02 Structured code generation with JSON schema

Need parseable, validated code output (class definitions, configs)

python

import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

schema = {
    "type": "object",
    "properties": {
        "function_name": {"type": "string"},
        "parameters": {
            "type": "array",
            "items": {"type": "object", "properties": {"name": {"type": "string"}, "type": {"type": "string"}}}
        },
        "body": {"type": "string", "description": "Python code body"}
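    },
    # Without "required", the model may omit keys that the code below reads
    "required": ["function_name", "parameters", "body"]
}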

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a function to validate email addresses."}],
    response_format={"type": "json_schema", "json_schema": {"name": "function_spec", "schema": schema}}
)

spec = json.loads(response.choices[0].message.content)
print(f"Function: {spec['function_name']}")
for param in spec['parameters']:
    print(f"  {param['name']}: {param['type']}")

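A JSON schema constrains the shape of the response, not the contents of the body string, so it doesn't guarantee syntactically valid code. Always parse and validate the code separately. Truncated bodies and unbalanced brackets are common.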
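The separate validation step could look like the following sketch, which assembles a function from a parsed spec and checks it with `ast.parse`. The helper name `spec_to_source` and the hard-coded `example_spec` are illustrative assumptions; a real spec would come from the `json.loads` call in the pattern above.

```python
import ast
import textwrap


def spec_to_source(spec: dict) -> str:
    """Assemble a function from a spec dict and verify that it parses."""
    params = ", ".join(f"{p['name']}: {p['type']}" for p in spec["parameters"])
    # Normalize the body's indentation, then indent it under the def line
    body = textwrap.indent(textwrap.dedent(spec["body"]).strip(), "    ")
    source = f"def {spec['function_name']}({params}):\n{body}\n"
    ast.parse(source)  # raises SyntaxError if the body is not valid Python
    return source


# Hypothetical spec, shaped like the schema in the pattern above
example_spec = {
    "function_name": "is_email",
    "parameters": [{"name": "value", "type": "str"}],
    "body": "return '@' in value and '.' in value.split('@')[-1]",
}
print(spec_to_source(example_spec))
```

This only proves the assembled function parses; whether the email check is correct still needs tests.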

03 Chain-of-thought before code

Complex logic, algorithms, or high-stakes code

python

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = """Write a function to find the longest increasing subsequence in an array.

Before writing code:
1. Explain the algorithm
2. Describe the data structure
3. Identify edge cases
4. Then write the implementation

Requirements:
- Time complexity: O(n log n) or better
- Handle empty arrays
- Return indices and values"""

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": prompt}]
)

output = response.choices[0].message.content
print(output)

Longer response = higher cost. Use this for substantial features, not every 5-line helper. The model may misstate the complexity of the code it writes, so always test and measure.
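A chain-of-thought response mixes explanation with code, so the implementation has to be pulled out before it can be validated. The sketch below takes the last fenced block, assuming the model puts the final implementation last and labels it `python`, `py`, or nothing; `extract_last_code_block` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # built at runtime so this sketch contains no literal fence

# Opening fence with an optional python/py label, then everything up to the next fence
_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_last_code_block(response_text: str) -> str:
    """Return the contents of the last fenced code block in a model response."""
    blocks = _BLOCK.findall(response_text)
    if not blocks:
        raise ValueError("No fenced code block found in response")
    return blocks[-1].strip()
```

Responses that use other fence labels, or that nest fences inside the explanation, would need a stricter parser.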

04 Generate code matching existing codebase

Adding functions to existing projects, maintaining style consistency

python

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Read existing code as context (for large files, pass only the relevant excerpt)
with open("db_utils.py") as f:
    existing_code = f.read()

prompt = f"""Review this existing code and generate a new function in the same style:

```python
{existing_code}
```

Add a function to:
- Connect to Redis with timeout and retry logic
- Match the error handling pattern above
- Use the same logging setup
- Include docstrings like the existing functions"""

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": prompt}]
)

generated = response.choices[0].message.content
print(generated)

Large codebases will hit the token limit. Include only the relevant 500–1000 token excerpts rather than entire files.
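One way to produce a compact excerpt is to send signatures and the first docstring line instead of full function bodies. This is a sketch under stated assumptions: `summarize_module` is a hypothetical helper, and it relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast


def summarize_module(source: str) -> str:
    """Return function signatures and first docstring lines, without bodies."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"{prefix} {node.name}({ast.unparse(node.args)}){returns}: ...")
            doc = ast.get_docstring(node)
            if doc:
                # Keep only the summary line of the docstring
                lines.append(f"    # {doc.splitlines()[0]}")
    return "\n".join(lines)
```

The summary keeps type hints and naming conventions visible to the model while dropping implementation detail, at the cost of hiding the error handling and logging style that a full excerpt would show.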

05 Generate + validate + fix loop

Mission-critical code or untrusted models

python

import ast
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_and_validate(spec: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.2,
            messages=[{"role": "user", "content": f"Generate Python code: {spec}"}]
        )
        
        code = response.choices[0].message.content
        
        # Extract code block if wrapped
        if "```python" in code:
            code = code.split("```python")[1].split("```")[0]
        
        # Validate syntax
        try:
            ast.parse(code)
            return code
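        except SyntaxError as e:
            if attempt < max_retries - 1:
                print(f"Syntax error (attempt {attempt + 1}): {e}")
                continue
            raise ValueError(f"Failed after {max_retries} attempts: {e}") from e

    # Only reached when max_retries < 1, so no code was ever generated
    raise ValueError("max_retries must be at least 1")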

result = generate_and_validate("Write a function to parse CSV with type inference")
print(result)

AST validation only checks syntax, not logic. Hallucinated imports will pass. Always test with real data before deploy.
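The loop above resends the same prompt on every attempt. A variation that may converge faster is to feed the parser error back to the model. In this sketch the model call is injected as a `generate` callable so the loop can be exercised without network access; that parameter and the wording of the retry prompt are assumptions, not part of the original pattern.

```python
import ast
from typing import Callable


def generate_with_feedback(
    generate: Callable[[str], str], spec: str, max_retries: int = 3
) -> str:
    """Retry generation, including the previous syntax error in the next prompt."""
    prompt = f"Generate Python code: {spec}"
    last_error = None
    for _ in range(max_retries):
        code = generate(prompt)
        try:
            ast.parse(code)
            return code
        except SyntaxError as e:
            last_error = e
            # Show the model its failed attempt and the parser's complaint
            prompt = (
                f"Generate Python code: {spec}\n\n"
                f"Your previous attempt failed to parse:\n{code}\n\n"
                f"Error: {e}\nReturn only the corrected code."
            )
    raise ValueError(f"Failed after {max_retries} attempts: {last_error}")
```

In practice `generate` would wrap the `client.chat.completions.create` call and the code-block extraction shown earlier.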

Code Generation Comparison

Pattern	Best For	Cost	Speed	Reliability
Few-shot examples	Simple utilities, style matching	Low	Fast	High (models copy examples)
Chain-of-thought	Complex algorithms, correctness critical	High	Slower (2x tokens)	Higher (explains reasoning)
Structured/JSON output	Parseable configs, API specs	Medium	Fast	Medium (format guaranteed, logic may fail)
Context-aware generation	Codebase-specific functions	High (large context)	Slower	High (understands dependencies)
Validation loop	Mission-critical code	Very high (retries)	Slowest	High for syntax (logic still needs tests)

Common Errors & Fixes

01 Hallucinated imports (ModuleNotFoundError)

Cause: Model invents library names or versions that don't exist. Common with obscure domain packages.

Fix:

python

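Add to prompt: 'Only use: requests, pandas, numpy, sqlalchemy. No others.' Then parse the code with ast.parse() and check every imported module against that allowlist, or resolve it with importlib.util.find_spec(), before execution. Checking sys.modules only helps after the import has already run.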
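A static import check along these lines can be sketched with `ast` and `importlib.util.find_spec`, which looks a module up without importing it. The function name `find_unresolvable_imports` is hypothetical, and the check only tells you a top-level package exists in the current environment, not that the specific submodule or attribute does.

```python
import ast
import importlib.util


def _resolvable(name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def find_unresolvable_imports(source: str) -> list[str]:
    """Return top-level module names imported by source that cannot be found."""
    top_level = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            top_level.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            top_level.add(node.module.split(".")[0])
    return sorted(name for name in top_level if not _resolvable(name))
```

A non-empty result is a reason to reject the code or retry with the missing names listed in the prompt.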

02 Infinite loops or missing exit conditions

Cause: Model generates loops without break/return conditions, especially in recursive functions.

Fix:

python

Prompt explicitly: 'Include explicit exit conditions. Write base case first for recursion.' Then use timeout wrapper:

import signal

# SIGALRM is Unix-only and can only be set from the main thread
def timeout_handler(signum, frame):
    raise TimeoutError('Code execution exceeded 5 seconds')

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(5)
try:
    exec(generated_code)
finally:
    signal.alarm(0)
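Where `signal.SIGALRM` is unavailable (Windows, worker threads), a child process with a timeout is a portable alternative. This is a sketch with a hypothetical helper name, `run_with_timeout`; it limits run time only and is not a sandbox, so the code can still touch the file system and network.

```python
import subprocess
import sys


def run_with_timeout(code: str, seconds: float = 5.0) -> subprocess.CompletedProcess:
    """Run code in a separate Python process, killing it after the timeout."""
    try:
        return subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=seconds,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"Code execution exceeded {seconds} seconds") from exc
```

The returned object exposes `returncode`, `stdout`, and `stderr`, so a crash in the generated code can be told apart from a timeout.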

03 Off-by-one errors in indexing/loops

Cause: Model confusion between 0-based and 1-based indexing, especially in SQL or NumPy code.

Fix:

python

Include explicit test case in prompt:
'Test: array=[1,2,3], expected_result=[2,3]. Verify manually before returning code.'

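Then validate (assumes the generated code defines a function named solve; the name is illustrative):
test_array = [1, 2, 3]
expected = [2, 3]
namespace = {}
exec(generated_code, namespace)  # eval() only handles expressions, not def statements
result = namespace['solve'](test_array)
assert result == expected, f'Got {result}, expected {expected}'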

04 Type mismatch or missing type hints

Cause: Generated code ignores type hints from context or uses incompatible types.

Fix:

python

Use mypy validation:
import subprocess
with open('generated.py', 'w') as f:
    f.write(generated_code)
result = subprocess.run(['mypy', 'generated.py'], capture_output=True, text=True)
if result.returncode != 0:
    print(f'Type errors: {result.stdout}')  # mypy reports type errors on stdout

05 SQL injection or code injection vulnerabilities

Cause: Model uses string concatenation instead of parameterized queries. Doesn't escape user input.

Fix:

python

Explicit prompt: 'Use only parameterized queries with ? or %s placeholders. Never concatenate user input.'

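Then validate with regex (a coarse heuristic that flags f-strings assigned to sql/query variables or passed to execute):
import re

class SecurityError(Exception):
    """Raised when generated code fails a security check (not a built-in)."""

if re.search(r"(sql|query|execute)\w*\s*[=(]\s*f['\"][^'\"]*\{", generated_code, re.IGNORECASE):
    raise SecurityError('String interpolation in SQL detected')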
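A regex can miss concatenation and `.format()` calls. As a complementary sketch, the syntax tree can be inspected for `execute()` calls whose first argument is built dynamically. The helper name `find_unsafe_sql_calls` is hypothetical, and the check assumes a DB-API style `cursor.execute(query, params)`; queries assembled in a variable before the call are not detected.

```python
import ast


def find_unsafe_sql_calls(source: str) -> list[int]:
    """Return line numbers of execute() calls whose query is built dynamically."""
    unsafe = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in {"execute", "executemany"} or not node.args:
            continue
        query = node.args[0]
        is_fstring = isinstance(query, ast.JoinedStr)
        # "..." + value  or  "..." % value
        is_concat_or_percent = isinstance(query, ast.BinOp) and isinstance(
            query.op, (ast.Add, ast.Mod)
        )
        # "...".format(value)
        is_format_call = (
            isinstance(query, ast.Call)
            and isinstance(query.func, ast.Attribute)
            and query.func.attr == "format"
        )
        if is_fstring or is_concat_or_percent or is_format_call:
            unsafe.append(node.lineno)
    return sorted(unsafe)
```

An empty list is not proof of safety; it only means none of these three construction styles appear directly inside an `execute()` call.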

Production Gotchas

⚠ Model degradation with long context

Passing 5000+ tokens of context can cause models to lose focus, and generated code quality often starts to drop beyond roughly 3000 tokens. Include only the most relevant 500–1000 tokens of examples/existing code. Prioritize type hints over full implementations.

⚠ Temperature extremes break code generation

temperature=0 (near-deterministic) can produce repetitive, sometimes broken code, and temperature>0.8 hallucinates far more often. Use 0.2–0.4 for correctness and 0.6–0.8 only for exploration. Never use temperature=1.0 or higher for production code generation.

⚠ Model copying examples verbatim

If few-shot examples are too specific, model returns them exactly instead of generalizing. Use abstract placeholders (FUNCTION_NAME, OPERATION, TYPE) and ask: 'Modify this pattern for X scenario' instead of 'Follow this example.'

⚠ Cost explosion with validation loops

Retry logic can blow the budget. A loop that makes 3 attempts with a 5000-token context costs up to 3× a single generation, and more if each retry appends the previous failure to the prompt. Use it for critical paths only. Cache prompts or batch requests to reduce token waste.
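A back-of-the-envelope estimate of worst-case usage, assuming every attempt resends the same context and produces a similar amount of output. The function name and the 500-token output figure are illustrative assumptions.

```python
def worst_case_tokens(context_tokens: int, output_tokens: int, max_attempts: int) -> int:
    """Upper bound on tokens billed when every attempt fails and is retried."""
    return max_attempts * (context_tokens + output_tokens)


single = worst_case_tokens(5000, 500, max_attempts=1)
looped = worst_case_tokens(5000, 500, max_attempts=3)
print(single, looped, looped / single)  # 5500 16500 3.0
```

Under these assumptions the worst case scales linearly with the number of attempts, so the retry cap is the main cost lever.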

⚠ Docstrings containing hallucinated examples

Models generate plausible but incorrect usage examples in docstrings. These get copy-pasted and cause runtime errors. Always review docstring examples independently. Better: ask model to omit docstring examples or provide fixtures.
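If docstring examples are kept, they can be executed rather than trusted. The sketch below runs any `>>>` examples found in the docstrings of top-level functions using the standard `doctest` module. `docstring_examples_pass` is a hypothetical helper, and because it calls `exec` on the generated code it should only run in an isolated environment.

```python
import ast
import doctest


def docstring_examples_pass(source: str) -> bool:
    """Run >>> examples in the docstrings of top-level functions defined in source."""
    namespace: dict = {}
    exec(source, namespace)  # executes generated code: isolate this in real use
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    defined = [n.name for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]
    for name in defined:
        doc = namespace[name].__doc__
        if doc:
            test = parser.get_doctest(doc, namespace, name, None, 0)
            runner.run(test, out=lambda text: None)  # suppress failure reports
    return runner.failures == 0
```

A function with no examples passes trivially, so this check complements separate test cases rather than replacing them.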

⚠ Indentation and whitespace matter more than you think

Models sometimes drop leading spaces or mix tabs/spaces, breaking Python. Always normalize: code.expandtabs(4) (replacing each tab with a single space would corrupt the indentation) and run through autopep8 before execution.
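A minimal normalization sketch using only the standard library: expand tabs, strip indentation common to every line (typical when code is copied out of an indented block), and confirm the result parses. The helper name and the 4-space tab width are assumptions.

```python
import ast
import textwrap


def normalize_indentation(code: str, tab_width: int = 4) -> str:
    """Expand tabs, remove common leading indentation, and check the result parses."""
    expanded = code.expandtabs(tab_width)
    dedented = textwrap.dedent(expanded).strip("\n") + "\n"
    ast.parse(dedented)  # raises SyntaxError (or IndentationError) if still broken
    return dedented
```

Dropped leading spaces inside a block cannot be repaired mechanically; in that case the parse error is the signal to regenerate.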

Production-ready code generation pipeline with safety checks

python

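import ast
import os
import subprocess

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class SecurityError(Exception):
    """Raised when generated code contains a disallowed construct (not a built-in)."""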

def generate_production_code(spec: str, context: str = "", test_cases: list = None) -> str:
    """
    Generate code with comprehensive validation for production use.
    
    Args:
        spec: Requirements/description
        context: Existing code to match style
        test_cases: [(input, expected_output), ...]
    
    Returns:
        Validated, formatted code
    """
    prompt = f"""Generate Python code for this requirement:
{spec}

Context (match this style):
{context if context else 'N/A'}

Requirements:
- Include type hints
- Use parameterized queries if SQL
- Include docstring
- Handle edge cases (None, empty, negative values)
- Use only standard library or: requests, pandas, numpy
- No try/except unless explicitly needed"""
    
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{"role": "user", "content": prompt}]
    )
    
    code = response.choices[0].message.content
    
    # Extract code block
    if "```python" in code:
        code = code.split("```python")[1].split("```")[0]
    elif "```" in code:
        code = code.split("```")[1].split("```")[0]
    
    code = code.strip()
    
    # Validate syntax
    try:
        ast.parse(code)
    except SyntaxError as e:
        raise ValueError(f"Invalid Python syntax: {e}")
    
    # Check for dangerous patterns
    if "exec(" in code or "eval(" in code or "__import__" in code:
        raise SecurityError("Dangerous function detected")
    
    # Format with autopep8 (the trailing "-" makes it read the code from stdin)
    try:
        result = subprocess.run(
            ["python", "-m", "autopep8", "--max-line-length=100", "-"],
            input=code,
            capture_output=True,
            text=True,
            timeout=5
        )
        if result.returncode == 0:
            code = result.stdout
    except (OSError, subprocess.SubprocessError):
        pass  # formatter unavailable or timed out; keep the unformatted code
    
    # Run test cases if provided
    if test_cases:
        namespace = {}
        try:
            exec(code, namespace)
            # Pick the first function defined in the code, not an imported name
            func_name = next(
                node.name for node in ast.parse(code).body
                if isinstance(node, ast.FunctionDef)
            )
            func = namespace[func_name]

            for inputs, expected in test_cases:
                result = func(*inputs) if isinstance(inputs, tuple) else func(inputs)
                assert result == expected, f"Test failed: {inputs} → {result}, expected {expected}"
        except Exception as e:
            raise ValueError(f"Test execution failed: {e}") from e
    
    return code

# Usage
generated = generate_production_code(
    spec="Write a function to remove duplicates from a list while preserving order",
    context="""def process_items(items):
    \"\"\"Apply processing function.\"\"\" 
    return [x for x in items if x is not None]""",
    test_cases=[([1, 2, 2, 3], [1, 2, 3]), ([5], [5])]
)

print("✓ Generated code passed all validations")
print(generated)

Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022
