Code Generation Cheat Sheet — LLM Patterns & Pitfalls
Use structured prompts and validation to turn natural language into production code.
Think of the model as a junior developer working from spec tickets: good specs (detailed prompts) + clear requirements (constraints) = usable PRs. Bad specs → broken code every time.
Key Concepts
Code Generation Patterns
# Few-shot examples: show the model the style to imitate
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# The outer string uses ''' so the """docstrings""" in the examples do not end it early.
prompt = '''You are a Python expert. Generate a function following this pattern.

Examples:

def calculate_tax(amount: float) -> float:
    """Calculate 8% sales tax."""
    return amount * 0.08

def apply_discount(price: float, rate: float) -> float:
    """Apply discount as percentage."""
    return price * (1 - rate / 100)

Now write a function that:
- Takes a list of numbers
- Returns the sum after removing outliers (>2 std devs)
- Includes a docstring and type hints'''

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,
    messages=[{"role": "user", "content": prompt}],
)
generated_code = response.choices[0].message.content
print(generated_code)

# Structured/JSON output: constrain the reply to a schema
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Strict mode requires every property to be listed in "required"
# and "additionalProperties": False on every object.
schema = {
    "type": "object",
    "properties": {
        "function_name": {"type": "string"},
        "parameters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "type": {"type": "string"}},
                "required": ["name", "type"],
                "additionalProperties": False,
            },
        },
        "body": {"type": "string", "description": "Python code body"},
    },
    "required": ["function_name", "parameters", "body"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Generate a function to validate email addresses."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "function_spec", "schema": schema, "strict": True},
    },
)
spec = json.loads(response.choices[0].message.content)
print(f"Function: {spec['function_name']}")
for param in spec["parameters"]:
    print(f"  {param['name']}: {param['type']}")

# Chain-of-thought: ask for the reasoning before the implementation
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = """Write a function to find the longest increasing subsequence in an array.

Before writing code:
1. Explain the algorithm
2. Describe the data structure
3. Identify edge cases
4. Then write the implementation

Requirements:
- Time complexity: O(n log n) or better
- Handle empty arrays
- Return indices and values"""

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": prompt}],
)
output = response.choices[0].message.content
print(output)

# Context-aware generation: pass existing code so the model matches its style
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Read existing code as context
with open("db_utils.py") as f:
    existing_code = f.read()

prompt = f"""Review this existing code and generate a new function in the same style:

```python
{existing_code}
```

Add a function to:
- Connect to Redis with timeout and retry logic
- Match the error handling pattern above
- Use the same logging setup
- Include docstrings like the existing functions"""

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": prompt}],
)
generated = response.choices[0].message.content
print(generated)

# Validation loop: regenerate until the code at least parses
import ast
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def generate_and_validate(spec: str, max_retries: int = 3) -> str:
    """Generate code, retrying until it parses; raise after max_retries attempts."""
    last_error = None
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.2,
            messages=[{"role": "user", "content": f"Generate Python code: {spec}"}],
        )
        code = response.choices[0].message.content
        # Extract code block if wrapped
        if "```python" in code:
            code = code.split("```python")[1].split("```")[0]
        code = code.strip()
        # Validate syntax only; this does not check behavior
        try:
            ast.parse(code)
            return code
        except SyntaxError as e:
            last_error = e
            print(f"Syntax error (attempt {attempt + 1}): {e}")
    raise ValueError(f"Failed after {max_retries} attempts: {last_error}")

result = generate_and_validate("Write a function to parse CSV with type inference")
print(result)

Code Generation Comparison
| Pattern | Best For | Cost | Speed | Reliability |
|---|---|---|---|---|
| Few-shot examples | Simple utilities, style matching | Low | Fast | High (models copy examples) |
| Chain-of-thought | Complex algorithms, correctness critical | High | Slower (2x tokens) | Higher (explains reasoning) |
| Structured/JSON output | Parseable configs, API specs | Medium | Fast | Medium (format guaranteed, logic may fail) |
| Context-aware generation | Codebase-specific functions | High (large context) | Slower | High (understands dependencies) |
| Validation loop | Mission-critical code | Very high (retries) | Slowest | High for syntax (parse-checked); logic still needs tests |
Common Errors & Fixes
Hallucinated imports (ModuleNotFoundError)
Cause: The model invents library names or versions that don't exist. This is common with obscure domain packages.
Add to prompt: 'Only use: requests, pandas, numpy, sqlalchemy. No others.' Then, before execution, parse the code with ast.parse() and check every imported module against that allow-list; importlib.util.find_spec() confirms that a module is actually installed. Checking sys.modules only helps after the import has already run.

Infinite loops or missing exit conditions
Cause: The model generates loops without break/return conditions. Recursive functions are especially prone to a missing base case.
Prompt explicitly: 'Include explicit exit conditions. Write base case first for recursion.' Then use a timeout wrapper (signal.SIGALRM is Unix-only and works only in the main thread):
import signal

def timeout_handler(signum, frame):
    raise TimeoutError('Code execution exceeded 5 seconds')

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(5)  # deliver SIGALRM after 5 seconds
try:
    exec(generated_code)  # run untrusted code only inside a sandbox
finally:
    signal.alarm(0)  # cancel the pending alarm

Off-by-one errors in indexing/loops
Cause: The model confuses 0-based and 1-based indexing, especially when mixing SQL (1-based) with Python or NumPy (0-based).
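Include an explicit test case in the prompt:
'Test: array=[1,2,3], expected_result=[2,3]. Verify manually before returning code.'
Then validate:
test_array = [1, 2, 3]
expected = [2, 3]
# Assumes generated_code is a single expression over test_array, e.g. 'test_array[1:]'.
# For a full function definition, exec() it into a namespace and call the function instead.
result = eval(generated_code.strip(), {'test_array': test_array})
assert result == expected, f'Got {result}, expected {expected}'

Type mismatch or missing type hints
Cause: Generated code ignores type hints from the context or uses incompatible types.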
Use mypy validation:
import subprocess

with open('generated.py', 'w') as f:
    f.write(generated_code)

result = subprocess.run(['mypy', 'generated.py'], capture_output=True, text=True)
if result.returncode != 0:
    # mypy reports type errors on stdout; stderr carries only crashes and usage errors
    print(f'Type errors: {result.stdout}')

SQL injection or code injection vulnerabilities
Cause: The model uses string concatenation instead of parameterized queries and doesn't escape user input.
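Explicit prompt: 'Use only parameterized queries with ? or %s placeholders. Never concatenate user input.'
Then validate with a regex (a coarse heuristic; it misses concatenation with + and .format()):
import re

class SecurityError(Exception):
    """Raised when generated code contains an unsafe pattern."""

# Flag f-strings that contain a SQL keyword followed by an interpolated {value}
sql_fstring = r"f['\"].*\b(SELECT|INSERT|UPDATE|DELETE)\b.*\{"
if re.search(sql_fstring, generated_code, re.IGNORECASE):
    raise SecurityError('String interpolation in SQL detected')

Production Gotchas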
Passing 5000+ tokens of context can cause models to lose focus, and generated code quality often starts to drop beyond roughly 3000 tokens. Include only the most relevant 500–1000 tokens of examples or existing code. Prioritize signatures and type hints over full implementations.
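One way to favor type hints over full implementations is to send only function signatures as context. The sketch below is an illustration, not part of the original pipeline: `summarize_signatures` is a hypothetical helper name, it covers only top-level functions, and it relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast


def summarize_signatures(source: str) -> str:
    """Return one stub line per top-level function, dropping the bodies."""
    stubs = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            args = ast.unparse(node.args)  # parameters with their annotations
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            stubs.append(f"{prefix} {node.name}({args}){returns}: ...")
    return "\n".join(stubs)


existing = (
    "def apply_discount(price: float, rate: float) -> float:\n"
    "    return price * (1 - rate / 100)\n"
)
print(summarize_signatures(existing))
```

The stub text can then replace the full file contents in a context-aware prompt, which keeps the style and type information while cutting most of the tokens.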
temperature=0 (near-deterministic) can produce repetitive, sometimes broken code, while temperature>0.8 tends to hallucinate wildly. Use 0.2–0.4 for correctness, 0.6–0.8 only for exploration. Never use temperature=1.0 for production code generation.
If few-shot examples are too specific, the model may return them verbatim instead of generalizing. Use abstract placeholders (FUNCTION_NAME, OPERATION, TYPE) and ask 'Modify this pattern for X scenario' instead of 'Follow this example.'
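Retry logic can blow the budget. Each retry resends the full prompt, so a 3-attempt loop with a 5000-token context can cost up to 3× a single generation in input tokens alone, and more if each retry also appends the previous output and error. Use it for critical paths only. Cache prompts or batch requests to reduce token waste.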
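A quick worst-case estimate makes the retry budget concrete. This is a minimal sketch: `estimate_retry_cost` is a hypothetical helper, and the per-1000-token prices in the usage lines are placeholder values, not real rates for any model.

```python
def estimate_retry_cost(
    context_tokens: int,
    output_tokens: int,
    attempts: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Worst-case cost when every attempt resends the same prompt."""
    per_attempt = (
        context_tokens / 1000 * input_price_per_1k
        + output_tokens / 1000 * output_price_per_1k
    )
    return attempts * per_attempt


# Placeholder prices, chosen only to show the arithmetic.
single = estimate_retry_cost(5000, 500, 1, 0.01, 0.03)
worst_case = estimate_retry_cost(5000, 500, 3, 0.01, 0.03)
print(f"single: {single:.3f}, three attempts: {worst_case:.3f}")
```

Under these assumptions the loop costs three times a single call; feeding the previous error back into each retry would push the input side higher.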
Models generate plausible but incorrect usage examples in docstrings. These get copy-pasted and cause runtime errors. Always review docstring examples independently. Better: ask the model to omit docstring examples or to provide fixtures.
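If the docstring examples are written in doctest format (`>>>` lines), they can be checked mechanically rather than by eye. The sketch below is one possible approach using the standard `doctest` module; `count_failing_docstring_examples` is a hypothetical helper, it only looks at top-level functions, and it executes the generated code, so it belongs inside a sandbox.

```python
import ast
import doctest


def count_failing_docstring_examples(code: str) -> int:
    """Run the >>> examples in each top-level function docstring; return the failure count."""
    tree = ast.parse(code)
    namespace: dict = {}
    exec(compile(tree, "<generated>", "exec"), namespace)  # executes generated code

    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            if not doc:
                continue
            test = parser.get_doctest(doc, dict(namespace), node.name, "<generated>", node.lineno)
            # Discard the failure report text; only the count is needed here.
            failed += runner.run(test, out=lambda _msg: None).failed
    return failed


sample = 'def double(x):\n    """Double x.\n\n    >>> double(2)\n    5\n    """\n    return x * 2\n'
print(count_failing_docstring_examples(sample))
```

A non-zero count is a reason to reject the output or to strip the examples before the code is merged.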
Models sometimes drop leading spaces or mix tabs and spaces, which breaks Python. Always normalize, for example with code.expandtabs(4), and run the result through autopep8 before execution.
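A minimal normalization step, using only the standard library, might look like the sketch below. `normalize_indentation` is a hypothetical helper name; note that `expandtabs` also rewrites tab characters inside string literals, which is an accepted limitation here.

```python
import ast
import textwrap


def normalize_indentation(code: str, tab_width: int = 4) -> str:
    """Expand tabs, strip common leading indentation, and confirm the result parses."""
    expanded = code.expandtabs(tab_width)
    dedented = textwrap.dedent(expanded).strip("\n") + "\n"
    ast.parse(dedented)  # raises SyntaxError if the indentation is still inconsistent
    return dedented


raw = "\tdef f(x):\n\t\treturn x + 1\n"
print(normalize_indentation(raw))
```

Running this before autopep8 means the formatter receives code that already parses, so a formatting failure points to the formatter rather than to the model output.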
Production-ready code generation pipeline with safety checks
import ast
import os
import subprocess
import sys
from typing import Optional

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class SecurityError(Exception):
    """Raised when generated code contains a disallowed construct."""


def generate_production_code(spec: str, context: str = "", test_cases: Optional[list] = None) -> str:
    """
    Generate code with comprehensive validation for production use.

    Args:
        spec: Requirements/description
        context: Existing code to match style
        test_cases: [(input, expected_output), ...]

    Returns:
        Validated, formatted code
    """
    prompt = f"""Generate Python code for this requirement:
{spec}

Context (match this style):
{context if context else 'N/A'}

Requirements:
- Include type hints
- Use parameterized queries if SQL
- Include docstring
- Handle edge cases (None, empty, negative values)
- Use only standard library or: requests, pandas, numpy
- No try/except unless explicitly needed"""

    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{"role": "user", "content": prompt}],
    )
    code = response.choices[0].message.content

    # Extract code block
    if "```python" in code:
        code = code.split("```python")[1].split("```")[0]
    elif "```" in code:
        code = code.split("```")[1].split("```")[0]
    code = code.strip()

    # Validate syntax
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        raise ValueError(f"Invalid Python syntax: {e}") from e

    # Check for dangerous patterns (coarse substring match)
    if "exec(" in code or "eval(" in code or "__import__" in code:
        raise SecurityError("Dangerous function detected")

    # Format with autopep8; the trailing "-" makes it read the code from stdin
    try:
        result = subprocess.run(
            [sys.executable, "-m", "autopep8", "--max-line-length=100", "-"],
            input=code,
            capture_output=True,
            text=True,
            timeout=5,
        )
        if result.returncode == 0:  # non-zero also covers autopep8 not being installed
            code = result.stdout
    except subprocess.TimeoutExpired:
        pass  # formatting is best-effort; keep the unformatted code

    # Run test cases if provided (this executes generated code; use a sandbox)
    if test_cases:
        namespace = {}
        try:
            exec(code, namespace)
            # Test the first function the generated code defines, not an imported name
            func_name = next(n.name for n in tree.body if isinstance(n, ast.FunctionDef))
            func = namespace[func_name]
            for inputs, expected in test_cases:
                result = func(*inputs) if isinstance(inputs, tuple) else func(inputs)
                assert result == expected, f"Test failed: {inputs} → {result}, expected {expected}"
        except Exception as e:
            raise ValueError(f"Test execution failed: {e}") from e

    return code

# Usage
generated = generate_production_code(
    spec="Write a function to remove duplicates from a list while preserving order",
    context="""def process_items(items):
    \"\"\"Apply processing function.\"\"\"
    return [x for x in items if x is not None]""",
    test_cases=[([1, 2, 2, 3], [1, 2, 3]), ([5], [5])],
)
print("✓ Generated code passed all validations")
print(generated)
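
The substring check for dangerous functions and the import allow-list described under Common Errors can both be done on the parsed syntax tree, which avoids false positives such as `literal_eval(` matching `eval(`. The following is a sketch under stated assumptions: `find_violations` and `DANGEROUS_CALLS` are hypothetical names, and the check catches only direct calls by name, not aliased or attribute-based access.

```python
import ast

DANGEROUS_CALLS = {"exec", "eval", "__import__"}


def find_violations(code: str, allowed_modules: set) -> list:
    """List disallowed imports and direct calls to dangerous built-ins."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            roots = [(node.module or "").split(".")[0]]  # "" for relative imports
        else:
            roots = []
        for root in roots:
            if root and root not in allowed_modules:
                violations.append(f"line {node.lineno}: import of '{root}' is not allowed")
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations


sample = "import fakelib\nfrom ast import literal_eval\nx = literal_eval('1')\ny = eval('2')\n"
for problem in find_violations(sample, {"ast", "requests", "pandas", "numpy"}):
    print(problem)
```

A non-empty result could raise the same security error the pipeline uses for its substring check.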