
How to build a code execution agent with OpenAI

Quick answer

Build a code execution agent by calling the OpenAI API (for example gpt-4o) to generate or modify code, then running that code with Python's exec() or in a separate process via subprocess. Models from other providers, such as claude-3-5-sonnet-20241022, can also generate code, but they are called through their own provider's API rather than OpenAI's. exec() provides no sandboxing, so run untrusted generated code in an isolated environment.

PREREQUISITES

Python 3.8+
OpenAI API key with access to gpt-4o
pip install "openai>=1.0"

Setup

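Install the OpenAI Python SDK and set your API key as an environment variable so the key stays out of your source code. Quote the version specifier; otherwise the shell treats > as output redirection.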

bash

pip install "openai>=1.0"
export OPENAI_API_KEY="your-api-key"
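Before running the agent, a quick check like the one below can confirm that the key is visible to Python. This is a minimal sketch: the helper name `require_api_key` is hypothetical, and it only checks that the variable is set and non-empty, not that the key is valid.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early instead of waiting for an authentication error from the API
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running the agent."
        )
    return key
```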

Step by step

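This example builds a minimal code execution agent: it asks gpt-4o to write Python code for a task, runs that code in-process with exec(), and returns the captured output. Model output varies between runs, so your generated code may differ from the sample output below. exec() provides no isolation, so use this pattern only with code you are willing to run directly on your own machine.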

python

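import contextlib
import io
import os
import re

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = (
    "You write Python 3 code. Reply with only the code, without explanations "
    "or Markdown fences, and print the final result to stdout."
)


def extract_code(text: str) -> str:
    # Models often wrap code in ```python ... ``` fences; keep only the body if present.
    match = re.search(r"```[\w+-]*\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text


def run_code_agent(prompt: str) -> str:
    # Step 1: Ask the model to generate Python code for the task
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write Python code to: {prompt}"},
        ],
    )
    code = extract_code(response.choices[0].message.content)

    # Step 2: Execute the code in-process and capture what it prints.
    # exec() is NOT a sandbox: the code keeps full access to builtins, files, and
    # the network. A fresh globals dict only stops it from overwriting this module's
    # names; "__name__" is set so `if __name__ == "__main__":` blocks still run.
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__main__"})
    except Exception as e:
        return f"Generated code:\n{code}\n\nError during code execution: {e}"

    return f"Generated code:\n{code}\n\nExecution output:\n{buffer.getvalue()}"


if __name__ == "__main__":
    prompt = "calculate the factorial of 5 and print the result"
    print(run_code_agent(prompt))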

output

Generated code:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

Execution output:
120
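To exercise the execution half without an API key, you can pass a hard-coded string, such as the generated code shown above, through the same stdout-capture pattern. This is a sketch with a hypothetical helper name, `execute_and_capture`; like the main example, it runs the code in-process with no sandboxing.

```python
import contextlib
import io

# The generated code from the sample output, hard-coded so no API call is needed
SAMPLE_CODE = """\
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))
"""


def execute_and_capture(code: str) -> str:
    """Run code in a fresh namespace and return everything it printed."""
    buffer = io.StringIO()
    # Setting __name__ lets `if __name__ == "__main__":` blocks in the code run.
    namespace = {"__name__": "__main__"}
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


print(execute_and_capture(SAMPLE_CODE), end="")  # prints 120
```

Keeping execution in its own function also makes it easy to swap in the subprocess approach later without touching the model-calling code.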

Common variations

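Use the SDK's AsyncOpenAI client to make non-blocking API calls, for example when the agent serves several requests concurrently.
Try other code-generation models such as claude-3-5-sonnet-20241022; because it is an Anthropic model, call it through Anthropic's API rather than the OpenAI client.
Run the code in a separate process with subprocess so that crashes and infinite loops cannot take down the agent and you can enforce a timeout.
For untrusted code, add real isolation, such as a container or dedicated sandbox with CPU, memory, filesystem, and network limits; filtering the prompt or the code text alone does not make exec() safe.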
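The subprocess variation can be sketched with the standard library alone. `run_in_subprocess` is a hypothetical helper name and the 5-second default timeout is an arbitrary choice. The sketch provides process isolation and a wall-clock limit, not a security sandbox, so memory, filesystem, and network limits still have to come from the environment the child runs in.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout: float = 5.0) -> dict:
    """Run code in a separate Python interpreter and collect its results.

    A crash or infinite loop in the child cannot take down the agent, but the
    child can still read files and use the network with your user's permissions.
    """
    try:
        completed = subprocess.run(
            # -I: isolated mode, ignores PYTHON* env vars and user site-packages
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"Timed out after {timeout} s"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


result = run_in_subprocess("print(6 * 7)")
print(result["stdout"], end="")  # prints 42
```

In the agent, you would call this instead of exec() and return `stderr` to the user (or back to the model) when `ok` is false.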

Troubleshooting

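If you get a KeyError for OPENAI_API_KEY or an authentication error from the API, verify that the environment variable is set and exported in the shell that runs the script.
For execution errors, check the generated code for syntax errors or unsupported operations. A SyntaxError on the first line usually means the model wrapped its code in Markdown fences or added prose; ask for code only and strip any fences before executing.
Generated code that runs too long blocks exec() indefinitely, since exec() has no built-in time limit; run the code in a subprocess with a timeout instead.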
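A cheap pre-check for syntax problems is to parse the generated code with the standard library's ast module before running it. This sketch uses a hypothetical helper name, `check_syntax`; it catches only syntax errors, not runtime failures such as a NameError.

```python
import ast
from typing import Optional


def check_syntax(code: str) -> Optional[str]:
    """Return None if code parses, otherwise a short description of the error."""
    try:
        ast.parse(code)  # parses only; nothing is executed
    except SyntaxError as e:
        return f"SyntaxError on line {e.lineno}: {e.msg}"
    return None


print(check_syntax("print(1 + 1)"))              # prints None
print(check_syntax("```python\nprint(1)\n```"))  # reports a SyntaxError on line 1
```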


Key Takeaways

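Use gpt-4o through the OpenAI API (or a model such as claude-3-5-sonnet-20241022 through its own provider's API) to generate executable Python code from natural language prompts.
Capturing output from exec() is enough for quick experiments, but exec() is not a sandbox; treat it as running the code directly on your machine.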
Consider async calls and subprocess isolation for production-grade code execution agents.
Keep your API key in environment variables, and run untrusted generated code in an isolated environment rather than relying on input sanitization.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022
