
How to build a code execution agent with OpenAI

Quick answer
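Build a code execution agent by calling the OpenAI API to generate or modify code, then running that code with Python's exec() or a subprocess. This guide uses gpt-4o for generation; claude-3-5-sonnet-20241022 is an Anthropic model, so it needs Anthropic's SDK and API key rather than the OpenAI client. Because exec() does not sandbox anything, run generated code only in an environment you control, such as a container or a throwaway virtual machine.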

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (check that your account has API quota available)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable for secure access.

bash
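# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openai>=1.0"
# Placeholder value; substitute your own key
export OPENAI_API_KEY="your-api-key"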

Step by step

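This example creates a simple code execution agent that asks gpt-4o to generate Python code from a prompt, runs it with exec(), and returns the captured output. exec() runs the code with the full privileges of the current process, so treat this as a local demonstration rather than a sandbox.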

python
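import contextlib
import io
import os
import re

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

FENCE = "`" * 3  # Markdown code fence marker

def extract_code(text: str) -> str:
    # Models often wrap code in a Markdown fence; keep only the fenced body
    match = re.search(FENCE + r"\w*[ \t]*\n(.*?)" + FENCE, text, re.DOTALL)
    return match.group(1) if match else text

def run_code_agent(prompt: str) -> str:
    # Step 1: Generate Python code from prompt
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Write Python code to: {prompt}. Reply with code only."}]
    )
    code = extract_code(response.choices[0].message.content or "")

    # Step 2: Execute the generated code and capture what it prints
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # A fresh globals dict isolates names only; it is NOT a sandbox.
            # Setting __name__ lets a generated main guard run.
            exec(code, {"__name__": "__main__"})
    except Exception as e:
        return f"Generated code:\n{code}\n\nError during code execution: {e}"

    return f"Generated code:\n{code}\n\nExecution output:\n{buffer.getvalue()}"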

if __name__ == "__main__":
    prompt = "calculate the factorial of 5 and print the result"
    result = run_code_agent(prompt)
    print(result)
output
Generated code:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

Execution output:
120

Common variations

  • Use the SDK's AsyncOpenAI client for non-blocking API calls.
  • To use claude-3-5-sonnet-20241022 instead, switch to Anthropic's SDK and API key; the OpenAI client does not serve Claude models.
  • Use subprocess to run code in a separate process for better isolation.
  • Implement input sanitization and resource limits to secure code execution.
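
The subprocess and time-limit variations can be sketched as follows. This is a minimal sketch, not a hardened sandbox: run_in_subprocess and timeout_s are illustrative names that do not appear in the example above, and a separate process with a timeout stops runaway code but does not restrict file or network access.

```python
import subprocess
import sys


def run_in_subprocess(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a separate Python process and return its captured stdout.

    Hypothetical helper for illustration only.
    """
    try:
        completed = subprocess.run(
            # -I starts the child in isolated mode: it ignores PYTHON*
            # environment variables and the user site-packages directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution exceeded {timeout_s} seconds"

    if completed.returncode != 0:
        return f"Error (exit code {completed.returncode}):\n{completed.stderr}"
    return completed.stdout
```

Under these assumptions, calling run_in_subprocess(code) in place of exec(code, {}) keeps a crash or an infinite loop in the generated code from taking down the agent process. Stronger isolation, such as a container, would still be needed for untrusted prompts.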

Troubleshooting

  • If you get API authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For execution errors, check the generated code for syntax issues or unsupported operations.
  • Long-running generated code blocks the agent because exec() has no built-in timeout; consider adding execution time limits.
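
One way to check generated code for syntax issues before running it is to parse it first with Python's ast module. The sketch below assumes a hypothetical helper name, syntax_error_in, that is not part of the example above; it only detects syntax errors and says nothing about whether the code is safe or correct.

```python
import ast
from typing import Optional


def syntax_error_in(code: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses.

    Hypothetical helper for illustration only.
    """
    try:
        ast.parse(code)  # parses without executing anything
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None
```

If the helper returns a message, the agent could send that message back to the model and ask for a corrected version instead of calling exec() on code that cannot run.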

Key Takeaways

  • Use gpt-4o to generate executable Python code from natural language prompts; claude-3-5-sonnet-20241022 is an option only if you switch to Anthropic's SDK.
  • exec() makes it easy to capture output but is not a sandbox, so run generated code only in a controlled environment.
  • Consider async calls and subprocess isolation for production-grade code execution agents.
  • Always secure your API key via environment variables and sanitize inputs to avoid security risks.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022