How to Intermediate · 3 min read

How to give an AI agent code execution capability

Quick answer
To give an AI agent code execution capability, pass the code it generates to a sandboxed environment that runs it and returns the result. Python's exec() with a restricted namespace, or a subprocess with input validation, is enough for experiments with prompts you control, but neither is a security boundary on its own. For untrusted code, use a specialized sandbox library or an isolated environment such as a container.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable to authenticate requests.

bash
pip install "openai>=1.0"
# Placeholder value; substitute your own key
export OPENAI_API_KEY="your-api-key"

Step by step

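This example shows how to create an AI agent that generates Python code and runs it with exec() inside a restricted namespace. Restricting the available built-ins limits what the code can call, but it is not a true sandbox, so only use this approach with prompts you control.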

python
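import os
import re
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Prompt the AI to generate Python code; ask for raw code so the reply can be executed
messages = [
    {"role": "system", "content": "Reply with only runnable Python code, no explanations."},
    {"role": "user", "content": "Write Python code to calculate the factorial of 5 and print the result."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

reply = response.choices[0].message.content

# Models often wrap code in Markdown fences; extract the fenced body if present
match = re.search(r"`{3}(?:python)?\s*\n(.*?)`{3}", reply, re.DOTALL)
code_to_run = match.group(1) if match else reply
print("Generated code:\n", code_to_run)

# Restrict the built-ins available to the generated code.
# This limits accidents but is NOT a security boundary.
namespace = {"__builtins__": {"range": range, "print": print}}

try:
    # One dict serves as both globals and locals, so generated
    # functions can call themselves and each other
    exec(code_to_run, namespace)
except Exception as e:
    print(f"Error during code execution: {e}")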
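output
Generated code:
 def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    print(f"Factorial of {n} is {result}")

factorial(5)

Factorial of 5 is 120

The generated code varies between runs; the last line is printed by the executed snippet.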
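
Restricting built-ins narrows what generated code can reach, but it does not inspect the code itself. One possible pre-check, sketched below, parses the code with the standard ast module and rejects imports and double-underscore names before exec() is ever called. The function name validate_code and the specific rejection rules are assumptions made for illustration, not part of the original example, and a check like this reduces risk rather than guaranteeing safety.

```python
import ast
from typing import List


def validate_code(source: str) -> List[str]:
    """Return a list of problems found in source; an empty list means it passed.

    Hypothetical helper: the rules below are illustrative, not exhaustive.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    for node in ast.walk(tree):
        # Reject import statements outright
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"import not allowed (line {node.lineno})")
        # Reject dunder attribute access such as obj.__class__
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"dunder attribute '{node.attr}' not allowed (line {node.lineno})")
        # Reject dunder names such as __import__ or __builtins__
        elif isinstance(node, ast.Name) and node.id.startswith("__"):
            problems.append(f"dunder name '{node.id}' not allowed (line {node.lineno})")
    return problems
```

Calling validate_code on the generated string before exec() and skipping execution when the returned list is non-empty keeps the check separate from the execution step, so the rules can be tightened without touching the rest of the agent.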

Common variations

  • Use subprocess to run code in a separate process for better isolation.
  • Leverage sandbox libraries like RestrictedPython or containerized environments for enhanced security.
  • Implement async execution with asyncio for non-blocking code runs.
  • Use a different LLM for code generation, such as claude-3-5-sonnet-20241022. This requires the Anthropic SDK and API key instead of the OpenAI client.
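
As a rough illustration of the subprocess variation, the sketch below writes the generated code to a temporary file and runs it in a fresh Python interpreter with a timeout. The helper name run_in_subprocess and the 5-second default are assumptions chosen for this example. A child process protects the agent from crashes and infinite loops, but it still runs with the same operating system permissions as the parent, so it does not replace a container or dedicated sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_subprocess(code: str, timeout_seconds: float = 5.0) -> subprocess.CompletedProcess:
    """Run code in a fresh Python interpreter and capture its output.

    Hypothetical helper for illustration. It gives crash and timeout
    isolation only; the child keeps the parent's OS permissions.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            # -I runs Python in isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", str(script)],
            cwd=workdir,              # relative file writes land in the temp directory
            capture_output=True,      # collect stdout and stderr instead of inheriting them
            text=True,                # decode output as str
            timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
        )
```

The returned object exposes returncode, stdout, and stderr, so the agent can pass a failing snippet's stderr back to the model. A snippet that runs past the timeout raises subprocess.TimeoutExpired, which the caller should catch.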

Troubleshooting

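  • If you see SyntaxError, check whether the model's reply contains Markdown fences or explanatory prose around the code.
  • If you see NameError or ImportError, the code probably used a built-in or an import statement that the restricted namespace does not expose. Add the built-in to the allowed set only if it is safe.
  • A PermissionError means the code tried to reach a file or resource that the operating system denied. Treat it as the sandbox doing its job rather than loosening permissions.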
  • If the AI generates incomplete code, prompt it to provide complete, runnable snippets.
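
One way to act on these errors automatically is to feed the error message back to the model and ask again. The sketch below assumes a hypothetical generate callable that receives the previous error message (or None on the first attempt) and returns Python source; in a real agent it would wrap the chat completion call. The name run_with_retries and the limit of three attempts are likewise illustrative choices.

```python
import traceback
from typing import Callable, Optional


def run_with_retries(generate: Callable[[Optional[str]], str], max_attempts: int = 3) -> bool:
    """Ask for code, run it, and feed any error back into the next request.

    generate is a hypothetical callable: it receives the previous error
    message (or None on the first attempt) and returns Python source code.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        # Fresh restricted namespace per attempt; not a security boundary
        namespace = {"__builtins__": {"range": range, "print": print}}
        try:
            exec(code, namespace)
            return True
        except Exception as exc:
            # Keep only the final "ErrorType: message" line for the next prompt
            feedback = traceback.format_exception_only(type(exc), exc)[-1].strip()
            print(f"Attempt {attempt} failed: {feedback}")
    return False
```

Capping the number of attempts keeps a model that repeats the same mistake from looping indefinitely and running up API costs.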

Key Takeaways

  • Use sandboxed environments to safely execute AI-generated code and prevent security risks.
  • Validate and restrict built-in functions accessible during code execution to avoid unsafe operations.
  • Python's exec() with a controlled namespace is enough for quick experiments; switch to external sandbox tools when the code is untrusted.
  • Choose a model with strong code generation ability, such as gpt-4o or claude-3-5-sonnet-20241022.
  • Handle errors gracefully by catching exceptions during code execution and refining AI prompts.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022