How to build a desktop automation agent
Quick answer
Use the Anthropic Python SDK with the computer-use-2024-10-22 beta to build an agent that can operate a desktop environment. The computer_20241022 tool and this beta flag belong to Anthropic's Messages API, not the OpenAI SDK. Call client.beta.messages.create with a tools entry of type computer_20241022 and betas=["computer-use-2024-10-22"]. The model replies with action requests (take a screenshot, click, type) that your own code executes and reports back.
Prerequisites
- Python 3.8+
- Anthropic API key
- pip install anthropic
Setup
Install the anthropic Python package and store your Anthropic API key in the ANTHROPIC_API_KEY environment variable, which the SDK reads by default. Computer Use is a beta feature, so every request must pass the computer-use-2024-10-22 beta flag.
pip install anthropic
output
Collecting anthropic
  Downloading anthropic-x.x.x-py3-none-any.whl
Installing collected packages: anthropic
Successfully installed anthropic-x.x.x
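A missing API key is the most common setup failure, so it can help to check for it before creating a client. The following is a minimal sketch, not part of any SDK: require_env is a hypothetical helper name, and the variable name is passed in as a plain string so it works with whichever key your SDK reads.

```python
import os


def require_env(var_name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; it is not provided by any SDK.
    """
    value = os.environ.get(var_name, "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the agent."
        )
    return value
```

Calling require_env with the name of your key variable at the top of the script fails fast with a readable message instead of an authentication error from the API later on.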
Step by step
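This example sends a single request that asks the agent for a screenshot. It calls the beta Messages endpoint of the Anthropic Python SDK with the computer_20241022 tool and the required beta flag. The model name is the Claude 3.5 Sonnet release that shipped alongside this beta; substitute any model that supports the computer_20241022 tool.
import os

import anthropic

# The SDK would also read ANTHROPIC_API_KEY on its own; passing it is explicit.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Take a screenshot of my desktop."}],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    betas=["computer-use-2024-10-22"],
)

# The model does not capture the screen itself; it asks your code to do it.
for block in response.content:
    if block.type == "tool_use":
        print("Requested action:", block.input)
    elif block.type == "text":
        print("Agent text:", block.text)
output (typical; the exact content varies between runs)
Requested action: {'action': 'screenshot'}
To complete the request, capture the screenshot in your own code, send it back in a tool_result content block that references the id of the tool_use block, and repeat until the model stops requesting actions.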
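With Computer Use, the model typically answers with an action request rather than performing the action itself, so an agent needs a small routing layer between the response and local code. The sketch below assumes requests arrive as dictionaries such as {"action": "screenshot"}. The names make_dispatcher, fake_screenshot, and fake_type are hypothetical, and the stub handlers only return strings; real handlers would call a screenshot or input-control library.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def make_dispatcher(handlers: Dict[str, Handler]) -> Handler:
    """Build a function that routes an action request to its handler."""

    def dispatch(request: Dict[str, Any]) -> str:
        action = request.get("action")
        handler = handlers.get(action)
        if handler is None:
            # Report unknown actions as text instead of crashing the loop.
            return f"error: unsupported action {action!r}"
        return handler(request)

    return dispatch


# Stub handlers for illustration only (assumed action names).
def fake_screenshot(request: Dict[str, Any]) -> str:
    return "screenshot captured"


def fake_type(request: Dict[str, Any]) -> str:
    return f"typed {request.get('text', '')!r}"


dispatch = make_dispatcher({"screenshot": fake_screenshot, "type": fake_type})
```

Returning an error string for unsupported actions lets the agent pass the failure back to the model as a result, so the conversation can continue instead of ending on an exception.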
Common variations
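You can extend the agent to run shell commands or edit files by adding the companion tools from the same beta (bash_20241022 and text_editor_20241022) next to the computer tool in the tools array. Opening applications goes through the computer tool's own click and type actions. The examples in this guide use synchronous, non-streaming calls, which keep the action loop simple. Use a Claude model that supports the computer_20241022 tool, such as claude-3-5-sonnet-20241022.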
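Variations mostly come down to what goes into the tools array. The sketch below builds that array for a given display size and optionally appends a shell tool. build_tools is a hypothetical helper, and the bash_20241022 entry is an assumption that does not appear in the example above; check the tool names against the documentation for the beta you are using.

```python
from typing import Any, Dict, List


def build_tools(
    width_px: int, height_px: int, include_shell: bool = False
) -> List[Dict[str, Any]]:
    """Return a tools array for a computer-use request (hypothetical helper)."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("display dimensions must be positive")
    tools: List[Dict[str, Any]] = [{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": width_px,
        "display_height_px": height_px,
    }]
    if include_shell:
        # Assumption: the shell tool from the same beta is typed
        # "bash_20241022" and named "bash".
        tools.append({"type": "bash_20241022", "name": "bash"})
    return tools
```

For example, build_tools(1280, 800, include_shell=True) returns two tool definitions that can be passed as the tools argument of the request.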
Troubleshooting
- If you get an authentication error, verify that the ANTHROPIC_API_KEY environment variable is set and that the key is valid.
- If the request is rejected or returns an error, ensure you call client.beta.messages.create with the betas=["computer-use-2024-10-22"] parameter and a tools array containing the correct computer_20241022 tool type.
- Check your desktop environment permissions to allow screenshots or command execution.
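The beta flag and tool type checks above can be automated before a request is sent. This is a minimal sketch under the assumption that the request arguments are collected in a dictionary first; find_request_problems is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List

REQUIRED_BETA = "computer-use-2024-10-22"
REQUIRED_TOOL_TYPE = "computer_20241022"


def find_request_problems(kwargs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the request arguments."""
    problems: List[str] = []
    # Treat a missing or None value the same as an empty list.
    if REQUIRED_BETA not in (kwargs.get("betas") or []):
        problems.append(f"missing beta flag {REQUIRED_BETA!r}")
    tool_types = [tool.get("type") for tool in (kwargs.get("tools") or [])]
    if REQUIRED_TOOL_TYPE not in tool_types:
        problems.append(f"missing tool type {REQUIRED_TOOL_TYPE!r}")
    return problems
```

An empty list means both checks passed; otherwise each entry names one thing to fix before retrying the call.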
Key Takeaways
- Use the Anthropic Python SDK with the computer-use-2024-10-22 beta to enable desktop automation.
- Include the tools parameter with type computer_20241022 and the betas flag in your client.beta.messages.create call.
- The model requests actions such as screenshots, clicks, and keystrokes; your code executes them and returns the results.
- Ensure your API key is valid and environment permissions are configured.
- The examples here use synchronous, non-streaming calls to keep the action loop simple.