How to · Intermediate · 3 min read

How to automate GUI tasks with AI

Quick answer
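Use the OpenAI Responses API with the computer-use-preview model and the computer_use_preview tool to automate GUI tasks. The model does not touch your machine itself. It reads screenshots you send and replies with actions (clicks, key presses, typed text, scrolling) that your own code performs; you then send a fresh screenshot and repeat.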

PREREQUISITES

  • Python 3.8+
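  • OpenAI API key with access to the computer-use-preview model
  • pip install --upgrade openai (a release recent enough to include the Responses API)
  • A way to perform actions and capture screenshots on the target machine, such as pyautogui for a desktop or Playwright for a browser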

Setup

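Install the openai Python package and set your API key in the OPENAI_API_KEY environment variable. Computer use goes through the Responses API (client.responses.create), not Chat Completions, and there is no beta flag to pass: you request the computer_use_preview tool and the computer-use-preview model directly.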

bash
pip install --upgrade openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (50 kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example asks the model to open the Start menu on Windows and prints the action it proposes. Nothing happens on the PC yet: the response only describes the action, and your code has to carry it out.

python
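import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Computer use runs on the Responses API with a dedicated model and tool type.
response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1920,
        "display_height": 1080,
        "environment": "windows",  # other values: "mac", "ubuntu", "browser"
    }],
    input=[{"role": "user", "content": "Open the Start menu on my Windows PC."}],
    truncation="auto",  # computer-use-preview requires truncation="auto"
)

# Proposed actions arrive as computer_call items in response.output.
for item in response.output:
    if item.type == "computer_call":
        print(item.action)

The printed action varies between runs: the model may propose a key press directly or first ask for a screenshot of the screen.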
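The request above returns a proposed action but does not perform it. The sketch below shows one way to carry an action out on a desktop, assuming pyautogui (not named in the original) and assuming the action fields (type, x, y, button, text, keys) follow OpenAI's computer_call schema; convert the SDK object with item.action.model_dump() first. plan_action and execute_action are hypothetical helpers, and scroll, drag, wait, and screenshot actions are left out of this sketch.

```python
from typing import Any, Dict, Tuple

# Hypothetical helper: map a computer-use action dict to a pyautogui call.
# Field names are assumed to match OpenAI's computer_call action schema.
PlannedCall = Tuple[str, tuple, Dict[str, Any]]


def plan_action(action: Dict[str, Any]) -> PlannedCall:
    """Return (pyautogui function name, positional args, keyword args)."""
    kind = action["type"]
    if kind == "click":
        return ("click", (action["x"], action["y"]), {"button": action.get("button", "left")})
    if kind == "double_click":
        return ("doubleClick", (action["x"], action["y"]), {})
    if kind == "move":
        return ("moveTo", (action["x"], action["y"]), {})
    if kind == "type":
        return ("write", (action["text"],), {})
    if kind == "keypress":
        # Assumption: lowercasing the model's key names yields pyautogui names
        # such as "win" or "enter"; unusual keys may need an explicit mapping.
        keys = tuple(k.lower() for k in action["keys"])
        return ("hotkey", keys, {})
    # scroll, drag, wait, and screenshot are not handled in this sketch.
    raise ValueError(f"unsupported action type: {kind}")


def execute_action(action: Dict[str, Any]) -> None:
    """Perform the planned call on the local desktop (requires a display)."""
    import pyautogui  # imported lazily so plan_action works without a display

    name, args, kwargs = plan_action(action)
    getattr(pyautogui, name)(*args, **kwargs)


if __name__ == "__main__":
    print(plan_action({"type": "keypress", "keys": ["WIN"]}))
    print(plan_action({"type": "click", "x": 40, "y": 1060, "button": "left"}))
```

Keeping the translation (plan_action) separate from the side effect (execute_action) lets you log or review each step before anything touches the real mouse and keyboard.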

Common variations

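The model can also click, double-click, type text, scroll, or request screenshots; describe the task in plain language and it chooses the actions. Computer use is tied to the computer-use-preview model, so switching to gpt-4o or gpt-4o-mini does not enable it. For async code, use AsyncOpenAI and await client.responses.create(...); the Responses API also supports streaming with stream=True.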

python
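import asyncio
import os
from openai import AsyncOpenAI

async def automate_gui():
    # AsyncOpenAI is the async client; openai>=1.0 has no acreate() method.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.responses.create(
        model="computer-use-preview",
        tools=[{
            "type": "computer_use_preview",
            "display_width": 1920,
            "display_height": 1080,
            "environment": "windows",
        }],
        input=[{"role": "user", "content": "Click the Notepad icon on the desktop and type 'Hello World'."}],
        truncation="auto",
    )
    # Only the first proposed step is printed; a full run executes it,
    # returns a screenshot, and repeats until no computer_call is left.
    for item in response.output:
        if item.type == "computer_call":
            print(item.action)

asyncio.run(automate_gui())

The model has not seen the desktop yet, so it needs a screenshot from you before it can locate the Notepad icon; the first response covers only the first step of the task.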
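A full run repeats one cycle: execute the proposed action, capture a screenshot, and send it back as a computer_call_output item chained with previous_response_id, until the model stops returning computer_call items. The sketch below assumes that input shape (a call_id plus an input_image data URL) and keeps the API call, the action executor, and the screenshot function injectable, so the control flow can be exercised without a real screen or network. run_loop, build_screenshot_reply, and their parameter names are hypothetical.

```python
import base64
from typing import Any, Callable, Dict, List, Optional


def build_screenshot_reply(call_id: str, png_bytes: bytes) -> Dict[str, Any]:
    """Wrap a PNG screenshot as a computer_call_output input item (assumed shape)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "computer_call_output",
        "call_id": call_id,
        "output": {"type": "input_image", "image_url": f"data:image/png;base64,{encoded}"},
    }


def run_loop(
    send: Callable[[List[Dict[str, Any]], Optional[str]], Dict[str, Any]],
    execute: Callable[[Dict[str, Any]], None],
    screenshot: Callable[[], bytes],
    task: str,
    max_steps: int = 10,
) -> int:
    """Drive the act/observe cycle and return the number of actions executed.

    send(input_items, previous_response_id) stands in for client.responses.create
    and must return a dict with "id" and "output" (a list of item dicts).
    """
    response = send([{"role": "user", "content": task}], None)
    steps = 0
    while steps < max_steps:
        calls = [item for item in response["output"] if item["type"] == "computer_call"]
        if not calls:
            break  # no more actions: the model considers the task finished
        call = calls[0]
        execute(call["action"])
        steps += 1
        reply = build_screenshot_reply(call["call_id"], screenshot())
        response = send([reply], response["id"])
    return steps
```

In a real run, send would call client.responses.create with the same model, tools, and truncation settings as above, pass previous_response_id only when it is not None, and return response.model_dump(). The max_steps cap keeps a confused model from looping forever.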

Troubleshooting

  • If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
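  • If the request is rejected or returns no computer_call items, check that you call client.responses.create with model="computer-use-preview", a tool of type computer_use_preview, and truncation="auto". The betas=["computer-use-2024-10-22"] parameter and the computer_20241022 tool type belong to Anthropic's Claude API; the OpenAI SDK does not accept them.
  • If clicks land in the wrong place, make sure display_width and display_height match the resolution of the screenshots you send, and rescale the model's coordinates if the screenshots are smaller than the real screen.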
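When the screenshots you send are smaller than the real screen (for example, downscaled to keep requests small), the model's coordinates refer to the screenshot, not the display. A minimal sketch of the rescaling, assuming a simple proportional mapping; scale_point is a hypothetical helper.

```python
from typing import Tuple


def scale_point(
    x: int, y: int, shot_size: Tuple[int, int], screen_size: Tuple[int, int]
) -> Tuple[int, int]:
    """Map a point from screenshot pixels to screen pixels proportionally."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    return (round(x * screen_w / shot_w), round(y * screen_h / shot_h))


if __name__ == "__main__":
    # A click at the centre of a 1024x768 screenshot on a 1920x1080 display.
    print(scale_point(512, 384, (1024, 768), (1920, 1080)))
```

Scaling each axis separately handles screenshots whose aspect ratio differs from the screen's, at the cost of slight distortion.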

Key Takeaways

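  • Use the Responses API with the computer-use-preview model and the computer_use_preview tool to automate GUI tasks.
  • Set display_width and display_height to the resolution of your screenshots so the model's coordinates line up with the screen.
  • The model only proposes actions; your code executes them and sends back screenshots in a loop, and AsyncOpenAI covers async workflows.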
Verified 2026-04 · gpt-4o-mini, gpt-4o