How to automate GUI tasks with AI

Quick answer

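Use the Anthropic Python SDK with the computer-use-2024-10-22 beta flag to automate GUI tasks. The computer-use-2024-10-22 flag and the computer_20241022 tool type belong to Anthropic's API, not OpenAI's chat completions. The model does not touch your machine directly: given a computer tool definition, passed through the tools and betas parameters of client.beta.messages.create, it returns mouse, keyboard, and screenshot actions that your own code executes and reports back on.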

PREREQUISITES

Python 3.8+
Anthropic API key (the computer use beta is enabled per request through the betas parameter)
pip install anthropic (a version that exposes client.beta.messages)

Setup

Install the anthropic Python package and set your API key as an environment variable (in PowerShell, use $env:ANTHROPIC_API_KEY = "your-api-key" instead of export). Computer use is enabled per request by passing the computer-use-2024-10-22 beta flag in the API call.

bash

pip install anthropic
export ANTHROPIC_API_KEY="your-api-key"

output

Collecting anthropic
  Downloading anthropic-x.y.z-py3-none-any.whl
Installing collected packages: anthropic
Successfully installed anthropic-x.y.z

Step by step

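This example asks the model to open the Start menu on Windows. The API call itself does not press any key: it returns the action the model wants performed, and your code must then carry that action out on the machine. The request goes through the Anthropic SDK because the computer_20241022 tool and the computer-use-2024-10-22 flag are part of Anthropic's API. The model name below is an assumption; use any model that accepts this tool version.

python

import os
import anthropic

# The client also reads ANTHROPIC_API_KEY on its own; passing it makes a missing key fail early.
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed: a model that accepts computer_20241022
    max_tokens=1024,
    messages=[{"role": "user", "content": "Open the Start menu on my Windows PC."}],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1920,
        "display_height_px": 1080
    }],
    betas=["computer-use-2024-10-22"]
)

# The reply is a list of content blocks: text, plus tool_use blocks describing requested actions.
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(block.name, block.input)

output

The output varies from run to run. Expect an optional text block followed by a tool_use block named computer whose input describes one action. The first request is often a screenshot, so the model can see the screen before it asks for a key press.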
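
Whatever action the model requests still has to be performed by your own code. The following is a minimal sketch of that step, assuming each requested action arrives as a dict with an action field (key, type, mouse_move, left_click, screenshot, and so on) as in the computer_20241022 tool's input. DryRunBackend and execute_action are hypothetical names introduced here for illustration; the backend only records calls, and a real one would forward them to an input-automation library of your choice.

```python
from typing import Any, Dict, List, Tuple


class DryRunBackend:
    """Hypothetical stand-in for a real input library; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def press(self, keys: str) -> None:
        self.calls.append(("press", keys))

    def write(self, text: str) -> None:
        self.calls.append(("write", text))

    def move_to(self, x: int, y: int) -> None:
        self.calls.append(("move_to", x, y))

    def click(self, button: str) -> None:
        self.calls.append(("click", button))

    def screenshot(self) -> None:
        self.calls.append(("screenshot",))


def execute_action(tool_input: Dict[str, Any], backend: Any) -> None:
    """Route one action dict from a tool_use block to the matching backend call."""
    action = tool_input.get("action")
    if action == "key":
        backend.press(tool_input["text"])
    elif action == "type":
        backend.write(tool_input["text"])
    elif action == "mouse_move":
        x, y = tool_input["coordinate"]
        backend.move_to(int(x), int(y))
    elif action in ("left_click", "right_click", "middle_click"):
        # "left_click" -> "left", and so on.
        backend.click(action.split("_")[0])
    elif action == "screenshot":
        backend.screenshot()
    else:
        # Fail loudly rather than silently skipping an action we cannot perform.
        raise ValueError(f"Unsupported action: {action!r}")


# Illustrative inputs; the key name and coordinates are made up for the example.
backend = DryRunBackend()
execute_action({"action": "key", "text": "super"}, backend)
execute_action({"action": "mouse_move", "coordinate": [960, 540]}, backend)
execute_action({"action": "left_click"}, backend)
print(backend.calls)
```

Keeping the backend behind a small interface lets you dry-run a session and inspect the recorded calls before allowing real mouse and keyboard input.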

Common variations

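The same request shape covers mouse clicks, typing, and screenshots: describe the task and the model returns the matching actions for your code to execute. For more complex tasks, switch to a more capable model that supports the same tool version. The SDK also provides an async client, AsyncAnthropic, and streaming. Note that the OpenAI SDK 1.x has no acreate method, and its synchronous client cannot be awaited.

python

import asyncio
import os
from anthropic import AsyncAnthropic

async def automate_gui():
    # AsyncAnthropic exposes the same methods as the sync client, but they are awaitable.
    client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    response = await client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed: a model that accepts computer_20241022
        max_tokens=1024,
        messages=[{"role": "user", "content": "Click the Notepad icon on the desktop and type 'Hello World'."}],
        tools=[{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1920,
            "display_height_px": 1080
        }],
        betas=["computer-use-2024-10-22"]
    )
    # Print each content block; tool_use blocks carry the requested action in block.input.
    for block in response.content:
        print(block)

asyncio.run(automate_gui())

output

The output varies from run to run. A multi-step task like this one is not completed in a single response: the model requests one action at a time and expects the result of each, usually a fresh screenshot, before it requests the next.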
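
Multi-step tasks such as this one need the result of each action sent back to the model, typically as a screenshot. The sketch below builds that follow-up message, assuming the tool_result format of Anthropic's Messages API with a base64-encoded image. The function name screenshot_result, the placeholder bytes, and the example id are hypothetical; a real run would use an actual PNG capture and the id of the tool_use block being answered.

```python
import base64
from typing import Any, Dict


def screenshot_result(tool_use_id: str, png_bytes: bytes) -> Dict[str, Any]:
    """Wrap PNG bytes as a user message containing one tool_result block."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id of the tool_use block
            "content": [{
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": encoded,
                },
            }],
        }],
    }


# Placeholder values for illustration only.
message = screenshot_result("toolu_example", b"\x89PNG placeholder")
print(message["content"][0]["tool_use_id"])
```

Append the assistant's previous reply and then this message to the messages list, call the API again, and repeat until a response contains no further tool_use blocks.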

Troubleshooting

If you get an authentication error, verify that the ANTHROPIC_API_KEY environment variable is set in the shell that runs your script.
If the response contains no GUI actions (no tool_use blocks), check that the request goes through client.beta.messages.create with betas=["computer-use-2024-10-22"] and a tools entry of type computer_20241022. The OpenAI chat completions endpoint accepts neither.
If clicks land in the wrong place, make sure display_width_px and display_height_px match the size of the screenshots you send back to the model.
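
One way to handle a resolution mismatch, if you choose to declare a smaller display size than the physical screen and downscale screenshots to match, is to rescale the coordinates the model returns before acting on them. This is a sketch under that assumption; to_screen is a hypothetical helper, not part of any SDK.

```python
from typing import Tuple


def to_screen(x: int, y: int,
              declared: Tuple[int, int],
              actual: Tuple[int, int]) -> Tuple[int, int]:
    """Map a point in the declared display size onto the real screen size."""
    declared_w, declared_h = declared
    actual_w, actual_h = actual
    if declared_w <= 0 or declared_h <= 0:
        raise ValueError("declared size must be positive")
    # Scale each axis independently, then clamp so the point stays on screen.
    sx = min(actual_w - 1, max(0, round(x * actual_w / declared_w)))
    sy = min(actual_h - 1, max(0, round(y * actual_h / declared_h)))
    return sx, sy


# The centre of a 1024x768 declared display maps to the centre of a 1920x1080 screen.
print(to_screen(512, 384, declared=(1024, 768), actual=(1920, 1080)))  # (960, 540)
```

If the declared size already equals the real screen size, the function returns the point unchanged and can be left in place harmlessly.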

Key Takeaways

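Pass the computer tool through the tools and betas parameters of client.beta.messages.create in the Anthropic SDK to get GUI actions from the model.
The model only proposes actions; your code executes them and returns the results, usually screenshots.
Declare a display size in the tools object that matches the screenshots you send, so that returned coordinates line up with the screen.
Async and streaming calls are available through the SDK for responsive automation workflows.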

Verified 2026-04 · gpt-4o-mini, gpt-4o
