
How to run browser automation with an AI agent

Quick answer

Use the browser-use Python package with an OpenAI model such as gpt-4o to automate browser tasks. Create an Agent with a task description and an LLM object, then call await agent.run() to have the agent carry out the browser actions.

PREREQUISITES

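Python 3.11+ (required by current browser-use releases)
OpenAI API key with access to gpt-4o (API usage is billed per token)
pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)
pip install browser-use langchain-openai
Playwright installed with Chromium (playwright install chromium)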

Setup

Install the required packages, set the OPENAI_API_KEY environment variable, and install Playwright's Chromium build, which is the browser the agent drives.

bash

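pip install "openai>=1.0" browser-use langchain-openai
playwright install chromium
export OPENAI_API_KEY="your-api-key"  # placeholder; substitute your own key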

output

Requirement already satisfied: openai in ...
Requirement already satisfied: browser-use in ...
[Playwright] Downloading Chromium...
[Playwright] Chromium downloaded successfully.
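Before running an agent, it can help to confirm the setup from Python itself. The following is a minimal sketch, not part of browser-use: the function name check_prerequisites and the default minimum version are assumptions made for illustration, so adjust them to what your installed release requires.

```python
import importlib.util
import os
import sys


def check_prerequisites(min_python=(3, 11), packages=("browser_use", "playwright")):
    """Return a list of human-readable problems; an empty list means every check passed.

    min_python and packages are illustrative defaults, not values read from browser-use.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # An empty or missing key is treated the same way.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level module cannot be located.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("PROBLEM:", problem)
```

This only checks that the modules can be located and that the key is present. It does not verify that the key is valid or that the Chromium download succeeded.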

Step by step

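This example creates an AI agent that automates one browser task: opening Google and searching for 'AI news'. It uses browser-use with langchain_openai.ChatOpenAI as the LLM backend. Recent browser-use releases also ship their own chat-model classes, so check which one your installed version expects.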

python

import os
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

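async def main():
    # ChatOpenAI is the LangChain wrapper around OpenAI chat models
    llm = ChatOpenAI(model="gpt-4o", openai_api_key=os.environ["OPENAI_API_KEY"])
    agent = Agent(task="Go to google.com and search for 'AI news'", llm=llm)
    # run() returns the agent's action history; final_result() extracts the final answer text
    history = await agent.run()
    print("Agent output:", history.final_result())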

if __name__ == "__main__":
    asyncio.run(main())

output

Agent output: Browsed to google.com and searched for 'AI news'. Extracted top results: ...

Common variations

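Call the agent from synchronous code by wrapping the async entry point in asyncio.run(), as the examples here do.
Switch models by changing the model argument of ChatOpenAI (e.g., gpt-4o-mini).
Call the OpenAI SDK directly through a wrapper class if you prefer not to use LangChain. The wrapper below is a sketch: Agent expects a chat-model object, so a bare callable may need adapting to the interface your browser-use version requires.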

python

import os
import asyncio
from browser_use import Agent
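from openai import AsyncOpenAI

async def main():
    # AsyncOpenAI is the awaitable client in openai>=1.0; it has create(), not acreate()
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    # Wrap the OpenAI client in a simple LLM interface for browser-use.
    # NOTE: Agent may require a fuller chat-model interface than this bare callable;
    # treat the class as a starting point, not a drop-in backend.
    class SimpleLLM:
        def __init__(self, client):
            self.client = client
        async def __call__(self, prompt):
            response = await self.client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content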

    llm = SimpleLLM(client)
    agent = Agent(task="Search for latest AI breakthroughs on google.com", llm=llm)
    result = await agent.run()
    print("Agent output:", result)

if __name__ == "__main__":
    asyncio.run(main())

output

Agent output: Navigated to google.com, searched 'latest AI breakthroughs', and summarized top results: ...
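The first variation above, calling the agent from synchronous code, can be sketched as a small wrapper. StubAgent and run_agent_sync are hypothetical names introduced here so the example runs without a browser or API key. The only assumption about the real Agent is that run() is awaitable, as in the examples in this article.

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in exposing the same awaitable run() as the article's Agent."""

    def __init__(self, task):
        self.task = task

    async def run(self):
        # Yield to the event loop once, as a real agent would while waiting on I/O.
        await asyncio.sleep(0)
        return f"finished: {self.task}"


def run_agent_sync(agent):
    """Block until agent.run() completes and return its result.

    asyncio.run() creates a fresh event loop, so call this only from code
    that is not already running inside one.
    """
    return asyncio.run(agent.run())


if __name__ == "__main__":
    print(run_agent_sync(StubAgent("open example.com")))
```

Calling asyncio.run() from inside an already running event loop (for example, in a notebook cell) raises RuntimeError; in that context, await agent.run() directly instead.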

Troubleshooting

If you see "playwright not installed" errors, run playwright install chromium to install the browser.
If the script fails with a KeyError or an authentication error, or the agent stalls on its first step, check that the OPENAI_API_KEY environment variable is set to a valid key.
For permission errors, install the packages in a virtual environment or run the script as a user with the required privileges.
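When an agent appears to hang, bounding the run with a timeout turns a silent stall into an explicit result you can handle. The sketch below uses the standard library's asyncio.wait_for. run_with_timeout and SlowStub are hypothetical names for illustration, and the stub stands in for a real Agent so the example runs offline.

```python
import asyncio


async def run_with_timeout(agent, seconds):
    """Await agent.run(), returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(agent.run(), timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the pending run() before raising.
        return None


class SlowStub:
    """Hypothetical stand-in whose run() takes `delay` seconds to complete."""

    def __init__(self, delay):
        self.delay = delay

    async def run(self):
        await asyncio.sleep(self.delay)
        return "done"


if __name__ == "__main__":
    # Finishes well inside the limit.
    print(asyncio.run(run_with_timeout(SlowStub(0.01), seconds=1)))
    # Exceeds the limit, so the helper returns None.
    print(asyncio.run(run_with_timeout(SlowStub(1), seconds=0.01)))
```

Cancelling the run does not by itself guarantee that the browser process is closed, so check the cleanup options documented for your browser-use version.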


Key Takeaways

Use the browser-use package with an LLM like gpt-4o for browser automation.
Set up Playwright and Chromium with playwright install chromium before running.
Run the agent asynchronously with await agent.run() to execute tasks.
You can swap LLMs or use the OpenAI SDK directly for flexibility.
Check environment variables and Playwright installation if you encounter errors.

Verified 2026-04 · gpt-4o, gpt-4o-mini
