How to run browser automation with an AI agent
Quick answer
Use the browser-use Python package with an OpenAI-based LLM like gpt-4o to run browser automation tasks. Instantiate an Agent with your task and LLM, then call await agent.run() to execute browser actions programmatically.
Prerequisites
- Python 3.11+ (required by browser-use)
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)
- pip install browser-use
- pip install langchain-openai (used by the example in this guide)
- Playwright installed with Chromium (playwright install chromium)
Setup
Install the required packages and set the OPENAI_API_KEY environment variable to your OpenAI API key. Also install Playwright's Chromium browser, which the agent uses for automation.
pip install openai browser-use langchain-openai
playwright install chromium
Output
Requirement already satisfied: openai in ...
Requirement already satisfied: browser-use in ...
[Playwright] Downloading Chromium...
[Playwright] Chromium downloaded successfully.
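Before running the agent, it can help to confirm that the setup is complete. The following is a minimal preflight sketch, not part of browser-use: the helper name check_setup is hypothetical, and it assumes the import names openai, browser_use, and playwright correspond to the packages installed above. It does not verify that the Chromium download itself succeeded.

```python
import importlib.util
import os


def check_setup(env=None, modules=("openai", "browser_use", "playwright")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # The examples in this guide read the key from OPENAI_API_KEY
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"module {name!r} is not importable")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```

Running the script prints one line per detected problem and nothing when both the key and the modules are found.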
Step by step
This example shows how to create an AI agent that automates a browser task: opening Google and searching for 'AI news'. It uses browser-use with langchain_openai.ChatOpenAI as the LLM backend.
import os
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # Read the API key from the environment instead of hard-coding it
    llm = ChatOpenAI(model="gpt-4o", openai_api_key=os.environ["OPENAI_API_KEY"])
    # The agent plans and executes browser actions for the given task
    agent = Agent(task="Go to google.com and search for 'AI news'", llm=llm)
    result = await agent.run()
    print("Agent output:", result)


if __name__ == "__main__":
    asyncio.run(main())
Output
Agent output: Browsed to google.com and searched for 'AI news'. Extracted top results: ...
Common variations
- Call the async function from synchronous code with asyncio.run().
- Switch the LLM model by changing model in ChatOpenAI (e.g., gpt-4o-mini).
- Use the OpenAI SDK directly by wrapping calls if you prefer not to use LangChain.
import os
import asyncio

from browser_use import Agent
from openai import AsyncOpenAI


class SimpleLLM:
    # Minimal wrapper around the async OpenAI client.
    # NOTE: browser-use normally expects a chat-model object (such as
    # langchain_openai.ChatOpenAI). Whether it accepts a plain callable like
    # this depends on the installed version, so treat this wrapper as a sketch.
    def __init__(self, client):
        self.client = client

    async def __call__(self, prompt):
        # openai>=1.0 has no acreate(); use AsyncOpenAI and await create()
        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content


async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    llm = SimpleLLM(client)
    agent = Agent(task="Search for latest AI breakthroughs on google.com", llm=llm)
    result = await agent.run()
    print("Agent output:", result)


if __name__ == "__main__":
    asyncio.run(main())
Output
Agent output: Navigated to google.com, searched 'latest AI breakthroughs', and summarized top results: ...
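The first variation, calling the agent from synchronous code, can be sketched as a small wrapper. This is an illustration under assumptions: run_agent_sync and FakeAgent are hypothetical names, and FakeAgent stands in for a real browser_use Agent so the example runs without a browser or an API key.

```python
import asyncio


class FakeAgent:
    """Stand-in for an agent: any object with an async run() method."""

    def __init__(self, task):
        self.task = task

    async def run(self):
        await asyncio.sleep(0)  # simulate asynchronous work
        return f"done: {self.task}"


def run_agent_sync(agent):
    """Run agent.run() to completion from synchronous code."""
    # asyncio.run creates a new event loop, so call this only when no
    # event loop is already running in the current thread
    return asyncio.run(agent.run())


if __name__ == "__main__":
    print(run_agent_sync(FakeAgent("search for 'AI news'")))
```

In a real script the same wrapper would receive the Agent built in the examples above. Inside an environment that already runs an event loop, such as a notebook, await agent.run() directly instead.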
Troubleshooting
- If you see playwright not installed errors, run playwright install chromium to install the browser.
- If the agent hangs, ensure your OpenAI API key is set correctly in the OPENAI_API_KEY environment variable.
- For permission errors, run your script with appropriate user privileges or inside a virtual environment.
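If a run appears to hang, one option is to bound it with a timeout so the script fails fast instead of waiting indefinitely. The sketch below uses asyncio.wait_for from the standard library. The names run_with_timeout, quick_task, and stuck_task are hypothetical and only illustrate the pattern; in a real script the wrapped coroutine would be agent.run().

```python
import asyncio


async def run_with_timeout(coro, timeout_s):
    """Await coro; return (True, result), or (False, None) if it times out."""
    try:
        return True, await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the wrapped coroutine before raising
        return False, None


async def quick_task():
    # Stand-in for a run that completes promptly
    return "finished"


async def stuck_task():
    # Stand-in for a run that never finishes in time
    await asyncio.sleep(10)


async def demo():
    print(await run_with_timeout(quick_task(), 1.0))
    print(await run_with_timeout(stuck_task(), 0.05))


if __name__ == "__main__":
    asyncio.run(demo())
```

A timeout only reports that the run did not finish. The cause, such as a missing API key or a blocked page, still has to be checked separately.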
Key Takeaways
- Use the browser-use package with an LLM like gpt-4o for browser automation.
- Set up Playwright and Chromium with playwright install chromium before running.
- Run the agent asynchronously with await agent.run() to execute tasks.
- You can swap LLMs or use the OpenAI SDK directly for flexibility.
- Check environment variables and the Playwright installation if you encounter errors.