Browser Use task definition explained

Quick answer
A Browser Use task is a natural-language description of the goal an AI agent should accomplish in a browser, with an LLM deciding each browser action. Use the browser-use Python package to pass the task string and an LLM to an Agent, then run the agent asynchronously to perform the browsing actions and return the result.

PREREQUISITES

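  • Python 3.11+ (required by recent browser-use releases)
  • OpenAI API key with access to gpt-4o
  • pip install "browser-use>=0.1.0" (quote the requirement so the shell does not treat > as a redirect)
  • pip install playwright
  • playwright install chromium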

Setup

Install the browser-use package and Playwright with the commands below, which also download the Chromium build that Playwright controls. The examples in this guide import ChatOpenAI from langchain_openai; if that import fails, install it with pip install langchain-openai. Then set your OpenAI API key as an environment variable:

  • Linux/macOS: export OPENAI_API_KEY='your_api_key'
  • Windows: setx OPENAI_API_KEY "your_api_key" (takes effect in newly opened terminals)
bash
pip install browser-use playwright
playwright install chromium
output
Collecting browser-use
Collecting playwright
...
Successfully installed browser-use playwright
[Playwright] Chromium installed successfully
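
Before running an agent, it can help to confirm that the setup steps above took effect. The following is a minimal sketch using only the standard library; `find_missing`, `REQUIRED_MODULES`, and `REQUIRED_ENV_VARS` are hypothetical names chosen for this example, and the check does not verify that the Chromium binary itself was downloaded.

```python
import importlib.util
import os
from typing import List, Mapping, Optional, Sequence

# Import names of the packages this guide relies on (assumed from the examples).
REQUIRED_MODULES = ("browser_use", "playwright", "langchain_openai")
REQUIRED_ENV_VARS = ("OPENAI_API_KEY",)


def find_missing(
    env: Optional[Mapping[str, str]] = None,
    modules: Sequence[str] = REQUIRED_MODULES,
    env_vars: Sequence[str] = REQUIRED_ENV_VARS,
) -> List[str]:
    """Return a list of missing prerequisites, empty if everything is present."""
    env = os.environ if env is None else env
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            missing.append(f"module:{name}")
    for name in env_vars:
        # Treat unset and empty variables the same way
        if not env.get(name):
            missing.append(f"env:{name}")
    return missing
```

Calling `find_missing()` with no arguments inspects the current process environment; an empty list means the packages are importable and the API key is set.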

Step by step

Define a browsing task with a natural language goal and run it using the Agent class from browser_use. The agent uses an LLM (e.g., gpt-4o) to decide browser actions and returns the final result.

python
import os
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Initialize the LLM with OpenAI API key
    llm = ChatOpenAI(model="gpt-4o", openai_api_key=os.environ["OPENAI_API_KEY"])

    # Define the browsing task
    agent = Agent(
        task="Go to google.com and search for 'AI news'",
        llm=llm
    )

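    # Run the agent; run() returns the action history for the whole session
    history = await agent.run()
    # final_result() extracts the agent's final answer from that history
    print("Browsing result:", history.final_result())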

if __name__ == "__main__":
    asyncio.run(main())
output
Browsing result: Here are the latest AI news headlines from Google search...
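
To make the idea of an LLM "deciding browser actions" more concrete, here is a toy sketch of an observe, decide, act loop. It is a conceptual illustration only, not browser-use's implementation: `Action`, `ToyBrowser`, `run_loop`, and `scripted_policy` are hypothetical names, and a hard-coded policy stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str            # e.g. "goto" or "done"
    argument: str = ""   # URL for "goto", final answer for "done"


@dataclass
class ToyBrowser:
    """Stand-in for a real browser: it only records what it was asked to do."""
    url: str = "about:blank"
    log: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        if action.name == "goto":
            self.url = action.argument
        self.log.append(f"{action.name}:{action.argument}")


def run_loop(
    task: str,
    decide: Callable[[str, ToyBrowser], Action],
    browser: ToyBrowser,
    max_steps: int = 10,
) -> Optional[str]:
    """Ask `decide` for the next action until it signals "done" or steps run out."""
    for _ in range(max_steps):
        action = decide(task, browser)   # in a real agent, the LLM makes this choice
        if action.name == "done":
            return action.argument       # final result reported back to the caller
        browser.apply(action)
    return None                          # step budget exhausted without a result


def scripted_policy(task: str, browser: ToyBrowser) -> Action:
    """Hard-coded replacement for the LLM: open one page, then finish."""
    if browser.url == "about:blank":
        return Action("goto", "https://example.com")
    return Action("done", f"Visited {browser.url}")
```

With this policy, `run_loop("Visit example.com", scripted_policy, ToyBrowser())` performs one navigation and then returns the final message. The step budget is the part worth noticing: a loop driven by a model needs a hard stop in case the model never decides it is done.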

Common variations

You can use other chat models, such as gpt-4o-mini or Anthropic's claude-3-5-sonnet-20241022, by passing a LangChain chat model class like ChatAnthropic; a raw SDK client is not a drop-in replacement. Note that agent.run() is a coroutine: call it with await inside an async function, or start it from synchronous code with asyncio.run(). You can also customize the task string with detailed instructions or multi-step workflows.

python
import os
import asyncio
from browser_use import Agent
from langchain_anthropic import ChatAnthropic  # pip install langchain-anthropic

async def main():
    # Use Anthropic Claude through LangChain's chat model wrapper.
    # The raw anthropic.Anthropic client is not a drop-in LLM for Agent.
    # Assumes a browser-use version that accepts LangChain chat models.
    llm = ChatAnthropic(
        model="claude-3-5-sonnet-20241022",
        api_key=os.environ["ANTHROPIC_API_KEY"],
    )

    agent = Agent(
        task="Visit example.com and extract the main headline",
        llm=llm
    )

    # run() returns the session history; final_result() extracts the answer
    history = await agent.run()
    print("Browsing result:", history.final_result())

if __name__ == "__main__":
    asyncio.run(main())
output
Browsing result: The main headline on example.com is 'Example Domain'.
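
For the multi-step workflows mentioned above, one simple approach is to assemble the task string from a goal and an ordered list of steps. This is a minimal sketch; `build_task` is a hypothetical helper for this guide, not part of browser-use, and the exact wording that works best will depend on the model.

```python
from typing import Sequence


def build_task(goal: str, steps: Sequence[str]) -> str:
    """Combine a goal and ordered steps into a single task string."""
    lines = [goal.strip()]
    if steps:
        lines.append("Follow these steps in order:")
        # Number the steps so the model can track its progress
        lines.extend(f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1))
    return "\n".join(lines)
```

The returned string can be passed as the `task` argument when creating the Agent, for example `build_task("Find AI news", ["Go to google.com", "Search for 'AI news'"])`.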

Troubleshooting

  • If you see playwright not installed errors, run playwright install chromium to install the browser.
  • If the agent hangs or returns empty results, verify your OpenAI API key is set correctly in OPENAI_API_KEY.
  • For network issues, ensure your environment allows outbound HTTPS requests to OpenAI and the target websites.
  • Use asyncio.run() to start the async main() function from a script. In a notebook, where an event loop is already running, use await main() instead.
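
If a run hangs, one option is to put an upper bound on how long you wait for it. The sketch below wraps any coroutine in `asyncio.wait_for`; `run_with_timeout` and `fake_agent_run` are hypothetical names, and `fake_agent_run` is only a stand-in for `agent.run()` so the example works without a browser or API key.

```python
import asyncio
from typing import Any, Coroutine, Optional


async def run_with_timeout(coro: Coroutine[Any, Any, Any], seconds: float) -> Optional[Any]:
    """Await `coro`, returning None if it does not finish within `seconds`."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising
        return None


async def fake_agent_run(delay: float) -> str:
    """Stand-in for agent.run(): sleeps for `delay` seconds, then returns."""
    await asyncio.sleep(delay)
    return "done"
```

In a real script you would pass `agent.run()` in place of `fake_agent_run(...)` and treat a `None` result as a signal to check the API key, network access, or the task wording.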

Key Takeaways

  • Use the browser-use Python package with Playwright to automate browser tasks driven by LLMs.
  • Define your browsing goal as a natural language task string when creating the Agent.
  • Run the agent asynchronously with await agent.run() to get browsing results.
  • Ensure Playwright browsers are installed and your OpenAI API key is set in environment variables.
  • You can swap LLMs by passing a LangChain-compatible chat model (for example ChatAnthropic) to the Agent; a raw SDK client will not work without a wrapper.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022