
How to build a web scraping agent with Browser Use

Quick answer

Use the browser-use Python package with an OpenAI chat model such as gpt-4o: describe the task in plain language, create an Agent, and await its run() coroutine. The agent drives a browser through Playwright to navigate pages and extract data.

PREREQUISITES

Python 3.11+ (required by current browser-use releases)
OpenAI API key
pip install "openai>=1.0"
pip install browser-use langchain-openai
pip install playwright
playwright install chromium

Setup

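Install the required packages and export your API key. browser-use relies on Playwright for browser automation, the scripts below import langchain-openai for the LLM wrapper, and the OpenAI key is read from the OPENAI_API_KEY environment variable.

bash

pip install "openai>=1.0" browser-use langchain-openai playwright
playwright install chromium
export OPENAI_API_KEY="sk-..."   # replace with your own key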

output

Collecting openai
Collecting browser-use
Collecting playwright
...
[Playwright] Chromium is installed successfully.
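
Before running an agent, it can help to confirm that the setup is complete. The following is a minimal preflight sketch, not part of browser-use: the helper names (missing_env_vars, missing_modules, preflight) are hypothetical, and the module list is an assumption based on the packages the scripts in this article import.

```python
import importlib.util
import os

# Assumed requirements, based on what the scripts in this article import.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY",)
REQUIRED_MODULES = ("browser_use", "langchain_openai", "playwright")


def missing_env_vars(env, names=REQUIRED_ENV_VARS):
    """Return the names that are unset or empty in the mapping `env`."""
    return [name for name in names if not env.get(name)]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(env=None):
    """Collect human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = [f"environment variable {n} is not set" for n in missing_env_vars(env)]
    problems += [f"Python module {n} is not installed" for n in missing_modules()]
    return problems


# Report anything that is missing; prints nothing when the setup looks complete.
for problem in preflight():
    print("Setup problem:", problem)
```

This check only covers the Python side. It does not verify that the Chromium binary was downloaded, which is what playwright install chromium handles.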

Step by step

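Create a Python script that defines a web scraping task and passes it to browser_use.Agent together with an OpenAI chat model. The agent opens the page, extracts the requested data, and returns a history object whose final result holds the extracted text. This example passes a LangChain chat model; newer browser-use releases also ship their own chat model classes, so check the documentation for the version you install.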

python

import os
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Initialize the LLM with OpenAI GPT-4o
    llm = ChatOpenAI(model="gpt-4o", temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])

    # Define the scraping task
    task = "Go to https://quotes.toscrape.com and extract the first 5 quotes with authors"

    # Create the agent with the task and LLM
    agent = Agent(task=task, llm=llm)

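    # Run the agent; run() is a coroutine that returns the agent's history
    history = await agent.run()

    # final_result() gives the text the agent reported when it finished, if any
    print("Scraping result:\n", history.final_result())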

if __name__ == "__main__":
    asyncio.run(main())

output

Scraping result:
 1. “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” — Albert Einstein
2. “It is our choices, Harry, that show what we truly are, far more than our abilities.” — J.K. Rowling
3. “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.” — Albert Einstein
4. “The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.” — Jane Austen
5. “Imperfection is beauty, madness is genius and it’s better to be absolutely ridiculous than absolutely boring.” — Marilyn Monroe
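
The agent returns free text, so downstream code usually needs to turn it into records. The sketch below assumes the result follows the numbered "quote, dash, author" layout shown in the output above; the model is not guaranteed to produce that layout on every run, and parse_quotes is a hypothetical helper, not part of browser-use.

```python
import re

# Matches lines such as: 1. "quote text" - Author
# \u201c and \u201d are curly double quotes; \u2014 and \u2013 are long dashes.
LINE_PATTERN = re.compile(
    r"^\s*\d+\.\s*[\u201c\"](?P<text>.+)[\u201d\"]\s*[\u2014\u2013-]\s*(?P<author>.+?)\s*$"
)


def parse_quotes(result_text):
    """Turn a numbered 'quote - author' listing into a list of dicts.

    Lines that do not match the assumed layout are skipped, not guessed at.
    """
    records = []
    for line in result_text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            records.append({"text": match.group("text"), "author": match.group("author")})
    return records
```

The function takes the result text as a plain string. If the text is missing, pass an empty string, which yields an empty list.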

Common variations

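Agent.run() is a coroutine, so a synchronous script starts it with asyncio.run(), as above; inside an existing event loop, await it directly. To lower cost, switch to a smaller OpenAI model such as gpt-4o-mini, or use an Anthropic Claude model by passing a different chat model object as llm. Streaming and other model-level behavior is configured on that chat model object rather than on the agent.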

python

import os
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    # Agent expects a chat model object rather than a bare SDK client,
    # so switch models by changing the model name on ChatOpenAI
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, openai_api_key=os.environ["OPENAI_API_KEY"])

    # For Claude, the langchain-anthropic package offers a comparable class:
    #   from langchain_anthropic import ChatAnthropic
    #   llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")

    task = "Go to https://quotes.toscrape.com and extract the first 3 authors"
    agent = Agent(task=task, llm=llm)

    # run() returns the agent's history; final_result() is the extracted text
    history = await agent.run()
    print("Result:\n", history.final_result())

if __name__ == "__main__":
    asyncio.run(main())

output

Result:
 Albert Einstein
J.K. Rowling
Jane Austen

Troubleshooting

If you see an error saying Playwright or its browser executable is not installed, run playwright install chromium to download the browser binaries.
If the script fails with a KeyError or an authentication error, check that the OPENAI_API_KEY environment variable is set to a valid key in the shell that runs the script.
For network issues, verify that your machine can reach the target website and that no firewall or proxy blocks the browser launched by Playwright.

Key Takeaways

Use browser-use with an OpenAI LLM to automate browser tasks for web scraping.
Install playwright and run playwright install chromium before running the agent.
Agent.run() is a coroutine: start it with asyncio.run() from a script, or await it inside existing async code.
Customize the LLM or model to fit your scraping complexity and cost requirements.
Check environment variables and network access if the agent fails to run or scrape.

Verified 2026-04 · gpt-4o, gpt-4o-mini
