How to extract data from websites with Browser Use

Quick answer

Use the browser-use Python package to let an LLM-driven agent control a Chromium browser through Playwright and extract data from web pages. Create an Agent with a task and an LLM such as ChatOpenAI, then await agent.run() (it is a coroutine, so a plain call does nothing until awaited) and read the extracted text from the returned history with final_result().

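Prerequisites

Python 3.11+ (required by browser-use)
OpenAI API key, set as the OPENAI_API_KEY environment variable (each run makes several model calls, which count against your API usage)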
pip install browser-use langchain-openai playwright
playwright install chromium

Setup

Install the required packages and set the OPENAI_API_KEY environment variable. You need browser-use for browser automation, langchain-openai for the LLM interface, and playwright to control Chromium. Run playwright install chromium once after installing the packages to download the browser binary.

bash

pip install browser-use langchain-openai playwright
playwright install chromium

output (abbreviated; exact lines vary by version)

Collecting browser-use...
Collecting langchain-openai...
Collecting playwright...
Installing collected packages...
Successfully installed browser-use langchain-openai playwright
[Playwright] Chromium is installed.

Step by step

This example extracts data from a website by giving the Agent a plain-language task: visit a page and report specific content. The ChatOpenAI model interprets the task and decides which browser actions to take; the agent's final answer carries the extracted text.

python

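import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable

# Create the LLM instance; temperature=0 keeps output as repeatable as possible
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define the browsing task
agent = Agent(
    task="Go to example.com and extract the main heading and paragraph text.",
    llm=llm,
)

# Agent.run() is a coroutine, so drive it with asyncio.run() from a synchronous script
history = asyncio.run(agent.run())

# run() returns the agent's action history; final_result() holds the extracted text
print("Extracted data:", history.final_result())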

output

Extracted data: The main heading is 'Example Domain' and the paragraph text explains that this domain is for use in illustrative examples in documents.
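
The result above is free text. If you need structured fields, one option is to ask for JSON in the task and parse the agent's final text yourself. The sketch below is a minimal, library-independent helper under that assumption; the task wording, the key names "heading" and "paragraph", and the parse_extraction function are all hypothetical and not part of browser-use.

```python
import json

# Hypothetical task wording: asking for JSON makes the reply machine-readable.
TASK = (
    "Go to example.com and extract the main heading and paragraph text. "
    'Reply with only a JSON object with the keys "heading" and "paragraph".'
)


def parse_extraction(final_text, required_keys=("heading", "paragraph")):
    """Parse the agent's final text into a dict, ignoring text around the JSON."""
    if final_text is None:
        raise ValueError("agent finished without a final result")
    text = final_text.strip()
    # Models sometimes add prose or code fences; keep only the outermost object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in agent reply")
    data = json.loads(text[start:end + 1])
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Stand-in for the string returned by the agent's final_result().
sample = 'Here is the data: {"heading": "Example Domain", "paragraph": "Illustrative text."}'
print(parse_extraction(sample)["heading"])  # prints: Example Domain
```

Pass TASK as the agent's task and feed the final result string to parse_extraction. A ValueError signals a reply that is not usable JSON or lacks a required key, which is a reasonable trigger for retrying with a simpler prompt.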

Common variations

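Agent.run() is a coroutine, so inside async code you await it directly and start the event loop once with asyncio.run(main()). You can also change the model or the temperature; a higher temperature makes the output more varied, while 0 keeps it as repeatable as possible.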

python

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    agent = Agent(
        task="Visit example.com and extract the main heading and paragraph.",
        llm=llm,
    )
    # Await the coroutine; the returned history holds the agent's final answer
    history = await agent.run()
    print("Async extracted data:", history.final_result())


asyncio.run(main())

output

Async extracted data: The main heading is 'Example Domain' and the paragraph text describes the domain's purpose for illustrative examples.
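
Because run() is awaitable, several extraction tasks can share one event loop. The following is a sketch, not a browser-use feature: run_tasks and its make_agent and max_concurrent parameters are hypothetical names, and it assumes only that each agent object exposes an awaitable run() as in the example above. Every agent drives its own browser session, so keep the concurrency limit small.

```python
import asyncio


async def run_tasks(tasks, make_agent, max_concurrent=2):
    """Run one agent per task, at most max_concurrent at a time, keeping task order."""
    limit = asyncio.Semaphore(max_concurrent)

    async def run_one(task):
        async with limit:
            agent = make_agent(task)  # e.g. lambda task: Agent(task=task, llm=llm)
            return await agent.run()

    # gather() returns results in the same order as the input tasks
    return await asyncio.gather(*(run_one(task) for task in tasks))


# Usage sketch (assumes Agent and llm are defined as in the example above):
# histories = asyncio.run(run_tasks(
#     ["Visit example.com and extract the main heading.",
#      "Visit example.org and extract the main heading."],
#     make_agent=lambda task: Agent(task=task, llm=llm),
# ))
```

Taking a factory instead of ready-made agents keeps the helper independent of how the Agent is configured and makes it easy to try out with a stub object.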

Troubleshooting

If you see errors about missing browsers, run playwright install chromium to install the required browser.
Ensure your OPENAI_API_KEY environment variable is set correctly.
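If the agent returns incomplete data, try lowering the temperature or simplifying the task prompt.
If Agent rejects the langchain_openai model, check your installed browser-use version: newer releases ship their own ChatOpenAI class (from browser_use import ChatOpenAI) and may no longer accept LangChain models directly.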
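
The first two checks can be automated before starting an agent. The snippet below is a small preflight sketch using only the standard library; the preflight function is a hypothetical helper, the module and variable names are the ones this article uses, and it does not verify that the Chromium binary itself has been downloaded.

```python
import importlib.util
import os


def preflight(modules=("browser_use", "langchain_openai", "playwright"),
              env_vars=("OPENAI_API_KEY",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name in modules:
        # find_spec() returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not installed: {name}")
    for var in env_vars:
        # Treat an empty value the same as a missing variable
        if not os.environ.get(var):
            problems.append(f"environment variable not set: {var}")
    return problems


for problem in preflight():
    print("Problem:", problem)
```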

Key Takeaways

Use the browser-use package with ChatOpenAI to automate web data extraction.
Install Chromium via Playwright to enable browser automation.
Agent.run() is a coroutine: await it in async code, or wrap it in asyncio.run() from a synchronous script.
Keep the temperature low for repeatable output; it changes how varied the output is, not how much detail is extracted.
Set environment variables securely; never hardcode API keys.

Verified 2026-04 · gpt-4o-mini
