How to extract data from websites with Browser Use

Quick answer
Use the browser-use Python package to automate browsing and extract data from websites. It drives a Chromium browser through Playwright while an LLM decides which actions to take. Create an Agent with a task and an LLM such as ChatOpenAI, then await agent.run() (or wrap it in asyncio.run() from a synchronous script) and read the extracted text from the returned history, for example with final_result().

PREREQUISITES

  • Python 3.11+
  • OpenAI API key (an agent run makes several model calls, so check your plan's rate limits and billing)
  • pip install browser-use langchain-openai playwright
  • playwright install chromium

Setup

Install the required packages and set the OPENAI_API_KEY environment variable. You need browser-use for browser automation, langchain-openai for the LLM interface, and playwright to control Chromium. Run playwright install chromium once after installing the packages.

bash
pip install browser-use langchain-openai playwright
playwright install chromium
output
Collecting browser-use...
Collecting langchain-openai...
Collecting playwright...
Installing collected packages...
Successfully installed browser-use langchain-openai playwright
[Playwright] Chromium is installed.
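
Because the setup depends on an environment variable, it can help to fail fast with a clear message when the key is missing, rather than letting the first model call fail. The helper below is a minimal sketch using only the standard library; the name require_env is hypothetical and not part of browser-use or langchain-openai.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value


# Hypothetical usage at the top of a script:
# require_env("OPENAI_API_KEY")
```

Calling it before building the LLM keeps the key out of the source code while still surfacing a missing key immediately.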

Step by step

This example shows how to extract data from a website by instructing the Agent to visit a page and scrape information. The ChatOpenAI model handles the natural language instructions.

python
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable

# Create the LLM instance
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define the browsing task
agent = Agent(
    task="Go to example.com and extract the main heading and paragraph text.",
    llm=llm
)

# agent.run() is a coroutine, so drive it with asyncio.run() from a plain script
history = asyncio.run(agent.run())

# final_result() returns the agent's final extracted text
print("Extracted data:", history.final_result())
output
Extracted data: The main heading is 'Example Domain' and the paragraph text explains that this domain is for use in illustrative examples in documents.
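
The output above is free-form text. If you need fields you can work with in code, one option is to ask for JSON in the task prompt (for example, "Reply with a JSON object with keys heading and paragraph") and parse the reply yourself. The sketch below assumes the reply contains a single JSON object, possibly surrounded by extra words; parse_json_reply is a hypothetical helper, not a browser-use function.

```python
import json


def parse_json_reply(text):
    """Parse the JSON object embedded in a free-form model reply.

    Returns a dict, or None when no valid JSON object is found.
    """
    if not text:
        return None
    # Take everything from the first "{" to the last "}" as the candidate object
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


reply = 'Here is the data: {"heading": "Example Domain", "paragraph": "For use in examples."}'
data = parse_json_reply(reply)
print(data["heading"])
```

Returning None instead of raising lets the caller decide whether to retry the task or fall back to the raw text.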

Common variations

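agent.run() is a coroutine, so inside an existing async application you can await it directly instead of wrapping it in asyncio.run(). You can also change the model, or raise the temperature slightly if you want more varied wording in the extracted text.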

python
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    agent = Agent(
        task="Visit example.com and extract the main heading and paragraph.",
        llm=llm
    )
    # Await the coroutine directly inside async code
    history = await agent.run()
    print("Async extracted data:", history.final_result())

asyncio.run(main())
output
Async extracted data: The main heading is 'Example Domain' and the paragraph text describes the domain's purpose for illustrative examples.
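
When an async app needs several pages extracted, a common pattern is to bound concurrency and give each task a timeout so one stuck page does not block the rest. The sketch below uses only asyncio; run_tasks and its run_one parameter are hypothetical names. In a real script, run_one would be a coroutine function that builds an Agent for the task and returns its final text.

```python
import asyncio


async def run_tasks(tasks, run_one, max_concurrent=2, timeout_s=120):
    """Run run_one(task) for each task with bounded concurrency.

    Results come back in the same order as `tasks`; a task that exceeds
    `timeout_s` yields None instead of raising.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task):
        # Only `max_concurrent` tasks hold the semaphore at any moment
        async with semaphore:
            try:
                return await asyncio.wait_for(run_one(task), timeout=timeout_s)
            except asyncio.TimeoutError:
                return None

    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fake_extract(task):
    """Stand-in for a real agent run, used here so the sketch runs offline."""
    if task == "slow":
        await asyncio.sleep(1)
    return task.upper()


results = asyncio.run(
    run_tasks(["a", "slow", "b"], fake_extract, max_concurrent=2, timeout_s=0.05)
)
print(results)
```

Keeping max_concurrent small is a deliberate choice here, on the assumption that each agent run drives its own browser session and consumes model rate limit.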

Troubleshooting

  • If you see errors about missing browsers, run playwright install chromium to install the required browser.
  • Ensure your OPENAI_API_KEY environment variable is set correctly.
  • If the agent returns incomplete data, try lowering the temperature or simplifying the task prompt.
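
To detect incomplete data in code rather than by eye, you can check the parsed result for the fields you expect before using it. This is a minimal sketch that assumes the extraction has already been turned into a dict; missing_fields is a hypothetical helper.

```python
def missing_fields(data, required):
    """Return the required keys that are absent, None, or blank strings in `data`."""
    if not isinstance(data, dict):
        return list(required)
    missing = []
    for key in required:
        value = data.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing


extracted = {"heading": "Example Domain", "paragraph": "  "}
print(missing_fields(extracted, ["heading", "paragraph"]))
```

A non-empty list is a reasonable signal to rerun the task with a simpler prompt.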

Key Takeaways

  • Use the browser-use package with ChatOpenAI to automate web data extraction.
  • Install Chromium via Playwright to enable browser automation.
  • agent.run() is a coroutine: await it inside async code, or wrap it in asyncio.run() from a synchronous script.
  • Keep temperature low for consistent, repeatable extraction; raise it only if you want more varied wording.
  • Set environment variables securely; never hardcode API keys.
Verified 2026-04 · gpt-4o-mini