How to extract data from websites with Browser Use
Quick answer
Use the browser-use Python package to automate browsing and extract data from websites by controlling a Chromium browser via Playwright. Instantiate an Agent with a task and an LLM such as ChatOpenAI, then await agent.run() to perform the extraction and read the extracted result.
Prerequisites
- Python 3.11+
- OpenAI API key
- pip install browser-use langchain-openai playwright
- playwright install chromium
Setup
Install the required packages and set up environment variables. You need browser-use for browser automation, langchain-openai for the LLM interface, and playwright to control Chromium. Run playwright install chromium once after installing the packages.
pip install browser-use langchain-openai playwright
playwright install chromium
output
Collecting browser-use...
Collecting langchain-openai...
Collecting playwright...
Installing collected packages...
Successfully installed browser-use langchain-openai playwright
[Playwright] Chromium is installed.
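Before launching the agent, it can help to confirm that the environment is ready. The following is a minimal preflight sketch using only the standard library. The helper name check_environment is hypothetical and not part of browser-use; it assumes the import names browser_use, langchain_openai, and playwright used in this article.

```python
import importlib.util
import os


def check_environment(env=None, modules=("browser_use", "langchain_openai", "playwright")):
    """Return a list of human-readable problems; an empty list means ready.

    check_environment is a hypothetical helper, not a browser-use API.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("Problem:", problem)
```

This only checks that the packages can be imported and that the key is present. It does not verify that the Chromium binary itself has been downloaded, so playwright install chromium is still required.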
Step by step
This example shows how to extract data from a website by instructing the Agent to visit a page and scrape information. The ChatOpenAI model handles the natural language instructions.
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable
# Create the LLM instance
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define the browsing task
agent = Agent(
    task="Go to example.com and extract the main heading and paragraph text.",
    llm=llm,
)

# Agent.run() is a coroutine, so drive it with asyncio.run() from a script
history = asyncio.run(agent.run())

# final_result() returns the agent's final answer as text
print("Extracted data:", history.final_result())
output
Extracted data: The main heading is 'Example Domain' and the paragraph text explains that this domain is for use in illustrative examples in documents.
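The agent's final answer is free text by default. One way to get machine-readable results is to ask for JSON in the task prompt and parse the reply defensively, since models sometimes wrap JSON in extra prose. The sketch below is plain Python and makes no browser-use calls; parse_json_result is a hypothetical helper name.

```python
import json


def parse_json_result(text):
    """Return the first JSON object embedded in text, or None if there is none.

    parse_json_result is a hypothetical helper, not a browser-use API.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one
            start = text.find("{", start + 1)
    return None
```

To use it, end the task prompt with an instruction such as "Reply with a JSON object with keys heading and paragraph", then pass the agent's final answer text to parse_json_result and handle the None case.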
Common variations
In an async application, call await agent.run() from inside your own coroutine so the agent shares the application's event loop. You can also change the model or temperature to vary the extraction style.
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    agent = Agent(
        task="Visit example.com and extract the main heading and paragraph.",
        llm=llm,
    )
    # Await the coroutine; the returned history holds the agent's final answer
    history = await agent.run()
    print("Async extracted data:", history.final_result())

asyncio.run(main())
output
Async extracted data: The main heading is 'Example Domain' and the paragraph text describes the domain's purpose for illustrative examples.
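To change the model or temperature without editing code, one option is to read them from the environment. This is a minimal sketch under stated assumptions: the variable names EXTRACT_MODEL and EXTRACT_TEMPERATURE and the LLMSettings class are hypothetical, chosen only for illustration, and the defaults mirror the values used in this article.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the two parameters this article varies."""
    model: str = "gpt-4o-mini"
    temperature: float = 0.0


def load_llm_settings(env=None):
    """Read model settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = LLMSettings()
    # EXTRACT_MODEL and EXTRACT_TEMPERATURE are hypothetical variable names
    model = env.get("EXTRACT_MODEL", defaults.model)
    temperature = float(env.get("EXTRACT_TEMPERATURE", defaults.temperature))
    # The OpenAI API accepts temperatures between 0 and 2
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature out of range: {temperature}")
    return LLMSettings(model=model, temperature=temperature)
```

The loaded values can then be passed on as ChatOpenAI(model=settings.model, temperature=settings.temperature), matching the keyword arguments used in the examples above.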
Troubleshooting
- If you see errors about missing browsers, run playwright install chromium to install the required browser.
- Ensure your OPENAI_API_KEY environment variable is set correctly.
- If the agent returns incomplete data, try lowering the temperature or simplifying the task prompt.
Key Takeaways
- Use the browser-use package with ChatOpenAI to automate web data extraction.
- Install Chromium via Playwright to enable browser automation.
- agent.run() is a coroutine: wrap it in asyncio.run() from a script, or await it inside an async app.
- Adjust model parameters like temperature to control extraction detail.
- Set environment variables securely; never hardcode API keys.