Browser Use custom actions
Quick answer
Use the browser-use Python package to create custom browser automation actions by defining async functions and passing them to the Agent class. This lets you extend the agent's capabilities beyond default browsing tasks with your own logic and tools.
Prerequisites
- Python 3.11+ (required by recent browser-use releases)
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
- pip install browser-use
- Playwright installed with Chromium (run: playwright install chromium)
Setup
Install the required packages and set your OpenAI API key as an environment variable. Also, install Playwright's Chromium browser for browser automation.
pip install openai browser-use
playwright install chromium
Output:
Requirement already satisfied: openai in ...
Requirement already satisfied: browser-use in ...
[Playwright] Chromium is installed successfully.
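Whether the API key is actually visible to Python can be checked before any browser starts. The following is a minimal sketch and not part of browser-use: the helper name `require_api_key` is hypothetical, and it assumes the key is read from `OPENAI_API_KEY` as described above.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment variable `name`.

    Hypothetical helper for illustration: it raises early instead of
    letting the agent fail later with a less obvious authentication error.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before running the agent."
        )
    return value
```

Calling `require_api_key()` at the top of `main()` turns a missing key into an immediate, readable error.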
Step by step
Define a custom async action function that the Agent can call during browsing. Pass this action to the Agent constructor and run the agent with a task prompt. The agent will invoke your custom action as needed.
import os
import asyncio
from browser_use import Agent

# Custom action: simple example that returns a fixed string
async def custom_action(ctx, input_data):
    # ctx: agent context, input_data: action input
    return "Custom action executed successfully!"

async def main():
    # Initialize the agent with your OpenAI API key and custom actions.
    # The `llm` and `actions` arguments are as given in this guide; confirm
    # them against the browser-use version you have installed.
    agent = Agent(
        task="Perform a custom action and report back.",
        llm=None,  # Uses default OpenAI LLM with OPENAI_API_KEY
        actions={"custom_action": custom_action}
    )
    # Run the agent
    result = await agent.run()
    print("Agent output:", result)

if __name__ == "__main__":
    asyncio.run(main())
Output:
Agent output: Custom action executed successfully!
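Some browser-use releases register custom actions through a decorator on a `Controller` object rather than through a dictionary argument. If your installed version rejects the `actions` keyword, a sketch along the following lines may apply. It assumes that `Controller`, its `action` decorator, and the `controller=` keyword on `Agent` exist in your release. The names `build_agent` and `run_task` are hypothetical, and `llm` must be a chat model object that your release supports.

```python
import asyncio


async def custom_action(note: str) -> str:
    """Plain async function holding the action's logic; testable on its own."""
    return f"Custom action executed successfully! Note: {note}"


def build_agent(task: str, llm):
    """Register custom_action on a Controller and return an Agent.

    ASSUMPTION: the installed browser-use release exposes `Controller`,
    its `action` decorator, and the `controller=` keyword on `Agent`.
    Other releases may name these differently.
    """
    # Imported lazily so the action logic above can be used and tested
    # without browser-use installed.
    from browser_use import Agent, Controller

    controller = Controller()
    # The description string tells the LLM when the action is appropriate.
    controller.action("Run the custom action with a short note")(custom_action)
    return Agent(task=task, llm=llm, controller=controller)


def run_task(task: str, llm):
    """Build the agent and run it to completion from synchronous code."""
    agent = build_agent(task, llm)
    return asyncio.run(agent.run())
```

Keeping the action's logic in a plain async function means it can be exercised with `asyncio.run(custom_action("hello"))` before any browser or LLM is involved.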
Common variations
- Use different LLMs by passing a ChatOpenAI instance from langchain_openai to the llm parameter.
- Define multiple custom actions and map them in the actions dictionary.
- Run the agent synchronously by wrapping async calls if needed.
from langchain_openai import ChatOpenAI
import asyncio
from browser_use import Agent

async def custom_action(ctx, input_data):
    return "Custom action with LangChain LLM"

async def main():
    llm = ChatOpenAI(model="gpt-4o")
    agent = Agent(
        task="Run multiple custom actions.",
        llm=llm,
        actions={"custom_action": custom_action}
    )
    result = await agent.run()
    print("Agent output:", result)

if __name__ == "__main__":
    asyncio.run(main())
Output:
Agent output: Custom action with LangChain LLM
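The second and third variations listed above (several actions in one dictionary, and a synchronous wrapper) can be sketched without a browser. The example below is a stand-in, not browser-use code: `ACTIONS`, `shout`, `word_count`, and `run_action_sync` are hypothetical names, and the `(ctx, input_data)` signature simply follows the examples in this guide.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Type of an action as used in this guide: async, takes (ctx, input_data).
ActionFn = Callable[[Any, Any], Awaitable[str]]


async def shout(ctx: Any, input_data: Any) -> str:
    """Return the input text in upper case."""
    return str(input_data).upper()


async def word_count(ctx: Any, input_data: Any) -> str:
    """Return the number of whitespace-separated words in the input."""
    return f"{len(str(input_data).split())} words"


# Several actions mapped by name, in the same shape as the `actions`
# dictionary shown in the examples above.
ACTIONS: Dict[str, ActionFn] = {"shout": shout, "word_count": word_count}


def run_action_sync(name: str, input_data: Any, ctx: Any = None) -> str:
    """Run one named async action from synchronous code.

    asyncio.run() starts a fresh event loop, so this wrapper must not be
    called from code that is already running inside an event loop.
    """
    if name not in ACTIONS:
        raise KeyError(f"Unknown action: {name}")
    return asyncio.run(ACTIONS[name](ctx, input_data))
```

For example, `run_action_sync("shout", "hello")` returns `"HELLO"`. In environments that already run an event loop, such as a notebook, `await` the action directly instead of using the wrapper.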
Troubleshooting
- If you get ModuleNotFoundError for browser_use, ensure you installed it with pip install browser-use.
- If Playwright browser is missing, run playwright install chromium.
- For authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- Use asyncio.run() to run async code in Python 3.7+.
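The import-related checks above can be automated with the standard library. This is a minimal sketch under stated assumptions: `find_missing_modules` is a hypothetical helper, and the module-to-package mapping reflects the names used in this guide (`browser_use` is installed as `browser-use`).

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name, as used in this guide.
REQUIRED: Dict[str, str] = {
    "browser_use": "browser-use",
    "playwright": "playwright",
    "openai": "openai",
}


def find_missing_modules(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return one hint per module that cannot be found by the import system."""
    hints = []
    for module, package in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            hints.append(f"Module '{module}' not found; try: pip install {package}")
    return hints
```

Printing the result of `find_missing_modules()` before building the agent shows which installs are still needed. It does not check whether Chromium has been downloaded, so `playwright install chromium` remains a separate step.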
Key Takeaways
- Use the browser-use package to define custom async actions for browser automation.
- Pass your custom actions as a dictionary to the Agent constructor's actions parameter.
- Run the agent's async entry point with asyncio.run().
- Install Playwright and Chromium to enable browser control.
- You can combine browser-use with LangChain's ChatOpenAI for flexible LLM integration.