Browser Use custom actions

Quick answer

Use the browser-use Python package to give a browser automation agent custom actions: define each action as a function, register it, and hand the registered actions to the Agent class. This extends the agent beyond its default browsing actions with your own logic and tools.

PREREQUISITES

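Python 3.11+ (the minimum supported by current browser-use releases)
OpenAI API key with access to the model you plan to use
pip install "openai>=1.0" (quoted so the shell does not treat > as a redirect)
pip install browser-use
Playwright installed with Chromium (run: playwright install chromium)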

Setup

Install the required packages and set your OpenAI API key as an environment variable. Also, install Playwright's Chromium browser for browser automation.

bash

pip install openai browser-use langchain-openai
playwright install chromium
# Replace the placeholder with your own key
export OPENAI_API_KEY="your-api-key"

output

Requirement already satisfied: openai in ...
Requirement already satisfied: browser-use in ...
[Playwright] Chromium is installed successfully.

Step by step

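Define a custom async action, register it on a Controller with the @controller.action decorator, and pass the controller to the Agent constructor. Run the agent with a task prompt; the LLM decides when to invoke your custom action based on the description you give it.

python

import asyncio

from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

# Registry that holds custom actions. This assumes a browser-use release that
# exports Controller and accepts LangChain chat models; newer releases may
# name the registry Tools and ship their own chat model classes.
controller = Controller()

# Custom action: simple example that returns a fixed string.
# The description string tells the LLM what the action is for.
@controller.action("Run the custom action and return its status message")
async def custom_action() -> str:
    return "Custom action executed successfully!"

async def main():
    # ChatOpenAI reads OPENAI_API_KEY from the environment
    llm = ChatOpenAI(model="gpt-4o")

    # Initialize the agent with the LLM and the controller holding the action
    agent = Agent(
        task="Perform a custom action and report back.",
        llm=llm,
        controller=controller,
    )

    # Run the agent; run() returns the run history, not a plain string
    history = await agent.run()
    print("Agent output:", history.final_result())

if __name__ == "__main__":
    asyncio.run(main())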

output

Agent output: Custom action executed successfully!
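
To make "the agent invokes your custom action as needed" more concrete, here is a library-free sketch of a decorator-based action registry with name-based dispatch. ActionRegistry, register, and call are hypothetical names used only for illustration; they are not part of browser-use, which handles registration and dispatch internally.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


# Hypothetical stand-in for an action registry; NOT the browser-use API.
class ActionRegistry:
    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., Awaitable[Any]]] = {}
        self.descriptions: Dict[str, str] = {}

    def register(self, description: str):
        """Decorator that stores an async function under its own name."""
        def decorator(func):
            self._actions[func.__name__] = func
            self.descriptions[func.__name__] = description
            return func
        return decorator

    async def call(self, name: str, **kwargs: Any) -> Any:
        """Look up an action by name and await it with the given arguments."""
        if name not in self._actions:
            raise KeyError(f"Unknown action: {name}")
        return await self._actions[name](**kwargs)


registry = ActionRegistry()


@registry.register("Return a fixed status message")
async def custom_action() -> str:
    return "Custom action executed successfully!"


@registry.register("Count the words in a piece of text")
async def word_count(text: str) -> int:
    return len(text.split())


async def demo() -> list:
    # A real agent would choose the name and arguments; here they are hard-coded.
    return [
        await registry.call("custom_action"),
        await registry.call("word_count", text="browser use custom actions"),
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The description passed at registration is what a model would read when deciding which action fits the task, so it is worth writing it as a clear, specific sentence.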

Common variations

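Swap the model by passing a different chat model instance to the llm parameter, for example ChatOpenAI from langchain_openai (pip install langchain-openai) with another model name.
Define multiple custom actions by registering each one with its own description.
Call the agent from synchronous code by wrapping the coroutine in asyncio.run().

python

import asyncio

from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

# Assumes a browser-use release that exports Controller and accepts
# LangChain chat models; newer releases may differ.
controller = Controller()

@controller.action("Return a fixed message from the first custom action")
async def custom_action() -> str:
    return "Custom action with LangChain LLM"

# Second action: takes a parameter that the LLM fills in when it calls it
@controller.action("Count the words in the given text")
async def word_count(text: str) -> str:
    return f"{len(text.split())} words"

async def main():
    llm = ChatOpenAI(model="gpt-4o")
    agent = Agent(
        task="Run multiple custom actions.",
        llm=llm,
        controller=controller,
    )
    history = await agent.run()
    print("Agent output:", history.final_result())

if __name__ == "__main__":
    asyncio.run(main())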

output

Agent output: Custom action with LangChain LLM
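
The synchronous variation mentioned above can be sketched as a small blocking wrapper. To keep the example self-contained, run_agent_stub is a hypothetical stand-in for `await agent.run()`; in real code the wrapper would build the Agent and await its run() method instead.

```python
import asyncio


async def run_agent_stub(task: str) -> str:
    # Hypothetical stand-in for `await agent.run()`; a real agent would
    # drive the browser here.
    await asyncio.sleep(0)
    return f"finished: {task}"


def run_agent_sync(task: str) -> str:
    """Blocking wrapper: start an event loop, run the coroutine, return its result."""
    return asyncio.run(run_agent_stub(task))


if __name__ == "__main__":
    print(run_agent_sync("Run multiple custom actions."))
```

asyncio.run() raises RuntimeError when called from a thread that already has a running event loop (a Jupyter notebook, for instance), so in those environments await the coroutine directly instead of using the wrapper.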

Troubleshooting

If you get ModuleNotFoundError for browser_use, ensure you installed it with pip install browser-use.
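If the Playwright browser is missing, run playwright install chromium.
For authentication errors, verify that your OPENAI_API_KEY environment variable is set in the same shell that runs the script.
Start the async entry point with asyncio.run(main()); calling main() on its own only creates a coroutine object and never runs the agent.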
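
The checks above can be bundled into a small preflight script that runs before the agent starts. This is a minimal sketch: preflight is a hypothetical helper name, and it only checks that the Python packages are importable and that the key is set, not that the Chromium binary has been downloaded.

```python
import importlib.util
import os
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []

    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec("browser_use") is None:
        problems.append("browser_use is not importable: run pip install browser-use")
    if importlib.util.find_spec("playwright") is None:
        problems.append("playwright is not importable: install it, then run playwright install chromium")

    # An empty string counts as unset
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.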

Key Takeaways

Use the browser-use package to define custom async actions for browser automation.
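Hand your custom actions to the Agent at construction time; the exact parameter depends on the installed browser-use version (many releases register actions on a Controller and pass it via the controller parameter).
Agent.run() is a coroutine, so start it with asyncio.run() when calling from synchronous code.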
Install Playwright and Chromium to enable browser control.
You can combine browser-use with LangChain's ChatOpenAI for flexible LLM integration.

Verified 2026-04 · gpt-4o, gpt-4o-mini