
Previously, in our Cross-Framework LLM Tool Integration guide, we combined tools from frameworks like LangChain, CrewAI, and PydanticAI to enhance AG2.

Now, we have taken AG2 to the next level by integrating the browser-use framework.

With browser-use, your agents can navigate websites, gather dynamic content, and interact with web pages. This opens up new possibilities for tasks such as data collection and web automation.

Installation

Warning: Browser Use requires Python 3.11 or higher.

To get started with the browser-use integration in AG2, follow these steps:

  1. Install AG2 with the browser-use extra:

    pip install ag2[browser-use]
    

    Note: If you have been using autogen or pyautogen, all you need to do is upgrade it using:

    pip install -U autogen[browser-use]
    

    or

    pip install -U pyautogen[browser-use]
    

    as pyautogen, autogen, and ag2 are aliases for the same PyPI package.

  2. Set up Playwright:

    # Downloads the browsers Playwright uses (all operating systems)
    playwright install
    # Linux only (required there): installs the system dependencies the browsers need
    playwright install-deps
    

You’re all set! Now you can start using browsing features in AG2.
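Before moving on, you may want a quick sanity check of your environment. The script below is a minimal sketch and not part of AG2: the function name preflight is hypothetical, and it assumes the browser-use extra pulls in the Playwright Python package (browser-use builds on Playwright). It only checks the Python version and whether Playwright is importable and on PATH; it does not confirm that playwright install actually downloaded the browsers.

```python
import importlib.util
import shutil
import sys


def preflight(min_version: tuple = (3, 11)) -> list:
    """Return a list of detected problems; an empty list means these basic checks passed."""
    problems = []
    # browser-use requires Python 3.11 or higher
    if sys.version_info[:2] < tuple(min_version):
        required = ".".join(str(part) for part in min_version)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {required}+ required, found {found}")
    # Assumption: the browser-use extra installs the Playwright Python package
    if importlib.util.find_spec("playwright") is None:
        problems.append("package 'playwright' is not importable")
    # The `playwright` command is what runs `playwright install`
    if shutil.which("playwright") is None:
        problems.append("'playwright' command not found on PATH")
    # Note: this does not verify that the browser binaries were downloaded
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("All basic checks passed" if not issues else "\n".join(issues))
```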

Imports

import os

from autogen import AssistantAgent, UserProxyAgent
from autogen.tools.experimental import BrowserUseTool

Agent Configuration

Configure the agents for the interaction.

  • config_list defines the LLM configurations, including the model and API key.
  • UserProxyAgent acts on behalf of the user without requiring actual human input (human_input_mode is set to "NEVER").
  • AssistantAgent represents the AI agent, configured with the LLM settings.

Note: For the list of models that Browser Use supports, see the Supported Models page in the browser-use documentation.

We have had good results with OpenAI, Anthropic, and Gemini models. DeepSeek and Ollama models, however, have not performed as well.

config_list = [
    {
        "model": "deepseek-chat",
        "api_key": os.environ["DEEPSEEK_API_KEY"],
        "api_type": "deepseek",
        "base_url": "https://api.deepseek.com/v1",
    }
]

llm_config = {
    "config_list": config_list,
}
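Since results vary by provider, you may want to switch models without editing the configuration by hand. Here is a minimal sketch, assuming the config_list entry format shown above: the DeepSeek entry copies the one in this guide, while the OpenAI and Anthropic entries, their model names, and the helper build_llm_config are illustrative assumptions. Check the browser-use Supported Models list and AG2's model documentation before relying on them.

```python
import os
from typing import Mapping, Optional

# Candidate providers, tried in order. Model names are examples (assumptions); adjust as needed.
PROVIDERS = [
    ("OPENAI_API_KEY", {"model": "gpt-4o", "api_type": "openai"}),
    ("ANTHROPIC_API_KEY", {"model": "claude-3-5-sonnet-20241022", "api_type": "anthropic"}),
    ("DEEPSEEK_API_KEY", {
        "model": "deepseek-chat",
        "api_type": "deepseek",
        "base_url": "https://api.deepseek.com/v1",
    }),
]


def build_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return an llm_config dict using the first provider whose API key is set."""
    env = os.environ if env is None else env
    for key_name, entry in PROVIDERS:
        api_key = env.get(key_name)
        if api_key:
            # Copy the template entry and attach the key found in the environment
            return {"config_list": [{**entry, "api_key": api_key}]}
    names = ", ".join(name for name, _ in PROVIDERS)
    raise RuntimeError(f"No API key found; set one of: {names}")
```

You would then use llm_config = build_llm_config() for both the agents and the tool. If the selected model supports vision, you may be able to drop "use_vision": False from agent_kwargs, which this guide sets only because deepseek-chat lacks vision support.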

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

Integrating Web Browsing with BrowserUseTool

The BrowserUseTool enables agents to interact with web browsers, allowing them to access, navigate, and perform actions on websites as part of their tasks. It acts as a bridge between the language model and the browser, empowering the agent to browse the web, search for information, and interact with dynamic web content.

To see what the agents are doing in real time, set the headless option in browser_config to False. The browser then runs in a visible window, so you can watch the agents interact with websites. With headless=True, the browser runs in the background without a GUI, which is useful for automated tasks where visibility is not necessary.
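If the same notebook runs both on a desktop and on a server or CI machine, you could pick the headless value automatically. The helper make_browser_config below is hypothetical; it only builds the browser_config dictionary with the headless key used in this guide, and its display detection is a heuristic that assumes a Linux machine without DISPLAY or WAYLAND_DISPLAY set has no GUI.

```python
import os
import sys
from typing import Optional


def make_browser_config(show_browser: Optional[bool] = None) -> dict:
    """Build a browser_config dict for BrowserUseTool.

    show_browser=True forces a visible window, False forces headless mode,
    and None picks headless automatically when no display seems available.
    """
    if show_browser is None:
        # Heuristic: on Linux, no DISPLAY/WAYLAND_DISPLAY usually means no GUI (CI, SSH, Docker)
        no_display = sys.platform.startswith("linux") and not (
            os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY")
        )
        show_browser = not no_display
    return {"headless": not show_browser}
```

You could then pass browser_config=make_browser_config() when creating the BrowserUseTool.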

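browser_use_tool = BrowserUseTool(
    llm_config=llm_config,
    # Show the browser window so you can watch the agent work
    browser_config={"headless": False},
    # deepseek-chat does not support vision yet, so screenshots are not sent to the model;
    # generate_gif records the browsing session as an animated GIF
    agent_kwargs={"use_vision": False, "generate_gif": True},
)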

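# user_proxy runs the tool when the assistant requests it
browser_use_tool.register_for_execution(user_proxy)
# assistant (the LLM) is told about the tool and can decide to call it
browser_use_tool.register_for_llm(assistant)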
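The two registration calls divide the work: register_for_llm makes the tool's description available to the assistant, so the model can choose to call it, and register_for_execution lets user_proxy actually run the tool when such a call arrives. The plain-Python toy below illustrates only that division of labor; the classes ToyAssistant and ToyUserProxy, the browse function, and the call format are hypothetical and do not mirror AG2's internal implementation.

```python
import json
from typing import Callable, Dict


class ToyAssistant:
    """Knows only tool descriptions and decides which tool to call."""

    def __init__(self) -> None:
        self.tool_descriptions: Dict[str, str] = {}

    def add_tool_description(self, name: str, description: str) -> None:
        self.tool_descriptions[name] = description

    def propose_call(self, task: str) -> dict:
        # A real LLM would pick a tool based on the descriptions; the toy picks the first one
        name = next(iter(self.tool_descriptions))
        return {"name": name, "arguments": json.dumps({"task": task})}


class ToyUserProxy:
    """Holds the actual callables and executes the calls it receives."""

    def __init__(self) -> None:
        self.function_map: Dict[str, Callable[..., str]] = {}

    def add_function(self, name: str, fn: Callable[..., str]) -> None:
        self.function_map[name] = fn

    def execute(self, call: dict) -> str:
        fn = self.function_map[call["name"]]
        return fn(**json.loads(call["arguments"]))


def browse(task: str) -> str:
    # Stand-in for real browser automation
    return f"(pretend result for: {task})"


toy_assistant = ToyAssistant()
toy_proxy = ToyUserProxy()
# Analogous to register_for_llm: the assistant learns what the tool does
toy_assistant.add_tool_description("browse", "Browse the web to complete a task.")
# Analogous to register_for_execution: the proxy can actually run it
toy_proxy.add_function("browse", browse)

call = toy_assistant.propose_call("Go to google.com and search for AG2.")
print(toy_proxy.execute(call))
```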

Initiate Chat

When running this code in Jupyter, which already runs its own event loop, use nest_asyncio to allow nested event loops. Install it:

pip install nest_asyncio

Then apply it:

import nest_asyncio

nest_asyncio.apply()
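Why is this needed? Jupyter executes cells inside an already running asyncio event loop, and asyncio.run() refuses to start a new loop from inside a running one. The standard-library sketch below reproduces that error. It assumes, without confirming AG2's internals, that some synchronous entry point on the browsing path drives async browser code this way; fake_browser_task and run_sync are hypothetical stand-ins.

```python
import asyncio


async def fake_browser_task() -> str:
    # Hypothetical stand-in for an async browser action
    await asyncio.sleep(0)
    return "done"


def run_sync(coro_factory):
    """Run a coroutine from synchronous code, as a sync wrapper around async code might."""
    coro = coro_factory()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def simulated_notebook_cell() -> str:
    # Jupyter runs cells while an event loop is already running; simulate that here
    try:
        return run_sync(fake_browser_task)
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


print(run_sync(fake_browser_task))             # no running loop: prints "done"
print(asyncio.run(simulated_notebook_cell()))  # running loop: prints the RuntimeError message
```

Run it as a plain Python script: inside a notebook, even the first call fails unless nest_asyncio.apply() is in effect, because nest_asyncio patches asyncio so that a running loop can be re-entered.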

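The user_proxy.initiate_chat() method asks the assistant to perform a web browsing task, here going to google.com and searching for “AG2”. The assistant carries out the task by calling the BrowserUseTool, which user_proxy executes, and the extracted content is returned to the user. Setting max_turns=2 allows two round trips: one in which the assistant requests the browser tool call, and one in which user_proxy returns the tool result and the assistant replies.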

result = user_proxy.initiate_chat(
    recipient=assistant,
    message="Go to google.com and search for AG2.",
    max_turns=2,
)
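initiate_chat returns a ChatResult: its summary field holds the chat summary (by default, the last message) and chat_history holds the full list of messages. If you want to pull out the final text reply yourself, a small helper like the hypothetical last_text_reply below is one option. The sample history is simplified and purely illustrative; it is not output from the run above.

```python
from typing import Any, Dict, List, Optional


def last_text_reply(chat_history: List[Dict[str, Any]]) -> Optional[str]:
    """Return the most recent message with non-empty text content, if any."""
    for message in reversed(chat_history):
        content = message.get("content")
        # Tool-call messages may carry no text; skip them
        if isinstance(content, str) and content.strip():
            return content
    return None


# Simplified, illustrative history shape (not real output)
sample_history = [
    {"name": "user_proxy", "content": "Go to google.com and search for AG2."},
    {"name": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
    {"name": "user_proxy", "content": "<tool output>"},
    {"name": "assistant", "content": "Summary of the search results."},
]
print(last_text_reply(sample_history))
```

With the real run, you would call last_text_reply(result.chat_history), or simply read result.summary.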