Installation

To get started with the crawl4ai integration in AG2, follow these steps:

  1. Install AG2 with the crawl4ai extra:

    pip install ag2[crawl4ai]
    

    Note: If you have been using autogen or pyautogen, all you need to do is upgrade it using:

    pip install -U autogen[crawl4ai]
    

    or

    pip install -U pyautogen[crawl4ai]
    

    as pyautogen, autogen, and ag2 are aliases for the same PyPI package.

  2. Set up Playwright:

    # Installs Playwright and the required browsers (all operating systems)
    playwright install
    # Additional command, required on Linux only
    playwright install-deps
    
  3. For running the code in Jupyter, use nest_asyncio to allow nested event loops:

    pip install nest_asyncio

You’re all set! Now you can start using browsing features in AG2.
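As an optional sanity check, you can verify that the key packages are importable. This is a minimal sketch that only confirms the installation:

import crawl4ai  # installed via the ag2[crawl4ai] extra
import nest_asyncio  # allows nested event loops in Jupyter
import playwright  # browser binaries come from `playwright install`

# Best-effort version check; fall back gracefully if the attribute is absent
print(getattr(crawl4ai, "__version__", "unknown"))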

Imports

import os

import nest_asyncio
from pydantic import BaseModel

from autogen import AssistantAgent, UserProxyAgent
from autogen.tools.experimental import Crawl4AITool

nest_asyncio.apply()

LLM-Free Crawl4AI

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)
crawlai_tool = Crawl4AITool()

crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)
result = user_proxy.initiate_chat(
    recipient=assistant,
    message="Get info from https://docs.ag2.ai/docs/Home",
    max_turns=2,
)
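initiate_chat returns a ChatResult, so you can inspect what the crawl produced. A minimal sketch (field names follow AG2's ChatResult; message content is truncated for readability):

# Inspect the outcome of the chat: the summary and the exchanged messages
print(result.summary)
for message in result.chat_history:
    print(f"{message['role']}: {str(message['content'])[:200]}")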

Crawl4AI with LLM

Note: Crawl4AI is built on top of LiteLLM and supports the same models as LiteLLM.

We have had good results with OpenAI, Anthropic, Gemini, and Ollama. However, as of this writing, DeepSeek is encountering some issues.
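To use a different provider, you only need to change the llm_config passed to the tool. The snippet below is an illustrative sketch for Anthropic; the model name and the ANTHROPIC_API_KEY environment variable are placeholders, not part of the example that follows:

# Illustrative only: an Anthropic-backed llm_config for the extraction step.
# Any LiteLLM-supported provider can be configured along the same lines.
anthropic_llm_config = {
    "config_list": [
        {
            "model": "claude-3-5-sonnet-20240620",  # placeholder model name
            "api_key": os.environ["ANTHROPIC_API_KEY"],
            "api_type": "anthropic",
        }
    ],
}
crawlai_tool = Crawl4AITool(llm_config=anthropic_llm_config)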

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)
# Pass llm_config to Crawl4AITool so the tool can use an LLM for extraction
crawlai_tool = Crawl4AITool(llm_config=llm_config)

crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)
result = user_proxy.initiate_chat(
    recipient=assistant,
    message="Get info from https://docs.ag2.ai/docs/Home",
    max_turns=2,
)

Crawl4AI with LLM & Schema for Structured Data

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)


class Blog(BaseModel):
    title: str
    url: str


# Pass llm_config and extraction_model to Crawl4AITool for structured extraction
crawlai_tool = Crawl4AITool(llm_config=llm_config, extraction_model=Blog)

crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)
message = "Extract all blog posts from https://docs.ag2.ai/blog"
result = user_proxy.initiate_chat(
    recipient=assistant,
    message=message,
    max_turns=2,
)
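Since extraction_model accepts a Pydantic model, more complex pages can be described with a nested schema. The following is a hypothetical sketch; BlogPage is not part of the example above:

# Hypothetical nested schema: capture the whole listing page, not just one post
class BlogPage(BaseModel):
    page_title: str
    posts: list[Blog]


crawlai_tool = Crawl4AITool(llm_config=llm_config, extraction_model=BlogPage)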