Supercharging Web Crawling with Crawl4AI
Installation
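To get started with the crawl4ai integration in AG2, follow these steps:

- Install AG2 with the `crawl4ai` extra: `pip install -U "ag2[openai,crawl4ai]"`

  Note: If you have been using `autogen` or `pyautogen`, all you need to do is upgrade it using `pip install -U "autogen[openai,crawl4ai]"` or `pip install -U "pyautogen[openai,crawl4ai]"`, as `pyautogen`, `autogen`, and `ag2` are aliases for the same PyPI package.
- Set up Playwright, which Crawl4AI uses to control a browser: run `playwright install` to download the browsers, and on Linux also run `playwright install-deps` to install their system dependencies.
- For running the code in Jupyter, install `nest_asyncio` (`pip install nest_asyncio`) to allow nested event loops.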
You’re all set! Now you can start using browsing features in AG2.
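Before running the examples, it can help to confirm that the packages installed above are importable from the current environment. The sketch below is only a convenience check, not part of AG2; it assumes the import names `autogen`, `crawl4ai`, `playwright`, and `nest_asyncio`, and it does not verify that the Playwright browsers themselves were downloaded.

```python
import importlib.util
from typing import List

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = ["autogen", "crawl4ai", "playwright", "nest_asyncio"]


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level modules in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```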
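Imports

```python
import os

import nest_asyncio
from pydantic import BaseModel

from autogen import AssistantAgent, UserProxyAgent
from autogen.tools.experimental import Crawl4AITool

# Jupyter already runs an event loop; allow nested loops so the examples run in a notebook
nest_asyncio.apply()
```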
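Why the `nest_asyncio.apply()` call matters: Jupyter executes cells inside an event loop that is already running, and `asyncio.run` refuses to start a new loop from inside a running one. The standard-library sketch below reproduces that failure without a notebook; `fetch_page`, `run_sync`, and `notebook_cell` are hypothetical stand-ins, not AG2 or Crawl4AI functions, and the sketch deliberately does not call `nest_asyncio`, whose patch is what allows such nested use.

```python
import asyncio


async def fetch_page() -> str:
    # Hypothetical stand-in for an async crawl; no network access happens here.
    await asyncio.sleep(0)
    return "page content"


def run_sync(coro) -> str:
    """Drive a coroutine from synchronous code with asyncio.run."""
    return asyncio.run(coro)


async def notebook_cell() -> str:
    # Simulates a Jupyter cell: this code runs while an event loop is active.
    coro = fetch_page()
    try:
        return run_sync(coro)
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {err}"


# Outside a running loop, asyncio.run works normally.
print(run_sync(fetch_page()))
# Inside a running loop (as in Jupyter), the same call raises RuntimeError.
print(asyncio.run(notebook_cell()))
```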
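LLM-Free Crawl4AI

In this setup the assistant still uses an LLM to decide when to call the tool and to answer from its result; only the crawling itself runs without an LLM.

```python
# LLM used by the assistant agent (reads OPENAI_API_KEY from the environment)
config_list = [{"api_type": "openai", "model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]
llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# No llm_config is passed to the tool, so the crawl itself does not use an LLM
crawlai_tool = Crawl4AITool()

# The assistant can propose calls to the tool; the user proxy executes them
crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)

# Two turns: the assistant requests a crawl, then answers from the tool's result
result = user_proxy.initiate_chat(
    recipient=assistant,
    message="Get info from https://docs.ag2.ai/docs/Home",
    max_turns=2,
)
```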
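The two registration calls split responsibilities: `register_for_llm` makes the tool's description visible to the assistant so it can propose a call, while `register_for_execution` lets the user proxy actually run it. The toy registry below mimics that split in plain Python to make the division of roles concrete; `ToyAgent`, `crawl_stub`, `register`, and `run_exchange` are hypothetical names and do not reflect AG2's internals. The toy agents use separate variable names so they do not overwrite the real agents above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def crawl_stub(url: str) -> str:
    """Crawl a URL and return its content (stub: no network access)."""
    # Hypothetical stand-in for Crawl4AITool; returns canned text instead of crawling.
    return f"Crawled content of {url}"


@dataclass
class ToyAgent:
    name: str
    # Tool descriptions the agent may propose (mirrors the register_for_llm side)
    proposable: Dict[str, str] = field(default_factory=dict)
    # Callables the agent may run (mirrors the register_for_execution side)
    executable: Dict[str, Callable[..., str]] = field(default_factory=dict)


def register(tool_name: str, func: Callable[..., str], caller: ToyAgent, executor: ToyAgent) -> None:
    """Advertise the tool to `caller` and hand the callable to `executor`."""
    caller.proposable[tool_name] = func.__doc__ or tool_name
    executor.executable[tool_name] = func


def run_exchange(caller: ToyAgent, executor: ToyAgent, url: str) -> str:
    """Simulate one exchange: the caller proposes a call and the executor runs it."""
    if "crawl" not in caller.proposable:
        raise LookupError(f"{caller.name} has no 'crawl' tool to propose")
    # In AG2 the LLM would generate this tool call; here it is hard-coded.
    proposed_call = {"name": "crawl", "arguments": {"url": url}}
    func = executor.executable[proposed_call["name"]]
    return func(**proposed_call["arguments"])


toy_assistant = ToyAgent("assistant")
toy_user_proxy = ToyAgent("user_proxy")
register("crawl", crawl_stub, caller=toy_assistant, executor=toy_user_proxy)
print(run_exchange(toy_assistant, toy_user_proxy, "https://docs.ag2.ai/docs/Home"))
```

Keeping the proposer and the executor separate means the model can only request a tool run; the agent holding the callable decides whether it actually happens.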
Crawl4AI with LLM

Passing an `llm_config` to `Crawl4AITool` lets Crawl4AI use that LLM when processing the crawled page.

Note: Crawl4AI is built on top of LiteLLM and supports the same models as LiteLLM. We have had a great experience with OpenAI, Anthropic, Gemini, and Ollama. However, as of this writing, DeepSeek is encountering some issues.
```python
config_list = [{"api_type": "openai", "model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]
llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# Pass llm_config to Crawl4AITool so Crawl4AI can use the LLM on the crawled content
crawlai_tool = Crawl4AITool(llm_config=llm_config)

crawlai_tool.register_for_execution(user_proxy)
crawlai_tool.register_for_llm(assistant)

result = user_proxy.initiate_chat(
    recipient=assistant,
    message="Get info from https://docs.ag2.ai/docs/Home",
    max_turns=2,
)
```
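Because Crawl4AI delegates model calls to LiteLLM, models end up being addressed in LiteLLM's `provider/model` form (for example `openai/gpt-4o-mini` or `ollama/llama3`). The helper below is a hypothetical sketch of how an AG2 `config_list` entry could be translated into that form; `to_litellm_provider` is not part of AG2 or Crawl4AI, and the `google` to `gemini` prefix alias is an assumption based on AG2 using `api_type="google"` for Gemini. It only illustrates the naming scheme and is not something the examples above require.

```python
from typing import Dict, Optional, Tuple

# AG2 api_type values whose LiteLLM provider prefix differs (assumed mapping).
_PREFIX_ALIASES: Dict[str, str] = {"google": "gemini"}


def to_litellm_provider(config_entry: dict) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: map an AG2 config entry to LiteLLM 'provider/model' naming."""
    api_type = config_entry.get("api_type", "openai")
    prefix = _PREFIX_ALIASES.get(api_type, api_type)
    model = config_entry["model"]
    # Leave the name alone if it already carries the provider prefix.
    provider = model if model.startswith(prefix + "/") else f"{prefix}/{model}"
    return provider, config_entry.get("api_key")


# "sk-test" is a dummy placeholder, not a real key.
print(to_litellm_provider({"api_type": "openai", "model": "gpt-4o-mini", "api_key": "sk-test"}))
# ('openai/gpt-4o-mini', 'sk-test')
print(to_litellm_provider({"api_type": "ollama", "model": "llama3"}))
# ('ollama/llama3', None)
```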
Crawl4AI with LLM & Schema for Structured Data

```python
config_list = [{"api_type": "openai", "model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]
llm_config = {
    "config_list": config_list,
}

user_proxy = UserProxyAgent(name="user_proxy", human_input_mode="NEVER")
assistant = AssistantAgent(name="assistant", llm_config=llm_config)
```