# WebSurferAgent with Firecrawl Integration
This notebook demonstrates how to use the WebSurferAgent with the Firecrawl tool for web scraping, crawling, mapping, search, and deep research.
## Setup
First, import the necessary modules and configure your API keys and the LLM settings for the agent.
import os

from autogen.agents.experimental.websurfer import WebSurferAgent

# Set up your API keys
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY", "your_firecrawl_api_key_here")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "your_openai_api_key_here")

# LLM configuration
llm_config = {
    "model": "gpt-4",
    "api_key": OPENAI_API_KEY,
    "temperature": 0.1,
}

print("✓ Setup complete")
## Creating WebSurferAgent with Firecrawl
Create a WebSurferAgent that uses Firecrawl as its web tool.
# Create WebSurferAgent with Firecrawl
websurfer_agent = WebSurferAgent(
    name="WebSurfer",
    llm_config=llm_config,
    web_tool="firecrawl",
    web_tool_kwargs={
        "firecrawl_api_key": FIRECRAWL_API_KEY,
        # Optional: specify a custom Firecrawl API URL for self-hosted instances
        # "firecrawl_api_url": "https://your-firecrawl-instance.com",
    },
    system_message="You are a helpful web researcher. Use Firecrawl to scrape, crawl, search, and research websites to answer user questions.",
)

print(f"✓ WebSurferAgent created with {type(websurfer_agent.tool).__name__}")
print(
    f"✓ Available tool methods: {[method for method in dir(websurfer_agent.tool) if not method.startswith('_') and callable(getattr(websurfer_agent.tool, method))]}"
)
## Available Firecrawl Methods
The WebSurferAgent with the Firecrawl tool provides access to all Firecrawl capabilities (see the sketch after this list):
- Scrape: Extract content from a single URL
- Crawl: Recursively crawl a website starting from a URL
- Map: Discover URLs from a website
- Search: Search the web for content
- Deep Research: Perform comprehensive research on a topic with analysis
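If you want to exercise these capabilities outside of an agent conversation, you can call the underlying tool object directly. The sketch below is illustrative only: the exact method names and signatures depend on your installed AG2 and Firecrawl versions, so the `scrape` and `search` calls shown here are assumptions; confirm them against the "Available tool methods" output printed above before uncommenting.
# Illustrative sketch only: the method names below (scrape, search) are assumptions,
# not confirmed API; check the "Available tool methods" output printed earlier.
tool = websurfer_agent.tool

# Assumed: a scrape-style method that takes a URL and returns extracted content
# content = tool.scrape("https://example.com")
# print(content)

# Assumed: a search-style method that takes a query string
# results = tool.search("sustainable energy trends 2024")
# print(results)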
## Example Usage
Here are some example tasks you can perform with the WebSurferAgent using Firecrawl:
# Example 1: Simple web scraping
scrape_message = "Scrape the homepage of https://example.com and summarize the main content."
# Example 2: Website crawling
crawl_message = "Crawl https://example.com and find all the pages related to documentation."
# Example 3: Website mapping
map_message = "Map the structure of https://example.com and list all the main sections."
# Example 4: Web search
search_message = "Search for recent articles about artificial intelligence and summarize the top 3 results."
# Example 5: Deep research
research_message = "Perform deep research on 'sustainable energy trends 2024' and provide a comprehensive analysis."
print("Example messages prepared. Run them individually with the websurfer_agent.")
## Running Tasks
To run any of the above tasks, use the agent like this:
# Create a user proxy agent to interact with the websurfer
from autogen import UserProxyAgent
user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# Start a conversation
user_proxy.initiate_chat(
    websurfer_agent,
    message=scrape_message,
    max_turns=1,
)
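The other example messages from the previous section can be run the same way. A minimal sketch, reusing the prompts defined above and the same `initiate_chat` pattern (adjust `max_turns` if the agent needs more room for tool calls):
# Run the remaining example tasks one after another (illustrative; increase max_turns
# if the agent needs additional turns to call its tools and summarize the results)
for message in [crawl_message, map_message, search_message, research_message]:
    user_proxy.initiate_chat(
        websurfer_agent,
        message=message,
        max_turns=1,
    )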
## Key Features
The WebSurferAgent with Firecrawl provides:
- Multiple web interaction methods: scrape, crawl, map, search, and deep research
- Self-hosted support: Connect to your own Firecrawl instance via `firecrawl_api_url` (see the sketch after this list)
- Flexible configuration: Pass any Firecrawl parameters through `web_tool_kwargs`
- Consistent API: Same interface as other WebSurferAgent tools (Tavily, DuckDuckGo, etc.)
- Rich content extraction: Supports multiple output formats (markdown, HTML)
- Advanced filtering: Include/exclude patterns, custom headers, timeouts
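For example, connecting to a self-hosted Firecrawl instance only requires passing `firecrawl_api_url` through `web_tool_kwargs`, as sketched below. The endpoint URL is a placeholder, and any further Firecrawl options you add to `web_tool_kwargs` must match what your Firecrawl deployment actually supports.
# Sketch: WebSurferAgent pointed at a self-hosted Firecrawl instance.
# The URL below is a placeholder; replace it with your own deployment's endpoint.
self_hosted_websurfer = WebSurferAgent(
    name="SelfHostedWebSurfer",
    llm_config=llm_config,
    web_tool="firecrawl",
    web_tool_kwargs={
        "firecrawl_api_key": FIRECRAWL_API_KEY,
        "firecrawl_api_url": "https://your-firecrawl-instance.com",
    },
)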
This makes the WebSurferAgent a powerful tool for web research and content extraction tasks.