WebSurferAgent with Firecrawl Integration

This notebook demonstrates how to use the WebSurferAgent with the Firecrawl tool for web scraping, crawling, site mapping, search, and deep research.

Setup

First, import the necessary modules and configure the API keys and LLM settings.

import os

from autogen.agents.experimental.websurfer import WebSurferAgent

# Set up your API keys
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY", "your_firecrawl_api_key_here")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "your_openai_api_key_here")

# LLM configuration
llm_config = {
    "model": "gpt-4",
    "api_key": OPENAI_API_KEY,
    "temperature": 0.1,
}

print("✓ Setup complete")
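Both keys fall back to placeholder strings, so a missing environment variable only shows up later as an authentication error from Firecrawl or OpenAI. A small pre-flight check can catch this earlier. The sketch below uses a hypothetical helper, `missing_keys`, which is not part of AG2. It flags keys that are empty or still set to the placeholders used above:

```python
import os

# Placeholder defaults used in the setup cell; an empty string also counts as missing
PLACEHOLDERS = {"", "your_firecrawl_api_key_here", "your_openai_api_key_here"}


def missing_keys(keys: dict[str, str]) -> list[str]:
    """Return the names of keys whose values are empty or still a placeholder."""
    return [name for name, value in keys.items() if value.strip() in PLACEHOLDERS]


problems = missing_keys(
    {
        "FIRECRAWL_API_KEY": os.getenv("FIRECRAWL_API_KEY", "your_firecrawl_api_key_here"),
        "OPENAI_API_KEY": os.getenv("OPENAI_API_KEY", "your_openai_api_key_here"),
    }
)
if problems:
    print(f"Set these environment variables before running the agent: {', '.join(problems)}")
```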

Creating WebSurferAgent with Firecrawl

Create a WebSurferAgent that uses Firecrawl as its web tool.

# Create WebSurferAgent with Firecrawl
websurfer_agent = WebSurferAgent(
    name="WebSurfer",
    llm_config=llm_config,
    web_tool="firecrawl",
    web_tool_kwargs={
        "firecrawl_api_key": FIRECRAWL_API_KEY,
        # Optional: specify custom Firecrawl API URL for self-hosted instances
        # "firecrawl_api_url": "https://your-firecrawl-instance.com",
    },
    system_message="You are a helpful web researcher. Use Firecrawl to scrape, crawl, search, and research websites to answer user questions.",
)

print(f"✓ WebSurferAgent created with {type(websurfer_agent.tool).__name__}")
# List the public, callable methods exposed by the Firecrawl tool
tool_methods = [
    name
    for name in dir(websurfer_agent.tool)
    if not name.startswith("_") and callable(getattr(websurfer_agent.tool, name))
]
print(f"✓ Available tool methods: {tool_methods}")
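If you run a self-hosted Firecrawl instance, you may prefer to build `web_tool_kwargs` from the environment instead of editing the commented-out line. The sketch below makes two assumptions. First, the URL comes from a `FIRECRAWL_API_URL` environment variable, a name chosen for this example rather than one the notebook defines. Second, it uses a hypothetical helper, `build_firecrawl_kwargs`. The helper adds `firecrawl_api_url` only when a URL is set, so the tool otherwise keeps its default endpoint:

```python
import os
from typing import Optional


def build_firecrawl_kwargs(api_key: str, api_url: Optional[str] = None) -> dict[str, str]:
    """Build web_tool_kwargs for WebSurferAgent(web_tool="firecrawl", ...).

    firecrawl_api_url is included only when a non-empty self-hosted URL is given.
    """
    kwargs = {"firecrawl_api_key": api_key}
    if api_url:
        kwargs["firecrawl_api_url"] = api_url
    return kwargs


# FIRECRAWL_API_URL is an assumed environment-variable name for this sketch
web_tool_kwargs = build_firecrawl_kwargs(
    os.getenv("FIRECRAWL_API_KEY", "your_firecrawl_api_key_here"),
    os.getenv("FIRECRAWL_API_URL"),
)
```

Pass the result as `web_tool_kwargs=web_tool_kwargs` when constructing the WebSurferAgent.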

Available Firecrawl Methods

The WebSurferAgent's Firecrawl tool exposes the following Firecrawl capabilities:

  1. Scrape: Extract content from a single URL
  2. Crawl: Recursively crawl a website starting from a URL
  3. Map: Discover URLs from a website
  4. Search: Search the web for content
  5. Deep Research: Perform comprehensive research on a topic with analysis

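Scrape, crawl, and map all start from a URL, while search and deep research start from a free-text query. The hypothetical helper below, which belongs to neither AG2 nor Firecrawl, checks for that distinction before composing a task message. The template wording is illustrative and loosely follows this notebook's example messages:

```python
from urllib.parse import urlparse

# Illustrative prompt templates, one per capability in the list above
TEMPLATES = {
    "scrape": "Scrape {target} and summarize the main content.",
    "crawl": "Crawl {target} and summarize the pages found.",
    "map": "Map the structure of {target} and list all the main sections.",
    "search": "Search the web for {target} and summarize the top 3 results.",
    "deep_research": "Perform deep research on '{target}' and provide a comprehensive analysis.",
}
# Capabilities that must start from an absolute http(s) URL
URL_CAPABILITIES = {"scrape", "crawl", "map"}


def make_task_message(capability: str, target: str) -> str:
    """Fill the template for a capability, rejecting non-URL targets for URL-based ones."""
    if capability not in TEMPLATES:
        raise ValueError(f"unknown capability: {capability!r}")
    if capability in URL_CAPABILITIES:
        parsed = urlparse(target)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{capability} expects an http(s) URL, got {target!r}")
    return TEMPLATES[capability].format(target=target)
```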
Example Usage

Here are some example tasks you can perform with the WebSurferAgent using Firecrawl:

# Example 1: Simple web scraping
scrape_message = "Scrape the homepage of https://example.com and summarize the main content."

# Example 2: Website crawling
crawl_message = "Crawl https://example.com and find all the pages related to documentation."

# Example 3: Website mapping
map_message = "Map the structure of https://example.com and list all the main sections."

# Example 4: Web search
search_message = "Search for recent articles about artificial intelligence and summarize the top 3 results."

# Example 5: Deep research
research_message = "Perform deep research on 'sustainable energy trends 2024' and provide a comprehensive analysis."

print("Example messages prepared. Run them individually with the websurfer_agent.")

Running Tasks

To run one of the prepared tasks, start a chat from a UserProxyAgent. Replace scrape_message with any of the other messages to run a different task:

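# Create a user proxy agent to interact with the websurfer
from autogen import UserProxyAgent

user_proxy = UserProxyAgent(
    name="User",
    human_input_mode="NEVER",
    code_execution_config=False,
)

# The WebSurferAgent proposes tool calls; register its tools for execution
# on the user proxy so the Firecrawl calls are actually run
for tool in websurfer_agent.tools:
    tool.register_for_execution(user_proxy)

# Start a conversation. max_turns=2 allows one round trip: the agent proposes a
# tool call, the user proxy executes it, and the agent answers from the result
chat_result = user_proxy.initiate_chat(
    websurfer_agent,
    message=scrape_message,
    max_turns=2,
)
print(chat_result.summary)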
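To run all five prepared messages in one pass, a small batch loop helps. The sketch below uses a hypothetical helper that takes any callable mapping a message to a result string, so it can be tried without network access. If one task fails, the helper records the error for that task and continues with the rest:

```python
from typing import Callable


def run_tasks(run_one: Callable[[str], str], messages: dict[str, str]) -> dict[str, str]:
    """Run each message in turn and collect results keyed by task name.

    Exceptions are caught per task and stored as an error string, so one
    unreachable URL or quota error does not abort the remaining tasks.
    """
    results: dict[str, str] = {}
    for name, message in messages.items():
        try:
            results[name] = run_one(message)
        except Exception as exc:  # record this task's failure and keep going
            results[name] = f"ERROR: {type(exc).__name__}: {exc}"
    return results
```

In this notebook, `messages` could be `{"scrape": scrape_message, "crawl": crawl_message, ...}`. `run_one` could wrap the `user_proxy.initiate_chat(...)` call from the previous cell and return the resulting ChatResult's `summary` attribute.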

Key Features

The WebSurferAgent with Firecrawl provides:

  • Multiple web interaction methods: scrape, crawl, map, search, and deep research
  • Self-hosted support: Connect to your own Firecrawl instance via firecrawl_api_url
  • Flexible configuration: Pass Firecrawl tool settings (for example firecrawl_api_key and firecrawl_api_url) through web_tool_kwargs
  • Consistent API: Same interface as other WebSurferAgent tools (Tavily, DuckDuckGo, etc.)
  • Rich content extraction: Supports multiple output formats (markdown, HTML)
  • Advanced filtering: Include/exclude patterns, custom headers, timeouts
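Firecrawl applies its include/exclude filtering on the server side. Sometimes you also want to narrow a URL list locally, for example the output of a map call, before asking the agent to scrape specific pages. The sketch below is a hypothetical helper, not Firecrawl's implementation. It assumes the patterns are regular expressions searched against the URL path:

```python
import re
from typing import Iterable
from urllib.parse import urlparse


def filter_urls(
    urls: Iterable[str],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> list[str]:
    """Keep URLs whose path matches any include pattern (if any are given)
    and matches none of the exclude patterns."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if include_res and not any(r.search(path) for r in include_res):
            continue  # no include pattern matched
        if any(r.search(path) for r in exclude_res):
            continue  # explicitly excluded
        kept.append(url)
    return kept


urls = [
    "https://example.com/",
    "https://example.com/docs/intro",
    "https://example.com/docs/api/v1",
    "https://example.com/blog/launch",
]
print(filter_urls(urls, include=[r"^/docs/"], exclude=[r"/v1$"]))
# → ['https://example.com/docs/intro']
```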

Together, these features make the WebSurferAgent well suited to web research and content-extraction tasks.