TinyFish

The TinyFish integration allows AG2 agents to perform goal-directed web scraping. Unlike traditional scrapers, TinyFish accepts a natural language goal describing what to extract from a page — making it ideal for agents that need structured data from diverse web sources.

Configuring Your TinyFish API Key#

  1. Create a TinyFish Account:
     • Visit TinyFish
     • Click Sign Up and create an account

  2. Get Your API Key:
     • Navigate to the TinyFish dashboard
     • Generate an API key under API Keys

  3. Set the TINYFISH_API_KEY Environment Variable:

    export TINYFISH_API_KEY="your_api_key_here"
    

Package Installation#

Install AG2 with the tinyfish extra (and openai for the example below):

pip install -U "ag2[openai,tinyfish]"

Note: autogen and ag2 are aliases for the same PyPI package:

pip install -U "autogen[openai,tinyfish]"

Implementation#

Imports#

import asyncio
import os
from autogen import ConversableAgent, LLMConfig
from autogen.tools.experimental import TinyFishTool

Agent Configuration#

llm_config = LLMConfig(api_type="openai", model="gpt-4o")

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant that can scrape web pages using the TinyFish tool. Use the tool to extract the requested information.",
    llm_config=llm_config,
)

user_proxy = ConversableAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    llm_config=False,
)

Tool Setup#

tinyfish_tool = TinyFishTool(tinyfish_api_key=os.getenv("TINYFISH_API_KEY"))

# Register the tool for LLM recommendation and execution.
tinyfish_tool.register_for_llm(assistant)
tinyfish_tool.register_for_execution(user_proxy)

Usage Example#

async def main():
    response = await user_proxy.a_run(
        assistant,
        message="Scrape https://example.com and extract the main product offerings and pricing information.",
        max_turns=2,
        summary_method="last_msg",
    )
    await response.process()
    print(f"Final Answer: {await response.summary}")

if __name__ == "__main__":
    asyncio.run(main())

Parameters#

TinyFishTool accepts the following parameters at call time:

Parameter  Type  Default   Description
url        str   required  The URL to scrape
goal       str   required  A natural language description of what information to extract from the page
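Both parameters are plain strings, so they can be sanity-checked before a call. A sketch of such a pre-flight check (the `build_scrape_request` helper is hypothetical, not part of TinyFishTool):

```python
def build_scrape_request(url: str, goal: str) -> dict:
    # Hypothetical pre-flight validation mirroring the parameter table above.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if not goal.strip():
        raise ValueError("goal must describe what to extract")
    return {"url": url, "goal": goal}
```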

Output#

Each scrape returns a dictionary with:

  • url — the scraped URL
  • goal — the extraction goal that was used
  • data — the structured data extracted by TinyFish
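Given that documented shape, downstream code might unpack a result like this (a sketch; the sample dict is illustrative, not real TinyFish output):

```python
def summarize_result(result: dict) -> str:
    # Assumes the url/goal/data keys documented above,
    # plus the error field described under Error Handling.
    if "error" in result:
        return f"Scrape of {result.get('url', 'unknown URL')} failed: {result['error']}"
    return f"{result['url']} ({result['goal']}): {result['data']}"

sample = {"url": "https://example.com", "goal": "Extract pricing", "data": {"plans": []}}
print(summarize_result(sample))
```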

Error Handling#

The tool handles errors gracefully and returns them in the response:

# Failed operations return a dict with an error field
result = tinyfish_tool(
    url="https://invalid-url.com",
    goal="Extract company info"
)
if "error" in result:
    print(f"Scraping failed: {result['error']}")

Use Cases#

  • Due Diligence: Extract company information, team details, and financials from corporate websites (see Code on Build with AG2)
  • Competitive Analysis: Gather product and pricing data from competitor sites
  • Lead Enrichment: Scrape company profiles for sales intelligence
  • Content Research: Extract specific data points from articles and reports
  • Market Research: Collect structured data from industry publications

See Also#