# TinyFish
The TinyFish integration allows AG2 agents to perform goal-directed web scraping. Unlike traditional scrapers, TinyFish accepts a natural language goal describing what to extract from a page — making it ideal for agents that need structured data from diverse web sources.
## Configuring Your TinyFish API Key

1. Create a TinyFish Account:
   - Visit TinyFish
   - Click Sign Up and create an account
2. Get Your API Key:
   - Navigate to the TinyFish dashboard
   - Generate an API key under API Keys
3. Set the `TINYFISH_API_KEY` environment variable.
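For example, in a POSIX shell (the key value below is a placeholder; substitute your own):

```shell
export TINYFISH_API_KEY="your-api-key-here"
```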
## Package Installation

Install AG2 with the `tinyfish` extra (and `openai` for the example below).

Note: `autogen` and `ag2` are aliases for the same PyPI package.
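Assuming the extras are named as described above, the install command looks like:

```shell
pip install "ag2[openai,tinyfish]"
```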
## Implementation

### Imports

```python
import asyncio
import os

from autogen import ConversableAgent, LLMConfig
from autogen.tools.experimental import TinyFishTool
```
### Agent Configuration

```python
llm_config = LLMConfig({"api_type": "openai", "model": "gpt-4o"})

assistant = ConversableAgent(
    name="assistant",
    system_message="You are a helpful assistant that can scrape web pages using the TinyFish tool. Use the tool to extract the requested information.",
    llm_config=llm_config,
)

user_proxy = ConversableAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    llm_config=False,
)
```
### Tool Setup

```python
tinyfish_tool = TinyFishTool(tinyfish_api_key=os.getenv("TINYFISH_API_KEY"))

# Register the tool for LLM recommendation and execution.
tinyfish_tool.register_for_llm(assistant)
tinyfish_tool.register_for_execution(user_proxy)
```
### Usage Example

```python
async def main():
    response = await user_proxy.a_run(
        assistant,
        message="Scrape https://example.com and extract the main product offerings and pricing information.",
        max_turns=2,
        summary_method="last_msg",
    )
    await response.process()
    print(f"Final Answer: {await response.summary}")


if __name__ == "__main__":
    asyncio.run(main())
```
## Parameters

`TinyFishTool` accepts the following parameters at call time:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | required | The URL to scrape |
| `goal` | `str` | required | A natural language description of what information to extract from the page |
## Output

Each scrape returns a dictionary with:

- `url`: the scraped URL
- `goal`: the extraction goal that was used
- `data`: the structured data extracted by TinyFish
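To make the call and result shape concrete, here is a minimal sketch using a hypothetical stand-in function, `fake_scrape`, that mirrors the documented `url`/`goal` parameters and return keys without making a network call (the sample `data` payload is invented for illustration):

```python
# fake_scrape is a hypothetical stand-in mirroring TinyFishTool's documented
# call signature and output shape; a real call would query the TinyFish API.
def fake_scrape(url: str, goal: str) -> dict:
    return {
        "url": url,
        "goal": goal,
        # Illustrative structured payload; real contents depend on the page and goal.
        "data": {"products": ["Example Product"], "pricing": ["$10/mo"]},
    }

result = fake_scrape(
    url="https://example.com",
    goal="Extract the main product offerings and pricing information",
)

# Access the documented fields.
print(result["url"])
print(result["data"])
```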
## Error Handling

The tool handles errors gracefully and returns them in the response:

```python
# Failed operations return a dict with an "error" field.
result = tinyfish_tool(
    url="https://invalid-url.com",
    goal="Extract company info",
)
if "error" in result:
    print(f"Scraping failed: {result['error']}")
```
## Use Cases

- Due Diligence: Extract company information, team details, and financials from corporate websites
- Competitive Analysis: Gather product and pricing data from competitor sites
- Lead Enrichment: Scrape company profiles for sales intelligence
- Content Research: Extract specific data points from articles and reports
- Market Research: Collect structured data from industry publications