
WikipediaPageLoadTool

autogen.tools.experimental.wikipedia.wikipedia.WikipediaPageLoadTool

WikipediaPageLoadTool(language='en', top_k=3, truncate=4000, verbose=False)

Bases: Tool

A tool to load up to N characters of Wikipedia page content along with metadata.

This tool uses a language-specific Wikipedia client to search for relevant articles and returns a list of Document objects containing truncated page content and metadata (source URL, title, page ID, timestamp, word count, and size). Ideal for agents requiring structured Wikipedia data for research, summarization, or contextual enrichment.

ATTRIBUTE DESCRIPTION
language

Wikipedia language code (default: "en").

TYPE: str

top_k

Maximum number of pages to retrieve per query (default: 3).

TYPE: int

truncate

Maximum number of characters of content per page (default: 4000).

TYPE: int

verbose

If True, prints debug information (default: False).

TYPE: bool

tool_name

Identifier used in User-Agent header.

TYPE: str

wiki_cli

Client for interacting with the Wikipedia API.

TYPE: WikipediaClient

Initializes the WikipediaPageLoadTool with configurable language, result count, and content length.

PARAMETER DESCRIPTION
language

The language code for the Wikipedia edition (default is "en").

TYPE: str DEFAULT: 'en'

top_k

The maximum number of pages to retrieve per query (default is 3; capped at MAX_PAGE_RETRIEVE).

TYPE: int DEFAULT: 3

truncate

The maximum number of characters to extract from each page (default is 4000; capped at MAX_ARTICLE_LENGTH).

TYPE: int DEFAULT: 4000

verbose

If True, enables verbose/debug logging (default is False).

TYPE: bool DEFAULT: False

Source code in autogen/tools/experimental/wikipedia/wikipedia.py
def __init__(self, language: str = "en", top_k: int = 3, truncate: int = 4000, verbose: bool = False) -> None:
    """
    Initializes the WikipediaPageLoadTool with configurable language, result count, and content length.

    Args:
        language (str): The language code for the Wikipedia edition (default is "en").
        top_k (int): The maximum number of pages to retrieve per query (default is 3;
                     capped at MAX_PAGE_RETRIEVE).
        truncate (int): The maximum number of characters to extract from each page (default is 4000;
                        capped at MAX_ARTICLE_LENGTH).
        verbose (bool): If True, enables verbose/debug logging (default is False).
    """
    self.language = language
    self.top_k = min(top_k, MAX_PAGE_RETRIEVE)
    self.truncate = min(truncate, MAX_ARTICLE_LENGTH)
    self.verbose = verbose
    self.tool_name = "wikipedia-page-load"
    self.wiki_cli = WikipediaClient(language, self.tool_name)
    super().__init__(
        name=self.tool_name,
        description=(
            "Search Wikipedia for relevant pages using a language-specific client. "
            "Returns a list of documents with truncated content and metadata including title, URL, "
            "page ID, timestamp, word count, and page size. Configure number of results with the 'top_k' parameter "
            "and content length with 'truncate'. Useful for research, summarization, or contextual enrichment."
        ),
        func_or_tool=self.content_search,
    )
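
The clamping of top_k and truncate in the constructor can be sketched in isolation as below. This is a minimal illustration, not the library code: the two limit values are assumptions chosen for the example (the real MAX_PAGE_RETRIEVE and MAX_ARTICLE_LENGTH are defined in the library module and are not shown on this page), and effective_settings is a hypothetical helper name.

```python
# Assumed limits for illustration only; the real MAX_PAGE_RETRIEVE and
# MAX_ARTICLE_LENGTH live in the library module and may differ.
ASSUMED_MAX_PAGE_RETRIEVE = 100
ASSUMED_MAX_ARTICLE_LENGTH = 10_000


def effective_settings(top_k: int = 3, truncate: int = 4000) -> tuple[int, int]:
    """Mirror the min() clamping that __init__ applies to top_k and truncate."""
    return (
        min(top_k, ASSUMED_MAX_PAGE_RETRIEVE),
        min(truncate, ASSUMED_MAX_ARTICLE_LENGTH),
    )


defaults = effective_settings()              # defaults are below both limits
clamped = effective_settings(500, 50_000)    # oversized requests are cut down
unchecked = effective_settings(-1, 0)        # min() adds no lower bound
```

Because the constructor uses min() only, it enforces an upper bound and nothing else: zero or negative values pass through unchanged, so callers may want to validate them before constructing the tool.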

language instance-attribute

language = language

top_k instance-attribute

top_k = min(top_k, MAX_PAGE_RETRIEVE)

truncate instance-attribute

truncate = min(truncate, MAX_ARTICLE_LENGTH)

verbose instance-attribute

verbose = verbose

tool_name instance-attribute

tool_name = 'wikipedia-page-load'

wiki_cli instance-attribute

wiki_cli = WikipediaClient(language, tool_name)

name property

name

description property

description

func property

func

tool_schema property

tool_schema

Get the schema for the tool.

This is the preferred way of handling function calls with OpenAI and compatible frameworks.

function_schema property

function_schema

Get the schema for the function.

This is the old way of handling function calls with OpenAI and compatible frameworks. It is provided for backward compatibility.

realtime_tool_schema property

realtime_tool_schema

Get the schema for the tool.

This is the preferred way of handling function calls with OpenAI and compatible frameworks.

content_search(query)

Executes a Wikipedia search and returns page content plus metadata.

PARAMETER DESCRIPTION
query

The search term to query Wikipedia.

TYPE: str

RETURNS DESCRIPTION
Union[list[Document], str]

One of the following:
  • list[Document]: Documents with up to truncate characters of page text and metadata, if pages are found.
  • str: Error message if the search fails or no pages are found.

Notes
  • Errors are caught internally and returned as strings.
  • If no matching pages have text content, returns "No good Wikipedia Search Result was found".
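
Since failures come back as plain strings rather than exceptions, callers need to check the result type before iterating. The sketch below shows one way to do that. The Document dataclass is a hypothetical stand-in that only assumes the page_content and metadata fields described above, and titles_or_raise is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document (content plus metadata)."""

    page_content: str
    metadata: dict[str, str] = field(default_factory=dict)


def titles_or_raise(result: Union[list[Document], str]) -> list[str]:
    """Return page titles, or raise if the tool reported a failure as a string."""
    if isinstance(result, str):
        # Covers both "wikipedia search failed: ..." and the no-result message.
        raise RuntimeError(result)
    return [doc.metadata["title"] for doc in result]
```

Turning the string into an exception is a design choice for calling code that prefers to fail loudly; an agent workflow might instead pass the message back to the model unchanged.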
Source code in autogen/tools/experimental/wikipedia/wikipedia.py
def content_search(self, query: str) -> Union[list[Document], str]:
    """
    Executes a Wikipedia search and returns page content plus metadata.

    Args:
        query (str): The search term to query Wikipedia.

    Returns:
        Union[list[Document], str]:
            - list[Document]: Documents with up to `truncate` characters of page text
              and metadata if pages are found.
            - str: Error message if the search fails or no pages are found.

    Notes:
        - Errors are caught internally and returned as strings.
        - If no matching pages have text content, returns
          "No good Wikipedia Search Result was found".
    """
    try:
        if self.verbose:
            print(f"INFO\t [{self.tool_name}] search query='{query[:MAX_QUERY_LENGTH]}' top_k={self.top_k}")
        search_results = self.wiki_cli.search(query[:MAX_QUERY_LENGTH], limit=self.top_k)
        docs: list[Document] = []
        for item in search_results:
            page = self.wiki_cli.get_page(item["title"])
            # Only process pages that exist and have text content.
            if page is not None and page.text:
                document = Document(
                    page_content=page.text[: self.truncate],
                    metadata={
                        "source": f"https://{self.language}.wikipedia.org/?curid={item['pageid']}",
                        "title": item["title"],
                        "pageid": str(item["pageid"]),
                        "timestamp": str(item["timestamp"]),
                        "wordcount": str(item["wordcount"]),
                        "size": str(item["size"]),
                    },
                )
                docs.append(document)
        if not docs:
            return "No good Wikipedia Search Result was found"
        return docs

    except Exception as e:
        return f"wikipedia search failed: {str(e)}"
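
The per-page step in the loop above (truncate the text, build the curid URL, convert metadata values to strings) can be reproduced without network access. The following sketch returns a plain dict instead of a Document so that it stays self-contained. build_document is a hypothetical helper, and the search hit values are invented for illustration.

```python
def build_document(item: dict, text: str, language: str = "en", truncate: int = 4000) -> dict:
    """Assemble content and metadata for one page, following content_search."""
    return {
        # Slicing is safe even when the text is shorter than the limit.
        "page_content": text[:truncate],
        "metadata": {
            "source": f"https://{language}.wikipedia.org/?curid={item['pageid']}",
            "title": item["title"],
            # Every metadata value except the title is coerced to str.
            "pageid": str(item["pageid"]),
            "timestamp": str(item["timestamp"]),
            "wordcount": str(item["wordcount"]),
            "size": str(item["size"]),
        },
    }


# Hypothetical search hit; these values are made up for the example.
item = {
    "title": "Example",
    "pageid": 12345,
    "timestamp": "2024-01-01T00:00:00Z",
    "wordcount": 900,
    "size": 7000,
}
doc = build_document(item, "x" * 5000, truncate=10)
```

Because the metadata values are stored as strings, numeric fields such as wordcount and size need an int() conversion before they can be compared or sorted numerically.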

register_for_llm

register_for_llm(agent)

Registers the tool for use with a ConversableAgent's language model (LLM).

This method registers the tool so that it can be invoked by the agent during interactions with the language model.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_for_llm(self, agent: "ConversableAgent") -> None:
    """Registers the tool for use with a ConversableAgent's language model (LLM).

    This method registers the tool so that it can be invoked by the agent during
    interactions with the language model.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    if self._func_schema:
        agent.update_tool_signature(self._func_schema, is_remove=False)
    else:
        agent.register_for_llm()(self)

register_for_execution

register_for_execution(agent)

Registers the tool for direct execution by a ConversableAgent.

This method registers the tool so that it can be executed by the agent, typically outside of the context of an LLM interaction.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_for_execution(self, agent: "ConversableAgent") -> None:
    """Registers the tool for direct execution by a ConversableAgent.

    This method registers the tool so that it can be executed by the agent,
    typically outside of the context of an LLM interaction.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    agent.register_for_execution()(self)

register_tool

register_tool(agent)

Register a tool to be both proposed and executed by an agent.

Equivalent to calling both register_for_llm and register_for_execution with the same agent.

Note: This will not make the agent recommend and execute the call in one step. If the agent recommends the tool, it will need to be the next agent to speak in order to execute the tool.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_tool(self, agent: "ConversableAgent") -> None:
    """Register a tool to be both proposed and executed by an agent.

    Equivalent to calling both `register_for_llm` and `register_for_execution` with the same agent.

    Note: This will not make the agent recommend and execute the call in one step. If the agent
    recommends the tool, it will need to be the next agent to speak in order to execute the tool.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    self.register_for_llm(agent)
    self.register_for_execution(agent)
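
The equivalence between register_tool and the two separate registration calls can be shown with stand-in classes. Both RecordingAgent and SketchTool below are hypothetical and only mimic the call order. They do not reproduce the real ConversableAgent or Tool behaviour.

```python
class RecordingAgent:
    """Hypothetical stand-in for ConversableAgent that records registrations."""

    def __init__(self) -> None:
        self.calls: list[str] = []


class SketchTool:
    """Hypothetical stand-in mirroring the three registration methods of Tool."""

    def register_for_llm(self, agent: RecordingAgent) -> None:
        agent.calls.append("llm")

    def register_for_execution(self, agent: RecordingAgent) -> None:
        agent.calls.append("execution")

    def register_tool(self, agent: RecordingAgent) -> None:
        # Same agent, LLM registration first, then execution registration.
        self.register_for_llm(agent)
        self.register_for_execution(agent)


combined = RecordingAgent()
SketchTool().register_tool(combined)

separate = RecordingAgent()
tool = SketchTool()
tool.register_for_llm(separate)
tool.register_for_execution(separate)
```

When one agent should propose the call and a different agent should execute it, the two methods would be called separately with different agents instead of using register_tool.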