
ReliableTool

autogen.tools.experimental.ReliableTool #

ReliableTool(name, func_or_tool, runner_llm_config, validator_llm_config, description=None, system_message_addition_for_tool_calling='', system_message_addition_for_result_validation='', max_tool_invocations=3, enable_dynamic_validation=False, messages=None, ground_truth=None)

Bases: Tool

A ReliableTool wraps an existing function or tool. When the ReliableTool is invoked, it kicks off an internal Group Chat in which a Runner agent and a Validator agent iteratively invoke the wrapped function or tool until the output of a single invocation of the original function or tool satisfies the provided validation criteria. Reliable Tools are most useful when the LLM, or the function or tool itself, is unreliable. This commonly happens with small, local LLMs (under ~32B parameters), or when functions/tools are used to "explore" (performing many web searches, or querying a database with SQL). The ReliableTool lets the user bake a result-validation strategy into the tool itself, so that the broader group chat or agentic system can be built more clearly around the intended flow instead of being dominated by retry and validation loops.

Additionally, the .run() and .a_run() methods let you use LLMs to invoke a specific tool outside of a Group Chat or similar structure, providing a more traditional way to program with LLMs and tools in code.
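
For orientation, here is a minimal usage sketch. The get_stock_price function, the placeholder config values, and the validation criterion are illustrative assumptions, not part of the API.

from autogen.tools.experimental import ReliableTool

def get_stock_price(ticker: str) -> str:
    """Return the latest price for a ticker symbol (stand-in for a real API call)."""
    return f"{ticker}: 123.45"

# Either an LLMConfig or a plain dict is accepted (see the parameter docs below).
llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": "..."}]}

price_tool = ReliableTool(
    name="ValidatedStockPrice",
    func_or_tool=get_stock_price,
    runner_llm_config=llm_config,
    validator_llm_config=llm_config,
    system_message_addition_for_result_validation=(
        "The result must contain a ticker symbol and a positive price."
    ),
)

result = price_tool.run(task="Fetch the latest price for AAPL.")
print(result)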

PARAMETER DESCRIPTION
name

A unique and descriptive name for this ReliableTool instance. This name is used for logging, internal context management, and can be how other agents or systems refer to this specific reliable capability. Example: "AccurateWeatherForecaster", "ValidatedCustomerLookup"

TYPE: str

func_or_tool

The core Python function or an existing AG2 Tool instance that this ReliableTool will manage and execute. This is the underlying capability you want to enhance with reliability features like retries and validation. The ReliableTool will handle calling this function with arguments determined by its internal Runner Agent based on the provided task. Example: my_api_call_function, existing_search_tool_instance

TYPE: Union[Callable[..., Any], Tool]

runner_llm_config

The LLM configuration for the internal "Runner Agent". This agent is responsible for interpreting the high-level task provided when the ReliableTool is invoked, deciding the appropriate arguments for the func_or_tool, and initiating its execution. This configuration dictates the model, API keys, temperature, etc., for the LLM that attempts to call your function. It must support tool/function calling. Example: LLMConfig(config_list=oai_config_list, model="gpt-4o-mini") {"config_list": [{"model": "gpt-3.5-turbo", "api_key": "..."}], "temperature": 0.5}

TYPE: Union[LLMConfig, dict[str, Any]]

validator_llm_config

The LLM configuration for the internal "Validator Agent". After the func_or_tool executes successfully, this agent receives its string output and assesses whether it meets defined validation criteria. It is configured for structured output (Pydantic model ValidationResult) to provide a boolean validation status and a justification. This configuration dictates the model, etc., for the LLM that validates the function's result. It can be the same as runner_llm_config or different. Example: LLMConfig(config_list=oai_config_list, model="gpt-4o-mini")

TYPE: Union[LLMConfig, dict[str, Any]]

description

A human-readable description of what this ReliableTool achieves. If None, the description is inferred from the docstring of the provided func_or_tool. This description is primarily for the public-facing ReliableTool (e.g., when registered with an outer agent for it to decide when to use this tool). Example: "Reliably fetches and validates current weather information for a specified city."

TYPE: Optional[str] DEFAULT: None

system_message_addition_for_tool_calling

""): Additional text appended to the system message of the internal "Runner Agent". This allows you to provide specific instructions, context, or constraints to the LLM responsible for deciding how to call your underlying func_or_tool. Use this when the Runner Agent needs more guidance than just the task description and the function's signature to correctly formulate arguments. Example: "When calling 'search_products', if the task mentions 'budget', ensure the 'max_price' argument is set accordingly. Prioritize items in stock."

TYPE: (str, default) DEFAULT: ''

system_message_addition_for_result_validation

""): Additional text appended to the system message of the internal "Validator Agent". This is where you define the base or static criteria for validating the result (string representation) of your func_or_tool. These criteria are applied on every validation attempt unless overridden or supplemented by dynamic validation. Example: "The stock price must be a positive number. The company name in the result must match the one in the task. If data is unavailable, the result should explicitly state 'Data not found'."

TYPE: (str, default) DEFAULT: ''

max_tool_invocations

The maximum number of times the internal "Runner Agent" can attempt to call the underlying func_or_tool. This limit includes the initial attempt and any subsequent retries that occur due to:

1. Direct execution errors from func_or_tool.
2. The Runner Agent failing to generate a valid tool call.
3. The Validator Agent deeming a successful execution's result invalid.

Adjust this to control retries and prevent excessive LLM calls, considering the potential flakiness of the func_or_tool or the complexity of its parameterization. Example: max_tool_invocations=2 (allows one initial attempt and one retry if needed).

TYPE: int DEFAULT: 3

enable_dynamic_validation

If True, the public-facing run (or a_run) method of this ReliableTool (accessible via its func attribute after initialization) will accept an additional optional argument: validation_prompt_addition: Optional[str]. If a string is provided for this argument during a call, it will be appended to the Validator Agent's system message for that specific run, allowing validation criteria to be tailored on-the-fly based on the task. Example: If True, my_tool.func(task="search for AG2 examples", validation_prompt_addition="Result must include Python code snippets.")

TYPE: bool DEFAULT: False
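
When enabled, a per-call criterion can be passed alongside the task, as in this sketch (search_tool is a hypothetical instance constructed with enable_dynamic_validation=True):

result = search_tool.run(
    task="Search for AG2 examples.",
    validation_prompt_addition="Result must include Python code snippets.",
)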

messages

A list of initial messages (e.g., from a prior conversation history) to provide context to the internal Runner and Validator agents. These messages are prepended to the message history seen by these agents during their internal chat, helping them understand the task in a broader context. Use when the task for the ReliableTool might refer to entities or intentions established in preceding turns of a conversation. Example: messages=[{"role": "user", "content": "I'm interested in large-cap tech stocks."}, {"role": "assistant", "content": "Okay, any specific ones?"}] (Then a task like "Fetch the latest price for 'the one we just discussed'.")

TYPE: Optional[List[dict[str, Any]]] DEFAULT: None
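
A sketch of seeding prior context, reusing the hypothetical get_stock_price and llm_config from the introduction:

history = [
    {"role": "user", "content": "I'm interested in large-cap tech stocks."},
    {"role": "assistant", "content": "Okay, any specific ones?"},
]

price_tool = ReliableTool(
    name="ValidatedStockPrice",
    func_or_tool=get_stock_price,
    runner_llm_config=llm_config,
    validator_llm_config=llm_config,
    messages=history,
)

result = price_tool.run(task="Fetch the latest price for 'the one we just discussed'.")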

ground_truth

A list of strings representing factual information, examples, or specific constraints that should be considered by the internal Runner and Validator agents. These are injected into the conversation history as distinct user messages (e.g., "[[Provided Ground Truth 1]]: ..."). Use to provide specific, factual data or strong hints that might not fit naturally into system messages or prior conversation history, guiding the agents towards correct interpretation or validation. Example: ground_truth=["The API rate limit is 10 requests per minute.", "User preference: only show results from the last 7 days."]

TYPE: Optional[List[str]] DEFAULT: None
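
A sketch of both injection points; search_web is a hypothetical function, and note that run() also accepts ground_truth per call (see its signature below):

search_tool = ReliableTool(
    name="RecentResultsSearch",
    func_or_tool=search_web,  # hypothetical search function
    runner_llm_config=llm_config,
    validator_llm_config=llm_config,
    ground_truth=["User preference: only show results from the last 7 days."],
)

result = search_tool.run(
    task="Find recent AG2 release notes.",
    ground_truth=["The API rate limit is 10 requests per minute."],
)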

Source code in autogen/tools/experimental/reliable/reliable.py
def __init__(
    self,
    name: str,
    func_or_tool: Union[Callable[..., Any], Tool],
    runner_llm_config: Union[LLMConfig, dict[str, Any]],
    validator_llm_config: Union[LLMConfig, dict[str, Any]],
    description: Optional[str] = None,
    system_message_addition_for_tool_calling: str = "",
    system_message_addition_for_result_validation: str = "",
    max_tool_invocations: int = 3,
    enable_dynamic_validation: bool = False,
    messages: Optional[List[dict[str, Any]]] = None,
    ground_truth: Optional[List[str]] = None,
) -> None:
    """
    A ReliableTool wraps an existing function or tool.
    When the ReliableTool is invoked, it kicks off an internal Group Chat where a Runner
    and Validator agent will iteratively invoke the wrapped function or tool until
    *the output of a single invocation of the original function or tool satisfies the provided validation criteria.*
    Reliable Tools are most useful when the LLM, or the function or tool itself, is unreliable.
    This commonly happens with small, local LLMs (under ~32B parameters),
    or when functions/tools are used to "explore" (performing many web searches, or querying a database with SQL).
    The Reliable Tool allows the user to bake a result validation strategy into the tool itself
    so that the broader group chat/agentic system can be built more clearly around the intended flow
    instead of needing to focus so much on retry and validation loops.

    Additionally, the .run() and .a_run() methods serve as a way to use LLMs to invoke a specific tool outside
    of a Group Chat or similar structure to provide a more traditional programming method of using LLMs and tools in code.

    Args:
        name (str):
            A unique and descriptive name for this ReliableTool instance.
            This name is used for logging, internal context management, and can be
            how other agents or systems refer to this specific reliable capability.
            Example: `"AccurateWeatherForecaster"`, `"ValidatedCustomerLookup"`

        func_or_tool (Union[Callable[..., Any], Tool]):
            The core Python function or an existing AG2 `Tool` instance that this
            `ReliableTool` will manage and execute. This is the underlying capability
            you want to enhance with reliability features like retries and validation.
            The `ReliableTool` will handle calling this function with arguments
            determined by its internal Runner Agent based on the provided `task`.
            Example: `my_api_call_function`, `existing_search_tool_instance`

        runner_llm_config (Union[LLMConfig, dict[str, Any]]):
            The LLM configuration for the internal "Runner Agent". This agent is
            responsible for interpreting the high-level `task` provided when the
            `ReliableTool` is invoked, deciding the appropriate arguments for the
            `func_or_tool`, and initiating its execution.
            This configuration dictates the model, API keys, temperature, etc., for
            the LLM that attempts to call your function. It must support tool/function calling.
            Example: `LLMConfig(config_list=oai_config_list, model="gpt-4o-mini")`
                     `{"config_list": [{"model": "gpt-3.5-turbo", "api_key": "..."}], "temperature": 0.5}`

        validator_llm_config (Union[LLMConfig, dict[str, Any]]):
            The LLM configuration for the internal "Validator Agent". After the
            `func_or_tool` executes successfully, this agent receives its string output
            and assesses whether it meets defined validation criteria. It is
            configured for structured output (Pydantic model `ValidationResult`)
            to provide a boolean validation status and a justification.
            This configuration dictates the model, etc., for the LLM that validates
            the function's result. It can be the same as `runner_llm_config` or different.
            Example: `LLMConfig(config_list=oai_config_list, model="gpt-4o-mini")`

        description (Optional[str], default: None):
            A human-readable description of what this `ReliableTool` achieves.
            If `None`, the description is inferred from the docstring of the
            provided `func_or_tool`. This description is primarily for the public-facing
            `ReliableTool` (e.g., when registered with an outer agent for it to decide
            when to use this tool).
            Example: `"Reliably fetches and validates current weather information for a specified city."`

        system_message_addition_for_tool_calling (str, default: ""):
            Additional text appended to the system message of the internal "Runner Agent".
            This allows you to provide specific instructions, context, or constraints
            to the LLM responsible for deciding *how* to call your underlying `func_or_tool`.
            Use this when the Runner Agent needs more guidance than just the task
            description and the function's signature to correctly formulate arguments.
            Example: `"When calling 'search_products', if the task mentions 'budget', ensure the 'max_price' argument is set accordingly. Prioritize items in stock."`

        system_message_addition_for_result_validation (str, default: ""):
            Additional text appended to the system message of the internal "Validator Agent".
            This is where you define the *base* or *static* criteria for validating the
            *result* (string representation) of your `func_or_tool`. These criteria
            are applied on every validation attempt unless overridden or supplemented by
            dynamic validation.
            Example: `"The stock price must be a positive number. The company name in the result must match the one in the task. If data is unavailable, the result should explicitly state 'Data not found'."`

        max_tool_invocations (int, default: 3):
            The maximum number of times the internal "Runner Agent" can attempt to
            call the underlying `func_or_tool`. This limit includes the initial attempt
            and any subsequent retries that occur due to:
            1. Direct execution errors from `func_or_tool`.
            2. The Runner Agent failing to generate a valid tool call.
            3. The Validator Agent deeming a successful execution's result as invalid.
            Adjust this to control retries and prevent excessive LLM calls, considering
            the potential flakiness of the `func_or_tool` or complexity of parameterization.
            Example: `max_tool_invocations=2` (allows one initial attempt and one retry if needed).

        enable_dynamic_validation (bool, default: False):
            If `True`, the public-facing `run` (or `a_run`) method of this `ReliableTool`
            (accessible via its `func` attribute after initialization) will accept an
            additional optional argument: `validation_prompt_addition: Optional[str]`.
            If a string is provided for this argument during a call, it will be appended
            to the Validator Agent's system message *for that specific run*, allowing
            validation criteria to be tailored on-the-fly based on the task.
            Example: If `True`, `my_tool.func(task="search for AG2 examples", validation_prompt_addition="Result must include Python code snippets.")`

        messages (Optional[List[dict[str, Any]]], default: None):
            A list of initial messages (e.g., from a prior conversation history) to
            provide context to the internal Runner and Validator agents. These messages
            are prepended to the message history seen by these agents during their
            internal chat, helping them understand the `task` in a broader context.
            Use when the `task` for the `ReliableTool` might refer to entities or
            intentions established in preceding turns of a conversation.
            Example: `messages=[{"role": "user", "content": "I'm interested in large-cap tech stocks."}, {"role": "assistant", "content": "Okay, any specific ones?"}]`
                     (Then a task like "Fetch the latest price for 'the one we just discussed'.")

        ground_truth (Optional[List[str]], default: None):
            A list of strings representing factual information, examples, or specific
            constraints that should be considered by the internal Runner and Validator
            agents. These are injected into the conversation history as distinct user
            messages (e.g., "[[Provided Ground Truth 1]]: ...").
            Use to provide specific, factual data or strong hints that might not fit
            naturally into system messages or prior conversation history, guiding the
            agents towards correct interpretation or validation.
            Example: `ground_truth=["The API rate limit is 10 requests per minute.", "User preference: only show results from the last 7 days."]`
    """
    self._original_func, original_name, original_description = self._extract_func_details(func_or_tool)
    self._is_original_func_async = inspect.iscoroutinefunction(self._original_func)

    self._runner_llm_config = ConversableAgent._validate_llm_config(runner_llm_config)
    if self._runner_llm_config is False:
        raise ValueError("Runner LLM config failed validation.")
    # Validate validator_llm_config and store it. It can be LLMConfig | dict | False.
    self._validator_llm_config = ConversableAgent._validate_llm_config(validator_llm_config)
    if self._validator_llm_config is False:  # Check before use in _setup_validator_agent
        raise ValueError("Validator LLM config failed validation.")

    self._runner_system_message_addition = system_message_addition_for_tool_calling
    self._validator_system_message_addition = system_message_addition_for_result_validation
    self.max_tool_invocations = max_tool_invocations
    self._context_variables_key = f"{name}_ReliableToolContext_{id(self)}"

    self._original_func_name = original_name
    self.enable_dynamic_validation = enable_dynamic_validation

    self._init_messages = copy.deepcopy(messages) if messages is not None else None
    self._init_ground_truth = copy.deepcopy(ground_truth) if ground_truth else None

    self._tool_description = description if description is not None else original_description

    public_entry_point_func = self._define_public_entry_point(
        self._is_original_func_async, self.enable_dynamic_validation
    )

    super().__init__(
        name=name,
        description=self._tool_description,
        func_or_tool=public_entry_point_func,
    )

    self._validator_name = f"{self.name}_Validator"
    self._runner_name = f"{self.name}_Runner"

    self._validator = self._setup_validator_agent()
    self._runner = self._setup_runner_agent()
    self._reliable_func_wrapper = reliable_function_wrapper(
        self._original_func, self._validator, self._runner, self._context_variables_key
    )
    self._setup_runner_tool()
    self._register_internal_hooks()

name property #

name

description property #

description

func property #

func

tool_schema property #

tool_schema

Get the schema for the tool.

This is the preferred way of handling function calls with OpenAI and compatible frameworks.

function_schema property #

function_schema

Get the schema for the function.

This is the old way of handling function calls with OpenAI and compatible frameworks. It is provided for backward compatibility.

realtime_tool_schema property #

realtime_tool_schema

Get the schema for the tool.

This is the preferred way of handling function calls with OpenAI and compatible frameworks.

INTERNAL_TOOL_NAME_PREFIX class-attribute instance-attribute #

INTERNAL_TOOL_NAME_PREFIX = 'execute_'

max_tool_invocations instance-attribute #

max_tool_invocations = max_tool_invocations

enable_dynamic_validation instance-attribute #

enable_dynamic_validation = enable_dynamic_validation

register_for_llm #

register_for_llm(agent)

Registers the tool for use with a ConversableAgent's language model (LLM).

This method registers the tool so that it can be invoked by the agent during interactions with the language model.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_for_llm(self, agent: "ConversableAgent") -> None:
    """Registers the tool for use with a ConversableAgent's language model (LLM).

    This method registers the tool so that it can be invoked by the agent during
    interactions with the language model.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    if self._func_schema:
        agent.update_tool_signature(self._func_schema, is_remove=False)
    agent.register_for_llm()(self)

register_for_execution #

register_for_execution(agent)

Registers the tool for direct execution by a ConversableAgent.

This method registers the tool so that it can be executed by the agent, typically outside of the context of an LLM interaction.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_for_execution(self, agent: "ConversableAgent") -> None:
    """Registers the tool for direct execution by a ConversableAgent.

    This method registers the tool so that it can be executed by the agent,
    typically outside of the context of an LLM interaction.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    agent.register_for_execution()(self)

register_tool #

register_tool(agent)

Register a tool to be both proposed and executed by an agent.

Equivalent to calling both register_for_llm and register_for_execution with the same agent.

Note: This will not make the agent recommend and execute the call in the one step. If the agent recommends the tool, it will need to be the next agent to speak in order to execute the tool.

PARAMETER DESCRIPTION
agent

The agent to which the tool will be registered.

TYPE: ConversableAgent

Source code in autogen/tools/tool.py
def register_tool(self, agent: "ConversableAgent") -> None:
    """Register a tool to be both proposed and executed by an agent.

    Equivalent to calling both `register_for_llm` and `register_for_execution` with the same agent.

    Note: This will not make the agent recommend and execute the call in the one step. If the agent
    recommends the tool, it will need to be the next agent to speak in order to execute the tool.

    Args:
        agent (ConversableAgent): The agent to which the tool will be registered.
    """
    self.register_for_llm(agent)
    self.register_for_execution(agent)
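
A sketch of wiring a ReliableTool into an outer chat; assistant and executor are hypothetical ConversableAgent instances, and price_tool is the sketch instance from the introduction:

from autogen import ConversableAgent

assistant = ConversableAgent(name="assistant", llm_config=llm_config)
executor = ConversableAgent(name="executor", human_input_mode="NEVER")

price_tool.register_for_llm(assistant)       # assistant may propose the call
price_tool.register_for_execution(executor)  # executor runs it when proposed

# Or register both roles on a single agent:
price_tool.register_tool(assistant)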

run #

run(task, context_variables=None, validation_prompt_addition=None, messages=None, ground_truth=None)
Source code in autogen/tools/experimental/reliable/reliable.py
def run(
    self,
    task: str,
    context_variables: Optional[ContextVariables] = None,
    validation_prompt_addition: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    ground_truth: Optional[List[str]] = None,
) -> Any:
    if self._is_original_func_async:
        raise TypeError(f"Sync 'run()' called for async tool '{self.name}'. Use 'a_run()'.")
    return self._process_run(
        task=task,
        context_variables=context_variables,
        validation_prompt_addition=validation_prompt_addition,
        messages=messages,
        ground_truth=ground_truth,
    )

a_run async #

a_run(task, context_variables=None, validation_prompt_addition=None, messages=None, ground_truth=None)
Source code in autogen/tools/experimental/reliable/reliable.py
async def a_run(
    self,
    task: str,
    context_variables: Optional[ContextVariables] = None,
    validation_prompt_addition: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    ground_truth: Optional[List[str]] = None,
) -> Any:
    if not self._is_original_func_async:
        warnings.warn(
            f"Running sync function '{self._original_func_name}' wrapped by ReliableTool '{self.name}' "
            f"asynchronously using 'a_run()'. The underlying execution of _process_run will be synchronous "
            f"within an executor.",
            UserWarning,
        )

    loop = asyncio.get_running_loop()
    func_call = functools.partial(
        self._process_run,
        task=task,
        context_variables=context_variables,
        validation_prompt_addition=validation_prompt_addition,
        messages=messages,
        ground_truth=ground_truth,
    )
    return await loop.run_in_executor(None, func_call)
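
A sketch of awaiting the tool from async code; because the hypothetical get_stock_price from the introduction is synchronous, this path emits the UserWarning above and runs in a thread executor:

import asyncio

async def main() -> None:
    result = await price_tool.a_run(task="Fetch the latest price for MSFT.")
    print(result)

asyncio.run(main())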

run_and_get_details #

run_and_get_details(task, context_variables=None, validation_prompt_addition=None, messages=None, ground_truth=None)
Source code in autogen/tools/experimental/reliable/reliable.py
def run_and_get_details(
    self,
    task: str,
    context_variables: Optional[ContextVariables] = None,
    validation_prompt_addition: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    ground_truth: Optional[List[str]] = None,
) -> ToolExecutionDetails:
    if self._is_original_func_async:
        raise TypeError(
            f"Synchronous 'run_and_get_details()' called for an async tool '{self.name}'. "
            f"Use 'a_run_and_get_details()' instead."
        )
    return self._process_run_with_details(
        task=task,
        context_variables=context_variables,
        validation_prompt_addition=validation_prompt_addition,
        messages=messages,
        ground_truth=ground_truth,
    )
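
A sketch of inspecting the full execution record rather than only the final value, using the ToolExecutionDetails fields visible in the source of a_run_and_get_details below (task, is_overall_successful, failure_reason, final_tool_context):

details = price_tool.run_and_get_details(task="Fetch the latest price for AAPL.")
if details.is_overall_successful:
    print("validated:", details.task)
else:
    print("failed:", details.failure_reason)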

a_run_and_get_details async #

a_run_and_get_details(task, context_variables=None, validation_prompt_addition=None, messages=None, ground_truth=None)
Source code in autogen/tools/experimental/reliable/reliable.py
async def a_run_and_get_details(
    self,
    task: str,
    context_variables: Optional[ContextVariables] = None,
    validation_prompt_addition: Optional[str] = None,
    messages: Optional[list[dict[str, Any]]] = None,
    ground_truth: Optional[List[str]] = None,
) -> ToolExecutionDetails:
    if not self._is_original_func_async:
        warnings.warn(
            f"Running sync function '{self._original_func_name}' (wrapped by ReliableTool '{self.name}') "
            f"asynchronously using 'a_run_and_get_details()'. The underlying execution will be synchronous "
            f"within an executor.",
            UserWarning,
        )

    loop = asyncio.get_running_loop()
    try:
        func_call = functools.partial(
            self._process_run_with_details,
            task=task,
            context_variables=context_variables,
            validation_prompt_addition=validation_prompt_addition,
            messages=messages,
            ground_truth=ground_truth,
        )
        details: ToolExecutionDetails = await loop.run_in_executor(None, func_call)
        return details
    except Exception as e:
        logger.critical(
            "[%s] a_run_and_get_details encountered an unhandled exception from executor: %s",
            self.name,
            e,
            exc_info=True,
        )
        fallback_ctx = ReliableToolContext(task=task, reliable_tool_name=self.name)
        fallback_ctx.attempts.append(
            ExecutionAttempt(error=f"Unhandled executor/process error: {type(e).__name__}: {e}")
        )
        return ToolExecutionDetails(
            task=task,
            is_overall_successful=False,
            failure_reason=f"Critical unhandled exception during async execution: {type(e).__name__}: {e}",
            final_tool_context=fallback_ctx,
        )