ReasoningAgent - Advanced LLM Reasoning with Multiple Search Strategies
Use ReasoningAgent for o1-style reasoning in agentic LLM workflows with AG2
Introduction
The ReasoningAgent is designed to enhance language models' reasoning capabilities through systematic exploration of thought processes. By implementing the Tree of Thoughts (ToT) framework, it enables LLMs like GPT-4 and Llama to break down complex problems into manageable steps and explore multiple solution paths simultaneously.
This notebook demonstrates the key features and capabilities of the ReasoningAgent, showing how it can reason effectively about problems even when using smaller models like gpt-4o-mini.
Search Strategies
The ReasoningAgent supports multiple search strategies for exploring the reasoning space:
1. Beam Search (Default)
- Maintains the top k most promising paths at each step
- Efficient for problems with clear evaluation criteria
- Configurable beam width to balance exploration vs computation
- Special case: DFS mode (beam size = 1) for linear reasoning similar to Chain-of-Thought
2. Monte Carlo Tree Search (MCTS)
- Balances exploration and exploitation using UCT formula
- Particularly effective for problems with delayed rewards
- Stochastic exploration helps avoid local optima
- Configurable number of simulations and exploration constant
3. Language Agent Tree Search (LATS)
- Provides immediate reflection feedback before the next simulation
- Helps identify poor reasoning paths early for future improvement
- Especially useful for complex multi-step reasoning
Core Components
- Thinker Agent: Generates potential next steps in the reasoning process
- Grader Agent: Evaluates the quality of each reasoning step
- Tree Structure: Organizes thoughts hierarchically for systematic exploration
- Visualization Tools: Built-in Graphviz support for analyzing reasoning paths
- Logging Features: Log and save thinking trajectories for fine-tuning the language model
Configuration Options
The agent is highly configurable through a single reason_config dictionary:
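As a minimal sketch of the setup (the exact constructor arguments and reason_config keys may vary slightly between AG2 versions, so treat the values below as illustrative):

```python
import os

from autogen import UserProxyAgent
from autogen.agentchat.contrib.reasoning_agent import ReasoningAgent

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

reason_agent = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    verbose=False,
    reason_config={
        "method": "beam_search",  # "beam_search", "dfs", "mcts", or "lats"
        "beam_size": 3,           # number of candidate paths kept at each step
        "max_depth": 3,           # maximum number of reasoning steps per path
    },
)

# A plain proxy agent to send questions and receive the final answer
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)
```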
Chain-of-Thought Reasoning with DFS
The simplest form of tree-based reasoning uses depth-first search (DFS) to explore a single path, similar to OpenAI's o1 feature. By setting method="dfs" in the reason_config, the agent will:
1. Generate one reasoning step at a time
2. Follow that single path until reaching a conclusion
3. Never explore alternative branches
Note: The effectiveness depends on the underlying model’s training. Models not specifically trained for step-by-step reasoning may show limited improvement with this approach.
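A sketch of a DFS-configured agent, reusing the config_list and user_proxy defined above (the dice question is just an illustrative prompt):

```python
# DFS: expand one reasoning step at a time and never branch
dfs_agent = ReasoningAgent(
    name="dfs_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "dfs", "max_depth": 3},
    # Equivalent alternative: beam search with a beam of 1
    # reason_config={"method": "beam_search", "beam_size": 1, "max_depth": 3},
)

question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
res = user_proxy.initiate_chat(dfs_agent, message=question, summary_method="last_msg")
```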
Beam Search in Tree of Thought
Beam Search is a powerful technique used in tree-based reasoning that allows the agent to explore multiple paths simultaneously. By setting beam_size greater than 1, the agent can maintain several candidate solutions at each step, evaluating them based on their potential to lead to the best final answer. This method is particularly effective when the solution space is large and complex, as it balances exploration and exploitation, ensuring that promising paths are prioritized while still considering alternative options.
In this approach, the agent generates multiple reasoning steps in parallel, allowing it to compare different trajectories and select the most promising ones for further exploration. This can lead to more robust and accurate conclusions, especially in scenarios where intermediate evaluations are critical to the final outcome.
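A sketch of the corresponding beam-search configuration, reusing the config_list, user_proxy, and question from earlier:

```python
# Beam search: keep the top-3 candidate trajectories at every step
beam_agent = ReasoningAgent(
    name="beam_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "beam_search", "beam_size": 3, "max_depth": 3},
)

res = user_proxy.initiate_chat(beam_agent, message=question, summary_method="last_msg")
```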
MCTS
This section demonstrates how to use Monte Carlo Tree Search (MCTS) with ReasoningAgent for complex reasoning tasks. MCTS provides several advantages over beam search when:
- Ground truth evaluation is available
- LLM-based evaluation is expensive
- You want to generate diverse, high-quality training data
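A sketch of an MCTS-configured agent; the nsim key (number of simulations) follows the AG2 notebook and may differ in your installed version:

```python
# MCTS: run several stochastic simulations and back-propagate rewards
mcts_agent = ReasoningAgent(
    name="mcts_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "mcts", "nsim": 5, "max_depth": 4},
)

res = user_proxy.initiate_chat(mcts_agent, message=question, summary_method="last_msg")
```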
LATS
It is important to note that our reasoning agent operates on the reasoning "process" and lacks direct access to the environment, whereas the LATS approach relies on feedback from the environment. To address this, we use our existing grader agent to generate pseudo-rewards and provide feedback. The major difference between our LATS implementation and our MCTS implementation is that LATS incorporates the reflection into the prompt context before the next round of simulation. You can define the agent using the LATS approach as follows.
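A sketch under the same assumptions as the MCTS example, switching the method to "lats":

```python
# LATS: like MCTS, but reflections on earlier rollouts are added to the
# prompt context before the next simulation starts
lats_agent = ReasoningAgent(
    name="lats_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "lats", "nsim": 5, "max_depth": 4},
)

res = user_proxy.initiate_chat(lats_agent, message=question, summary_method="last_msg")
```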
Visualizing the Reasoning Tree
Installation of Graphviz
To visualize the reasoning tree, you need to install Graphviz. Please note that pip install may not be sufficient on all operating systems; in some cases, you might need to manually download and install Graphviz.
pip install graphviz
To save the visualization as “tree_of_thoughts.png”, run the following command:
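A sketch assuming the visualize_tree helper exported by the reasoning agent module, applied to the agent's root thought node:

```python
from autogen.agentchat.contrib.reasoning_agent import visualize_tree

# Renders the reasoning tree rooted at the agent's root thought node and
# writes tree_of_thoughts.png (requires the Graphviz system binaries)
visualize_tree(mcts_agent._root)
```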
Utilizing ReasoningAgent for Nested Chat Interactions
In this example, we will explore how the ReasoningAgent can be employed to facilitate nested chat interactions, specifically for writing a blog post about NVIDIA. The agent will engage in a structured dialogue to enhance the quality of the content through iterative feedback and reasoning.
Task: Writing a Blog Post on NVIDIA
The goal is to generate a concise yet engaging blog post about NVIDIA. The process involves one turn of conversation (for simplicity) in which the agent reflects on the content, reasons about improvements, and incorporates user feedback. You can increase the max_turns parameter to run multiple rounds.
WARNING: It may take a long time to run this example (up to 10 minutes).
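A sketch of the nested-chat wiring; the writer system message and the reflection prompt below are illustrative, not the exact text from the notebook:

```python
from autogen import AssistantAgent

writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="You are a professional writer. Produce concise, engaging blog posts.",
)

# Build the critique request from the writer's latest draft
def reflection_message(recipient, messages, sender, config):
    draft = recipient.chat_messages_for_summary(sender)[-1]["content"]
    return f"Reflect, reason, and provide critique on the following writing:\n\n{draft}"

# Whenever the writer replies, route the draft through the ReasoningAgent
user_proxy.register_nested_chats(
    [
        {
            "recipient": reason_agent,
            "message": reflection_message,
            "summary_method": "last_msg",
            "max_turns": 1,
        }
    ],
    trigger=writer,
)

task = "Write a concise but engaging blog post about NVIDIA."
res = user_proxy.initiate_chat(recipient=writer, message=task, max_turns=1, summary_method="last_msg")
```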
Use a different Model for Grading
To use a different model for grading instead of gpt-4o, pass the grader_llm_config argument when initializing the ReasoningAgent. This ensures that the grading of trajectories is performed using the specified config_list, separate from the main llm_config.
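A sketch of passing a separate grading configuration (gpt-4o-mini is used here purely as an example):

```python
# Use a smaller/cheaper model for grading than for generating thoughts
grader_config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

reason_agent = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},                # thinker model
    grader_llm_config={"config_list": grader_config_list},  # grader model
    reason_config={"method": "beam_search", "beam_size": 3, "max_depth": 3},
)
```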
Save data for future training
In this section, we will focus on saving the reasoning agent’s decision-making data to help future training. By capturing the structure and content of the reasoning tree, we can create a valuable dataset that can be used to enhance the agent’s learning process. This data will allow us to analyze the agent’s reasoning patterns, improve its performance, and refine its ability to generate high-quality responses. The saved data can be utilized for various training methodologies, including supervised fine-tuning and reinforcement learning, ultimately contributing to the development of a more robust and effective reasoning agent.
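A sketch of serializing the tree and extracting training data, assuming the root node exposes a to_dict() serializer and that the module exports the extract_sft_dataset and extract_rlhf_preference_dataset helpers (as in the AG2 notebook; verify against your installed version):

```python
import json

from autogen.agentchat.contrib.reasoning_agent import (
    extract_rlhf_preference_dataset,
    extract_sft_dataset,
)

# Serialize the full reasoning tree for later analysis or training
with open("reasoning_tree.json", "w") as f:
    json.dump(mcts_agent._root.to_dict(), f)

# Best trajectories for supervised fine-tuning
sft_data = extract_sft_dataset(mcts_agent._root)
# Preference pairs (better vs. worse siblings) for RLHF-style training
rlhf_data = extract_rlhf_preference_dataset(mcts_agent._root)
```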
Utilizing Ground Truth to Enhance Training Data Generation
Access to ground truth answers allows us to improve the evaluation of reasoning paths. In this section, we will explore:
- The process of incorporating ground truth into prompts
- The methods by which the agent leverages ground truth for evaluation
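A sketch of a ground-truth-augmented prompt; the GROUND_TRUTH marker follows the convention used in the AG2 notebook, and the worked expectation is included only as an example:

```python
# Append the ground truth to the question so the grader can score
# trajectories against it; the final response is still generated normally
prompt = """What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
The expected maximum is (1*1 + 2*7 + 3*19 + 4*37 + 5*61 + 6*91) / 216 = 119/24, approximately 4.96.
"""

res = user_proxy.initiate_chat(mcts_agent, message=prompt, summary_method="last_msg")
```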
Forest of Thoughts
The concept of a “Forest of Thoughts” allows us to leverage bootstrapping techniques to execute the tree of thoughts multiple times, creating a diverse set of answers. After running these independent reasoning processes, we can aggregate them to form our final answer.
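A sketch assuming a forest_size key in reason_config (as in the AG2 notebook) to run several independent trees and aggregate their answers:

```python
# Forest of thoughts: run 3 independent reasoning trees and aggregate the results
forest_agent = ReasoningAgent(
    name="forest_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "dfs", "max_depth": 4, "forest_size": 3},
)

res = user_proxy.initiate_chat(forest_agent, message=question, summary_method="last_msg")
```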