ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning
Key Updates in this Release:
- Configuration Changes
  - All reasoning parameters are now configured through a single reason_config dictionary
  - Breaking Change: Parameters like max_depth, beam_size, and answer_approach have moved from constructor arguments into reason_config
- New Search Strategies
  - Added Monte Carlo Tree Search (MCTS) as an alternative to Beam Search
  - Introduced Language Agent Tree Search (LATS), an enhancement to MCTS that incorporates reflection prior to the next round of simulation
- Enhanced Features
  - New forest_size parameter enables maintaining multiple independent reasoning trees
  - Support for ground truth answers in prompts to generate training data for LLM fine-tuning
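To illustrate the breaking change, here is a minimal sketch of moving old-style constructor arguments into a reason_config dictionary. The helper `to_reason_config` is hypothetical and not part of the library, and the example values (including "pool") are assumptions chosen only for illustration.

```python
# Hypothetical migration helper; not part of the ReasoningAgent API.
MOVED_KEYS = ("max_depth", "beam_size", "answer_approach")


def to_reason_config(old_kwargs):
    """Split old-style constructor kwargs into (remaining_kwargs, reason_config)."""
    remaining = dict(old_kwargs)
    # Start from any reason_config the caller already provided.
    reason_config = dict(remaining.pop("reason_config", {}))
    for key in MOVED_KEYS:
        if key in remaining:
            reason_config[key] = remaining.pop(key)
    return remaining, reason_config


# Example values are illustrative assumptions.
old = {"name": "reason_agent", "max_depth": 3, "beam_size": 3, "answer_approach": "pool"}
kwargs, reason_config = to_reason_config(old)
print(kwargs)
print(reason_config)
```

The remaining keyword arguments would be passed to the constructor as before, with the returned dictionary supplied as reason_config.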
Introduction
In our previous post, we introduced the ReasoningAgent, which used Beam Search for systematic reasoning. Today, we add MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, each of which has advantages in different scenarios.
Our previous ReasoningAgent drew inspiration from OpenAI’s 2023 paper, Let’s Verify Step by Step, as well as the o1 model released in 2024. Research in this area is active, with notable works such as DeepSeek-R1, Marco-o1, and OpenR.
Quick Start Guide
Let’s start with a simple example using MCTS:
3. Configuring a Separate Grader Model
In addition to the main reasoning model, you can now specify a different model for the grader by using the grader_llm_config parameter. This allows for more flexibility in evaluating the reasoning paths generated by the agent. If this parameter is not provided, the grader will use the same model as the reasoning agent.
Here’s how you can set it up:
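A minimal sketch of the fallback rule described above, assuming configurations are plain dictionaries. The function `resolve_grader_config` and the model names are hypothetical placeholders, not part of the library.

```python
def resolve_grader_config(llm_config, grader_llm_config=None):
    """Return the config the grader would use: its own if given, else the reasoner's."""
    return grader_llm_config if grader_llm_config is not None else llm_config


# Placeholder model names, used only for illustration.
reasoner_cfg = {"model": "reasoner-model"}
grader_cfg = {"model": "grader-model"}

print(resolve_grader_config(reasoner_cfg, grader_cfg))  # separate grader model
print(resolve_grader_config(reasoner_cfg))              # falls back to the reasoner's model
```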
Key Features in the New Version
1. Multiple Search Methods
ReasoningAgent now supports three search strategies:
As in the previous post, the default method is Beam Search.
MCTS is also available as a widely used alternative.
It is important to note that our reasoning agent operates on the reasoning “process” and lacks direct access to the environment. In contrast, the LATS approach relies on feedback from the environment. To address this, we use our existing grader agent to generate pseudo-rewards and provide feedback. The major difference between our LATS implementation and our MCTS implementation is that LATS incorporates the reflection into the prompt context before the next round of simulation. You can define the agent using the LATS approach as follows.
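To make the difference from plain MCTS concrete, here is a toy sketch of the reflection step only. It is not the library's implementation: `build_prompt`, `run_rounds`, the 0.5 threshold, and the wording of the reflections are all assumptions, and the grader's pseudo-rewards are passed in as fixed numbers.

```python
def build_prompt(question, reflections):
    """Prepend reflections from earlier simulations to the next round's prompt."""
    if not reflections:
        return f"Question: {question}"
    notes = "\n".join(f"- {r}" for r in reflections)
    return f"Reflections from earlier attempts:\n{notes}\n\nQuestion: {question}"


def run_rounds(question, graded_attempts):
    """graded_attempts: list of (attempt_text, pseudo_reward) pairs from a grader stub."""
    reflections, prompts = [], []
    for attempt, reward in graded_attempts:
        # The prompt for this round sees every reflection gathered so far.
        prompts.append(build_prompt(question, reflections))
        if reward < 0.5:  # arbitrary illustrative threshold
            reflections.append(f"'{attempt}' scored {reward:.1f}; avoid this path.")
    return prompts


prompts = run_rounds("What is 2 + 2?", [("guess 5", 0.2), ("count up", 0.9)])
print(prompts[1])
```

The first round's prompt contains only the question, while the second also carries the reflection on the low-scoring first attempt.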
2. Incorporating Ground Truth for Enhanced Training Data Synthesis
You can now include ground truth in your prompts to achieve more precise evaluations (grading). This lets you use the reasoning agent to generate diverse thinking trajectories, which can then serve as training data for fine-tuning the base LLM.
3. Forest of Trees
Enable ensemble reasoning with multiple independent trees:
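As a rough sketch of the ensemble idea, the example below runs several independent "trees" and combines their answers by majority vote. Majority voting is an assumption made for illustration, since the text above does not say how the trees' answers are combined. The trees themselves are stub functions rather than real reasoning trees.

```python
from collections import Counter


def run_forest(question, trees):
    """Ask each independent tree for an answer and return the most common one."""
    answers = [tree(question) for tree in trees]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes, answers


# Three stub trees standing in for independent reasoning runs.
trees = [lambda q: "4", lambda q: "5", lambda q: "4"]
winner, votes, answers = run_forest("What is 2 + 2?", trees)
print(winner, votes, answers)
```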
When to Use Each Method
Use Beam Search when:
- You want a deterministic search process
- You can reliably evaluate intermediate steps
- You need fast, memory-efficient search
- The solution space is relatively small and structured
- Early decisions strongly influence final outcomes
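The criteria above can be seen in a generic beam search sketch. This is a textbook version on a toy problem, not the ReasoningAgent implementation. The `expand` and `score` functions stand in for the thinker and grader and are assumptions.

```python
import heapq


def beam_search(start, expand, score, beam_size, max_depth):
    """Keep the beam_size best partial paths at each depth; return the best final path."""
    beams = [start]
    for _ in range(max_depth):
        candidates = [child for path in beams for child in expand(path)]
        if not candidates:
            break
        # Intermediate steps are scored here, so a reliable step-level evaluator matters.
        beams = heapq.nlargest(beam_size, candidates, key=score)
    return max(beams, key=score)


# Toy problem: build a tuple of digits with the largest sum.
def expand(path):
    return [path + (digit,) for digit in (1, 2, 3)]


best = beam_search((), expand, score=sum, beam_size=2, max_depth=3)
print(best)
```

Because paths outside the beam are discarded at every level, a poor early choice cannot be revisited later, which is why early decisions weigh so heavily.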
Use MCTS when:
- You need stochastic exploration of solution paths
- Final outcome evaluation is more reliable than intermediate steps
- The solution space is large or complex
- You want to balance exploration vs exploitation
- You have computational budget for multiple simulations
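The exploration versus exploitation balance is commonly handled with the UCT (UCB1) rule, sketched below. Whether ReasoningAgent uses exactly this formula is an assumption. The constant 1.41 and the dictionary-based node layout are also illustrative choices.

```python
import math


def uct_score(total_value, visits, parent_visits, c=1.41):
    """Average value (exploitation) plus a bonus for rarely visited nodes (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=1.41):
    """Pick the child with the highest UCT score."""
    return max(children, key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits, c))


children = [
    {"name": "a", "value": 9.0, "visits": 10},  # well explored, high average
    {"name": "b", "value": 0.5, "visits": 1},   # barely explored, lower average
]
print(select_child(children, parent_visits=11, c=0.0)["name"])   # pure exploitation
print(select_child(children, parent_visits=11, c=1.41)["name"])  # exploration bonus applied
```

With c set to 0 the well-explored child wins on its average alone, while a positive c lets the rarely visited child be tried again.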
Use LATS when:
- You want immediate reflection feedback before the next simulation
- You want to identify poor reasoning paths early so later simulations can improve on them
- The task involves complex multi-step reasoning
Advanced Features
1. Visualization
Visualize the reasoning tree using graphviz:
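As a dependency-free sketch, the function below turns a nested tree into Graphviz DOT source text, which the graphviz tools can then render. The nested-dictionary layout with "content" and "children" keys is an assumption for illustration and is not the agent's actual node structure.

```python
def to_dot(tree):
    """Convert a nested dict tree into Graphviz DOT source text."""
    lines = ["digraph reasoning_tree {"]
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        node_id = counter
        counter += 1
        label = node["content"].replace('"', '\\"')  # escape quotes inside labels
        lines.append(f'  n{node_id} [label="{label}"];')
        if parent_id is not None:
            lines.append(f"  n{parent_id} -> n{node_id};")
        for child in node.get("children", []):
            visit(child, node_id)

    visit(tree, None)
    lines.append("}")
    return "\n".join(lines)


tree = {"content": "root", "children": [{"content": "step 1"}, {"content": "step 2"}]}
dot = to_dot(tree)
print(dot)
```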
2. Custom Evaluation
Modify the rating scale and evaluation criteria:
3. Save and Load Trees
Save reasoning trees for later analysis:
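A minimal sketch of a JSON round trip for a tree, assuming a hypothetical `Node` class with `content`, `value`, and `children` fields. The agent's real node class and its serialization methods may differ.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node used only for this illustration."""
    content: str
    value: float = 0.0
    children: list = field(default_factory=list)

    def to_dict(self):
        return {"content": self.content, "value": self.value,
                "children": [child.to_dict() for child in self.children]}

    @classmethod
    def from_dict(cls, data):
        return cls(data["content"], data["value"],
                   [cls.from_dict(child) for child in data["children"]])


root = Node("question", 0.0, [Node("step 1", 0.8), Node("step 2", 0.3)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reasoning_tree.json")
    with open(path, "w") as f:
        json.dump(root.to_dict(), f)      # save
    with open(path) as f:
        restored = Node.from_dict(json.load(f))  # load

print(restored == root)
```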
Performance Comparison
Variables
- d: Maximum depth of the reasoning tree
- b: Beam size (number of parallel paths maintained)
- w: Branching factor (number of child nodes per parent)
- n: Number of MCTS simulations
Time Complexity
Each algorithm has different computational costs:
- Beam Search: O(d × b × (w + 1))
  - At each of the d depth levels, evaluates w options for each of the b beams
  - Plus 1 for generating the options
- MCTS and LATS: O(n × d)
  - Each simulation traverses down to depth d
  - Performs n total simulations
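As a worked example of the formulas above, the sketch below plugs in small illustrative values. The function names and the chosen numbers are assumptions, and the results are step counts in the formulas' own units, not measured timings.

```python
def beam_search_cost(d, b, w):
    """d levels, b beams per level, w evaluations plus 1 generation per beam."""
    return d * b * (w + 1)


def mcts_cost(n, d):
    """n simulations, each descending at most d levels."""
    return n * d


print(beam_search_cost(d=3, b=3, w=3))  # 3 * 3 * 4 = 36
print(mcts_cost(n=10, d=3))             # 10 * 3 = 30
```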
Memory Usage
Storage requirements vary by approach:
- Beam Search: O(b × d)
  - Fixed memory proportional to beam size and depth
  - Only stores active beams
- MCTS and LATS: O(w^d)
  - Worst case stores the complete tree
  - In practice much smaller due to selective expansion
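To see how quickly the worst case grows, the sketch below compares the beam search bound with the node count of a complete tree with branching factor w and depth d. The function names and example values are illustrative assumptions.

```python
def beam_memory(b, d):
    """Nodes held by beam search: b active paths of up to d steps each."""
    return b * d


def full_tree_nodes(w, d):
    """Nodes in a complete tree: 1 + w + w^2 + ... + w^d, which is O(w^d)."""
    return sum(w ** level for level in range(d + 1))


print(beam_memory(b=3, d=3))      # 9
print(full_tree_nodes(w=3, d=3))  # 1 + 3 + 9 + 27 = 40
print(full_tree_nodes(w=3, d=6))  # grows to 1093 at depth 6
```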
Conclusion
The new ReasoningAgent offers a flexible toolkit for systematic reasoning with LLMs. Choose between Beam Search, MCTS, and LATS based on your specific needs regarding:
- Evaluation cost and availability
- Time and resource constraints
- Desired exploration vs exploitation balance
- Training data generation requirements
Next Steps
- Async Client Call: parallelize LLM calls to speed up the search
- Swarm Agent implementation
- Efficient Mode: merging thinker and grader
- Batch Norm: normalizing scores for MCTS
For Further Reading
- Original ReasoningAgent with Beam Search
- Documentation about ReasoningAgent
- MCTS in Wikipedia
- Example Notebook
Join our Discord server to discuss your experiences with these approaches and suggest improvements.