LLM#

December 20, 2024
in LLM, GPT, research, tutorial
5 min read

ReasoningAgent Update - Beam Search, MCTS, and LATS for LLM Reasoning

Key Updates in this Release:

Configuration Changes
All reasoning parameters are now configured through a single reason_config dictionary
Breaking Change: Parameters like max_depth, beam_size, and answer_approach have moved from constructor arguments into reason_config
New Search Strategies
Added Monte Carlo Tree Search (MCTS) as an alternative to Beam Search
Introduced Language Agent Tree Search (LATS) - an enhancement to MCTS that incorporates reflection prior to the next round of simulation.
Enhanced Features
New forest_size parameter enables maintaining multiple independent reasoning trees
Support for ground truth answers in prompts to generate training data for LLM fine-tuning

Tree of Thoughts

Introduction

In our previous post, we introduced the ReasoningAgent, which utilized Beam Search for systematic reasoning. Today, we include MCTS (Monte Carlo Tree Search) and Language Agent Tree Search (LATS) as alternative search strategies, which present advantages in different scenarios.

Our previous ReasoningAgent draws inspiration from OpenAI's 2023 paper, Let's Verify Step by Step, as well as the 2024 O1 feature. The landscape of contemporary research is rich, with notable works such as DeepSeek-R1, Macro-O1, and OpenR.

December 20, 2024
in LLM, tools, langchain, crewai, pydanticai
12 min read

Cross-Framework LLM Tool Integration with AG2

TL;DR AG2 lets you bring in Tools from different frameworks like LangChain, CrewAI, and PydanticAI.

LangChain Tools: Useful for tasks like API querying and web scraping.
CrewAI Tools: Offers a variety of tools for web scraping, search, and more.
PydanticAI Tools: Adds context-driven tools and structured data processing.

December 2, 2024
in LLM, GPT, research
5 min read

ReasoningAgent - Tree of Thoughts with Beam Search in AG2

TL;DR: * We introduce ReasoningAgent, an AG2 agent that implements tree-of-thought reasoning with beam search to solve complex problems. * ReasoningAgent explores multiple reasoning paths in parallel and uses a grader agent to evaluate and select the most promising paths. * The exploration trajectory and thought tree can be saved locally for further analysis. These logs can even be saved as SFT dataset and preference dataset for DPO and PPO training.

Tree of Thoughts

Introduction

Large language models (LLMs) have shown impressive capabilities in various tasks, but they can still struggle with complex reasoning problems that require exploring multiple solution paths. To address this limitation, we introduce ReasoningAgent, an AG2 agent that implements tree-of-thought reasoning with beam search.

The key idea behind ReasoningAgent is to: 1. Generate multiple possible reasoning steps at each point 2. Evaluate these steps using a grader agent 3. Keep track of the most promising paths using beam search 4. Continue exploring those paths while pruning less promising ones

This approach allows the agent to systematically explore different reasoning strategies while managing computational resources efficiently.

November 27, 2024
in LLM, security
7 min read

Agentic testing for prompt leakage security

Prompt leakage social img

Introduction

As Large Language Models (LLMs) become increasingly integrated into production applications, ensuring their security has never been more crucial. One of the most pressing security concerns for these models is prompt injection, specifically prompt leakage.

LLMs often rely on system prompts (also known as system messages), which are internal instructions or guidelines that help shape their behavior and responses. These prompts can sometimes contain sensitive information, such as confidential details or internal logic, that should never be exposed to external users. However, with careful probing and targeted attacks, there is a risk that this sensitive information can be unintentionally revealed.

To address this issue, we have developed the Prompt Leakage Probing Framework, a tool designed to probe LLM agents for potential prompt leakage vulnerabilities. This framework serves as a proof of concept (PoC) for creating and testing various scenarios to evaluate how easily system prompts can be exposed. By automating the detection of such vulnerabilities, we aim to provide a powerful tool for testing the security of LLMs in real-world applications.

November 15, 2024
in LLM, GPT, AutoBuild
4 min read

Introducing CaptainAgent for Adaptive Team Building

TL;DR - We introduce CaptainAgent, an agent equipped with the capability to adaptively assemble a team of agents through retrieval-selection-generation process to handle complex tasks via the nested chat conversation pattern in AG2. - CaptainAgent supports all types of ConversableAgents implemented in AG2.

July 25, 2024
in LLM, Agent, Observability, AutoGen, AgentOps
5 min read

AgentOps, the Best Tool for AutoGen Agent Observability

AgentOps and AutoGen

TL;DR

AutoGen® offers detailed multi-agent observability with AgentOps.
AgentOps offers the best experience for developers building with AutoGen in just two lines of code.
Enterprises can now trust AutoGen in production with detailed monitoring and logging from AgentOps.

AutoGen is excited to announce an integration with AgentOps, the industry leader in agent observability and compliance. Back in February, Bloomberg declared 2024 the year of AI Agents. And it's true! We've seen AI transform from simplistic chatbots to autonomously making decisions and completing tasks on a user's behalf.

However, as with most new technologies, companies and engineering teams can be slow to develop processes and best practices. One part of the agent workflow we're betting on is the importance of observability. Letting your agents run wild might work for a hobby project, but if you're building enterprise-grade agents for production, it's crucial to understand where your agents are succeeding and failing. Observability isn't just an option; it's a requirement.

As agents evolve into even more powerful and complex tools, you should view them increasingly as tools designed to augment your team's capabilities. Agents will take on more prominent roles and responsibilities, take action, and provide immense value. However, this means you must monitor your agents the same way a good manager maintains visibility over their personnel. AgentOps offers developers observability for debugging and detecting failures. It provides the tools to monitor all the key metrics your agents use in one easy-to-read dashboard. Monitoring is more than just a “nice to have”; it's a critical component for any team looking to build and scale AI agents.

June 21, 2024
in LLM, GPT, evaluation, task utility
6 min read

AgentEval: A Developer Tool to Assess Utility of LLM-powered Applications

Fig.1: An AgentEval framework with verification step

Fig.1 illustrates the general flow of AgentEval with verification step

TL;DR: * As a developer, how can you assess the utility and effectiveness of an LLM-powered application in helping end users with their tasks? * To shed light on the question above, we previously introduced AgentEval — a framework to assess the multi-dimensional utility of any LLM-powered application crafted to assist users in specific tasks. We have now embedded it as part of the AutoGen library to ease developer adoption. * Here, we introduce an updated version of AgentEval that includes a verification process to estimate the robustness of the QuantifierAgent. More details can be found in this paper.

Introduction

Previously introduced AgentEval is a comprehensive framework designed to bridge the gap in assessing the utility of LLM-powered applications. It leverages recent advancements in LLMs to offer a scalable and cost-effective alternative to traditional human evaluations. The framework comprises three main agents: CriticAgent, QuantifierAgent, and VerifierAgent, each playing a crucial role in assessing the task utility of an application.

March 11, 2024
in LLM, GPT, research
7 min read

AutoDefense - Defend against jailbreak attacks with AutoGen

architecture

TL;DR

We propose AutoDefense, a multi-agent defense framework using AutoGen to protect LLMs from jailbreak attacks.
AutoDefense employs a response-filtering mechanism with specialized LLM agents collaborating to analyze potentially harmful responses.
Experiments show our three-agents (consisting of an intention analyzer, a prompt analyzer, and a judge) defense agency with LLaMA-2-13B effectively reduces jailbreak attack success rate while maintaining low false positives on normal user requests.

February 29, 2024
in LLM, research
6 min read

StateFlow - Build State-Driven Workflows with Customized Speaker Selection in GroupChat

TL;DR: Introduce Stateflow, a task-solving paradigm that conceptualizes complex task-solving processes backed by LLMs as state machines. Introduce how to use GroupChat to realize such an idea with a customized speaker selection function.

Introduction

It is a notable trend to use Large Language Models (LLMs) to tackle complex tasks, e.g., tasks that require a sequence of actions and dynamic interaction with tools and external environments. In this paper, we propose StateFlow, a novel LLM-based task-solving paradigm that conceptualizes complex task-solving processes as state machines. In StateFlow, we distinguish between "process grounding” (via state and state transitions) and "sub-task solving” (through actions within a state), enhancing control and interpretability of the task-solving procedure. A state represents the status of a running process. The transitions between states are controlled by heuristic rules or decisions made by the LLM, allowing for a dynamic and adaptive progression. Upon entering a state, a series of actions is executed, involving not only calling LLMs guided by different prompts, but also the utilization of external tools as needed.

December 23, 2023
in LLM, research
5 min read

AgentOptimizer - An Agentic Way to Train Your LLM Agent

Overall structure of AgentOptimizer

TL;DR: Introducing AgentOptimizer, a new class for training LLM agents in the era of LLMs as a service. AgentOptimizer is able to prompt LLMs to iteratively optimize function/skills of AutoGen agents according to the historical conversation and performance.

More information could be found in:

Paper: https://arxiv.org/abs/2402.11359.

Notebook: https://github.com/ag2ai/ag2/blob/main/notebook/agentchat_agentoptimizer.ipynb.

Introduction

In the traditional ML pipeline, we train a model by updating its weights according to the loss on the training set, while in the era of LLM agents, how should we train an agent? Here, we take an initial step towards the agent training. Inspired by the function calling capabilities provided by OpenAI, we draw an analogy between model weights and agent functions/skills, and update an agent’s functions/skills based on its historical performance on a training set. Specifically, we propose to use the function calling capabilities to formulate the actions that optimize the agents’ functions as a set of function calls, to support iteratively adding, revising, and removing existing functions. We also include two strategies, roll-back, and early-stop, to streamline the training process to overcome the performance-decreasing problem when training. As an agentic way of training an agent, our approach helps enhance the agents’ abilities without requiring access to the LLM's weights.