AgentEval: A Developer Tool to Assess Utility of LLM-powered Applications
Deprecated
The agent_eval module is deprecated as of v0.12 and will be removed in v0.14. This blog post will also be removed in v0.14.

Fig. 1 illustrates the general flow of AgentEval with the verification step.
TL;DR:

* As a developer, how can you assess the utility and effectiveness of an LLM-powered application in helping end users with their tasks?
* To shed light on the question above, we previously introduced AgentEval, a framework to assess the multi-dimensional utility of any LLM-powered application crafted to assist users in specific tasks. We have now embedded it as part of the AutoGen library to ease developer adoption.
* Here, we introduce an updated version of AgentEval that includes a verification process to estimate the robustness of the QuantifierAgent. More details can be found in this paper.
Introduction
The previously introduced AgentEval is a comprehensive framework designed to bridge the gap in assessing the utility of LLM-powered applications. It leverages recent advances in LLMs to offer a scalable and cost-effective alternative to traditional human evaluation. The framework comprises three main agents: CriticAgent, QuantifierAgent, and VerifierAgent, each playing a crucial role in assessing the task utility of an application.
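For orientation, below is a minimal sketch of how the CriticAgent and QuantifierAgent stages can be driven through AutoGen's `agent_eval` module. The module paths, the `Task` model fields, and the `generate_criteria`/`quantify_criteria` signatures shown here reflect the contrib API at the time of writing; treat them as assumptions to verify against your installed AutoGen version, and note that the task texts and file names are placeholders.

```python
# Minimal sketch of invoking AgentEval programmatically (assumed API;
# verify module paths and signatures against your AutoGen version).
import autogen
from autogen.agentchat.contrib.agent_eval.agent_eval import (
    generate_criteria,
    quantify_criteria,
)
from autogen.agentchat.contrib.agent_eval.task import Task

# Standard AutoGen LLM configuration loaded from an OAI_CONFIG_LIST file.
llm_config = {"config_list": autogen.config_list_from_json("OAI_CONFIG_LIST")}

# Describe the task under evaluation; one successful and one failed
# example execution anchor the CriticAgent's criteria generation.
task = Task(
    **{
        "name": "Math problem solving",
        "description": "Given a math problem, produce a correct, well-explained solution.",
        "successful_response": "<a chat log where the application solved the problem>",
        "failed_response": "<a chat log where the application got it wrong>",
    }
)

# CriticAgent: propose a set of task-specific evaluation criteria.
criteria = generate_criteria(llm_config=llm_config, task=task)

# QuantifierAgent: score a new test case against those criteria.
quantifier_result = quantify_criteria(
    llm_config=llm_config,
    criteria=criteria,
    task=task,
    test_case="<the chat log you want to assess>",
    ground_truth="",
)
print(quantifier_result["estimated_performance"])
```

The verification step discussed in this post then probes the robustness of the QuantifierAgent's scores, for example by checking their stability across repeated runs on the same test case.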
