# Test

The `ag2 test` command runs structured evaluations against your agents.

## ag2 test eval

Run YAML-defined test cases against an agent and check assertions on the output.

```shell
ag2 test eval <agent_file> --eval <eval_file> [--output <format>]
```

## Eval File Format

```yaml
# tests/eval.yaml
cases:
  - name: greeting
    input: "Say hello"
    assertions:
      - type: contains
        value: "hello"

  - name: math
    input: "What is 2 + 2?"
    assertions:
      - type: contains
        value: "4"
      - type: max_turns
        value: 3
```

## Assertion Types

| Type | Description |
|------|-------------|
| `contains` | Output contains the given string |
| `contains_all` | Output contains all of the given strings |
| `contains_any` | Output contains at least one of the given strings |
| `not_contains` | Output does not contain the given string |
| `regex` | Output matches a regex pattern |
| `min_length` / `max_length` | Output length is within the given bounds |
| `max_turns` | Conversation completed within N turns |
| `max_cost` | Total cost stayed under the given budget |
| `max_tokens` | Token usage stayed under the given limit |
| `max_time` | Execution completed within the time limit |
| `tool_called` | A specific tool was invoked |
| `tool_not_called` | A specific tool was not invoked |
| `no_error` | No errors occurred during execution |
| `llm_judge` | LLM-based evaluation with criteria and a score threshold |
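Several assertion types can be combined in a single case. The sketch below assumes that `regex`, `tool_called`, and `max_tokens` each take a `value` field like the assertions shown earlier; the `get_weather` tool name is hypothetical:

```yaml
cases:
  - name: weather_lookup
    input: "What is the weather in Paris?"
    assertions:
      - type: regex
        value: '\d+'          # expect a number somewhere in the reply
      - type: tool_called
        value: "get_weather"  # hypothetical tool name
      - type: max_tokens
        value: 2000
```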

## LLM Judge Example

```yaml
cases:
  - name: quality_check
    input: "Write a haiku about Python"
    assertions:
      - type: llm_judge
        criteria: "Is this a valid haiku with 5-7-5 syllable structure?"
        threshold: 0.8
```

## Options

| Flag | Description |
|------|-------------|
| `--eval` | Path to the YAML eval file |
| `--output` | Output format: `table` (default), `json`, or `junit` |
| `--dry-run` | Show test cases without running them |
| `--runs` | Run each case N times to test for determinism |
| `--baseline` | Compare results against a baseline file to catch regressions |

## Examples

```shell
# Run evals and show a results table
ag2 test eval my_agent.py --eval tests/eval.yaml

# Output results as JSON
ag2 test eval my_agent.py --eval tests/eval.yaml --output json

# Dry run to preview cases without executing them
ag2 test eval my_agent.py --eval tests/eval.yaml --dry-run
```
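The `--runs` and `--baseline` flags from the options table follow the same invocation pattern; the baseline file name below is illustrative:

```shell
# Run each case 5 times to check for non-deterministic results
ag2 test eval my_agent.py --eval tests/eval.yaml --runs 5

# Compare this run against a stored baseline to catch regressions
ag2 test eval my_agent.py --eval tests/eval.yaml --baseline baseline.json

# Emit JUnit XML for CI integration
ag2 test eval my_agent.py --eval tests/eval.yaml --output junit
```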