pairwise_judge
autogen.beta.eval.scorers.pairwise_judge.pairwise_judge
pairwise_judge(config, *, criterion, key, include_trace=False, include_reference=True, retries=1, swap=True, middleware=())
Build an LLM pairwise comparator for one criterion.
| PARAMETER | DESCRIPTION |
|---|---|
| config | Judge model config (pin temperature 0; use a different model family than the variants to avoid self-preference bias). |
| criterion | The single standard to compare on, in plain English. |
| key | Result column this comparator reports under. |
| include_trace | Render each response's tool-call trajectory into the prompt. DEFAULT: False |
| include_reference | When … DEFAULT: True |
| retries | DEFAULT: 1 |
| swap | Run the dual-order position-swap (default, recommended). When … DEFAULT: True |
| middleware | Middleware for the judge agent (e.g. …). DEFAULT: () |
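
The `swap` parameter refers to a dual-order position-swap. As a rough illustration of that general technique (not of this library's internals), the sketch below asks a judge twice, once per ordering, and keeps a verdict only when both orderings agree. Every name in it (`compare_with_swap`, `always_first`, `prefers_longer`) is hypothetical, the judges are toy stand-ins for an LLM call, and treating a disagreement as a tie is an assumption; `pairwise_judge` may resolve disagreements differently.

```python
from typing import Callable, Literal

Position = Literal["first", "second", "tie"]
Verdict = Literal["A", "B", "tie"]

# Hypothetical judge signature: given (first, second) responses, report which
# position is preferred. A real judge would call an LLM here.
Judge = Callable[[str, str], Position]


def compare_with_swap(judge: Judge, a: str, b: str) -> Verdict:
    """Ask the judge once per ordering and keep only consistent verdicts."""
    forward = judge(a, b)   # response A shown first
    backward = judge(b, a)  # response B shown first

    # Map positional answers back to the underlying responses.
    forward_pick: Verdict = {"first": "A", "second": "B", "tie": "tie"}[forward]
    backward_pick: Verdict = {"first": "B", "second": "A", "tie": "tie"}[backward]

    # Assumption: disagreement across orderings signals position bias,
    # so it is reported as a tie.
    return forward_pick if forward_pick == backward_pick else "tie"


def always_first(first: str, second: str) -> Position:
    """Toy judge with maximal position bias: always prefers the first slot."""
    return "first"


def prefers_longer(first: str, second: str) -> Position:
    """Toy content-based judge: prefers the longer response."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"
```

With these toy judges, the position-biased one collapses to a tie under the swap, while the content-based one gives the same winner regardless of which response is labelled A.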