evaluate_traces
autogen.beta.eval.runtime.evaluate.evaluate_traces async
evaluate_traces(source, *, scorers, store_dir, suite=None, budgets=None, concurrency=4, run_id=None, label=None, stream=None)
Grade every trace from `source` and persist a `RunResult`.
| PARAMETER | DESCRIPTION |
|---|---|
| `source` | Where the traces come from (in-memory, disk, or cloud). |
| `scorers` | Scorer instances; each runs once per trace. |
| `store_dir` | Directory the run JSON is written to as |
| `suite` | Optional dataset to join traces to by |
| `budgets` | Optional observational thresholds; violations are recorded but never abort the run. |
| `concurrency` | Maximum number of traces graded in parallel. |
| `run_id` | Override for the auto-generated run id. |
| `label` | Optional user-defined identifier recorded on the run. It is meant to be shared across runs of the same eval so they can be grouped and trended. |
| `stream` | Optional :class: |
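
The behaviour described in the table (each scorer runs once per trace, at most `concurrency` traces are graded at a time, budget violations are recorded without stopping the run, and the result is written as JSON under `store_dir`) can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `grade_all`, the scorers-as-async-callables shape, the `metric -> maximum` budget format, and the `<run_id>.json` file name are all assumptions made for the example.

```python
import asyncio
import json
import tempfile
import uuid
from pathlib import Path


async def grade_all(traces, scorers, store_dir, budgets=None,
                    concurrency=4, run_id=None, label=None):
    """Hypothetical stand-in for the grading loop; not autogen's code."""
    run_id = run_id or uuid.uuid4().hex
    gate = asyncio.Semaphore(concurrency)  # caps traces graded at once

    async def grade_one(trace):
        async with gate:
            # Every scorer runs exactly once for this trace.
            return {name: await scorer(trace) for name, scorer in scorers.items()}

    # gather() returns results in the same order as the input traces.
    scores = await asyncio.gather(*(grade_one(t) for t in traces))

    # Budgets are observational: a violation is recorded, the run continues.
    violations = [
        {"trace": i, "metric": metric, "value": row[metric], "limit": limit}
        for i, row in enumerate(scores)
        for metric, limit in (budgets or {}).items()
        if metric in row and row[metric] > limit
    ]

    result = {"run_id": run_id, "label": label,
              "scores": scores, "violations": violations}
    out = Path(store_dir) / f"{run_id}.json"  # file name is an assumption
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(result, indent=2))
    return result


# Two toy scorers, modeled here as plain async callables (an assumption).
async def word_count(trace):
    return len(trace["output"].split())


async def latency_ms(trace):
    return trace["latency_ms"]


def demo():
    traces = [
        {"output": "hello world", "latency_ms": 120},
        {"output": "a b c", "latency_ms": 900},
    ]
    scorers = {"word_count": word_count, "latency_ms": latency_ms}
    with tempfile.TemporaryDirectory() as tmp:
        result = asyncio.run(
            grade_all(traces, scorers, tmp, budgets={"latency_ms": 500},
                      concurrency=2, run_id="demo-run", label="nightly")
        )
        saved = json.loads((Path(tmp) / "demo-run.json").read_text())
    return result, saved
```

Calling `demo()` grades both toy traces, records the second trace's latency as a budget violation, and still completes and persists the run. Reusing the same `label` across runs is what would let the saved files be grouped later.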