Skip to content

Task

autogen.beta.eval.dataset.task.Task dataclass #

Task(task_id, inputs, reference_outputs=None, tags=(), metadata=dict())

A single task in an evaluation suite.

Tasks are typically loaded from JSONL via :meth:Suite.from_jsonl or built inline via :meth:Suite.from_list. The runner passes inputs["input"] to agent.ask(...); every other field is plumbed through to scorers unchanged.

PARAMETER DESCRIPTION
task_id

Stable identifier for this task. Auto-generated as "task-{index:04d}" by Suite.from_* when the source dict omits one.

TYPE: str

inputs

The task's input payload. Must contain at least an "input" key — that string is the user prompt the agent is asked.

TYPE: dict[str, Any]

reference_outputs

Expected outputs, consumed by reference-based scorers (e.g. final_answer_matches). A dict; a Pydantic model or dataclass (e.g. a response_schema instance) is accepted and coerced to a dict. None for tasks scored reference-free.

TYPE: dict[str, Any] | None DEFAULT: None

tags

Free-form labels, useful for filtering or slicing ("happy-path", "adversarial").

TYPE: tuple[str, ...] DEFAULT: ()

metadata

Anything else the dataset carries — surfaces in the run JSON so scorers and reports can consume it.

TYPE: dict[str, Any] DEFAULT: dict()

task_id instance-attribute #

task_id

inputs instance-attribute #

inputs

reference_outputs class-attribute instance-attribute #

reference_outputs = None

tags class-attribute instance-attribute #

tags = ()

metadata class-attribute instance-attribute #

metadata = field(default_factory=dict)