Task

autogen.beta.eval.dataset.task.Task `dataclass`

Task(task_id, inputs, reference_outputs=None, tags=(), metadata=dict())

A single task in an evaluation suite.

Tasks are typically loaded from JSONL via `Suite.from_jsonl` or built inline via `Suite.from_list`. The runner passes `inputs["input"]` to `agent.ask(...)`; every other field is passed through to scorers unchanged.
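The data flow described above can be sketched in a few lines. This is a minimal illustration, not the library's runner: `EchoAgent`, `run_task`, `exact_match`, and the `"answer"` key are hypothetical names, the task is modelled as a plain dict, and the real `agent.ask(...)` may be asynchronous or return a richer object than a string.

```python
from typing import Any


class EchoAgent:
    """Hypothetical stand-in agent; only the ask(...) name comes from the docs."""

    def ask(self, prompt: str) -> str:
        return prompt.upper()


def exact_match(output: str, task: dict[str, Any]) -> bool:
    """Hypothetical reference-based scorer: it sees the whole task, not just the prompt."""
    reference = task.get("reference_outputs") or {}
    return output == reference.get("answer")


def run_task(agent: EchoAgent, task: dict[str, Any]) -> dict[str, Any]:
    # Only inputs["input"] is sent to the agent as the user prompt.
    output = agent.ask(task["inputs"]["input"])
    # Every other field travels with the task to the scorer unchanged.
    return {
        "task_id": task["task_id"],
        "output": output,
        "passed": exact_match(output, task),
    }


task = {
    "task_id": "task-0000",
    "inputs": {"input": "hello", "locale": "en"},
    "reference_outputs": {"answer": "HELLO"},
}
result = run_task(EchoAgent(), task)
```

Note that extra keys in `inputs` (here `"locale"`) never reach the agent in this sketch; they are only visible to whatever consumes the task afterwards.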

PARAMETER	DESCRIPTION
`task_id`	Stable identifier for this task. Auto-generated as `"task-{index:04d}"` by `Suite.from_*` when the source dict omits one. TYPE: `str`
`inputs`	The task's input payload. Must contain at least an `"input"` key, whose string value is the user prompt sent to the agent. TYPE: `dict[str, Any]`
`reference_outputs`	Expected outputs, consumed by reference-based scorers (e.g. `final_answer_matches`). Normally a dict; a Pydantic model or dataclass instance (e.g. a `response_schema` instance) is also accepted and coerced to a dict. `None` for tasks scored reference-free. TYPE: `dict[str, Any] | None` DEFAULT: `None`
`tags`	Free-form labels, useful for filtering or slicing (`"happy-path"`, `"adversarial"`). TYPE: `tuple[str, ...]` DEFAULT: `()`
`metadata`	Any other data the dataset carries. It surfaces in the run JSON so scorers and reports can consume it. TYPE: `dict[str, Any]` DEFAULT: `dict()`
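To make the parameter table concrete, the sketch below defines a stand-in dataclass with the documented fields and a hypothetical loader, `tasks_from_jsonl`, that applies the documented `"task-{index:04d}"` fallback. It is not the library's `Task` or `Suite.from_jsonl`: the JSONL row layout, the `"answer"` key, and the choice to count the index over non-empty lines are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Task:
    """Stand-in mirroring the documented signature; not the library's class."""

    task_id: str
    inputs: dict[str, Any]
    reference_outputs: Optional[dict[str, Any]] = None
    tags: tuple[str, ...] = ()
    metadata: dict[str, Any] = field(default_factory=dict)


def tasks_from_jsonl(text: str) -> list[Task]:
    """Hypothetical loader: one JSON object per non-empty line."""
    tasks = []
    lines = filter(None, map(str.strip, text.splitlines()))
    for index, line in enumerate(lines):
        row = json.loads(line)
        if "input" not in row["inputs"]:
            raise ValueError(f"row {index}: inputs must contain an 'input' key")
        tasks.append(
            Task(
                # Fall back to the documented id pattern when the row omits one.
                task_id=row.get("task_id", f"task-{index:04d}"),
                inputs=row["inputs"],
                reference_outputs=row.get("reference_outputs"),
                tags=tuple(row.get("tags", ())),
                metadata=row.get("metadata", {}),
            )
        )
    return tasks


sample = """
{"task_id": "greet", "inputs": {"input": "Say hello"}, "tags": ["happy-path"]}
{"inputs": {"input": "What is 2 + 2?"}, "reference_outputs": {"answer": "4"}}
"""
tasks = tasks_from_jsonl(sample)
```

In this sketch the first row keeps its explicit id and is scored reference-free (`reference_outputs` stays `None`), while the second row receives a generated id based on its position.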

task_id `instance-attribute`

task_id

inputs `instance-attribute`

inputs

reference_outputs `class-attribute` `instance-attribute`

reference_outputs = None

tags `class-attribute` `instance-attribute`

tags = ()

metadata `class-attribute` `instance-attribute`

metadata = field(default_factory=dict)
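
The `field(default_factory=dict)` default means each instance gets its own empty `metadata` dict instead of sharing one mutable object, which is standard `dataclasses` behaviour. A small sketch, using a hypothetical two-field stand-in class `MiniTask` rather than the real `Task`:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniTask:
    """Hypothetical stand-in with only the fields needed for this illustration."""

    task_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


a = MiniTask("task-0000")
b = MiniTask("task-0001")
a.metadata["source"] = "smoke-test"

# b is unaffected because default_factory built a separate dict per instance.
print(a.metadata, b.metadata)
```

Writing `metadata: dict = {}` directly would be rejected by `dataclasses` with a `ValueError`, which is why the factory form is used for mutable defaults.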