reasoning_agent
autogen.agentchat.contrib.reasoning_agent.extract_rlhf_preference_dataset
extract_rlhf_preference_dataset
Extract and generate preference pairs for RLHF training by comparing sibling nodes.
Parameters:
Name | Description |
---|---|
root | The root node of the tree. Type: ThinkNode |
contrastive_threshold | A value in (0, 1): the minimum score gap between two sibling nodes at which one can confidently be labeled positive and the other negative. Type: float Default: 0.2 |
Returns:
Type | Description |
---|---|
list[dict[str, typing.Any]] | A list of preference pairs, where each pair contains two responses and indicates which one is preferred. |
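
To make the sibling comparison concrete, here is a minimal, self-contained sketch of the idea. It does not use the library's `ThinkNode` or reproduce its implementation: the `Node` class, its `score` field, the `sibling_preference_pairs` helper, and the output keys `prompt`, `preferred`, and `rejected` are all hypothetical names chosen for illustration. It also assumes that a pair is emitted when the score gap is at least `contrastive_threshold`; the real function may define the gap and the boundary differently.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; not the library's ThinkNode."""
    content: str
    score: float  # assumed to be a normalized value in [0, 1]
    children: list["Node"] = field(default_factory=list)


def sibling_preference_pairs(
    root: Node, contrastive_threshold: float = 0.2
) -> list[dict[str, Any]]:
    """Collect (preferred, rejected) pairs from siblings whose scores differ enough."""
    pairs: list[dict[str, Any]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = node.children
        # Compare every unordered pair of siblings under this parent.
        for i, a in enumerate(kids):
            for b in kids[i + 1:]:
                if abs(a.score - b.score) >= contrastive_threshold:
                    preferred, rejected = (a, b) if a.score > b.score else (b, a)
                    pairs.append(
                        {
                            "prompt": node.content,  # shared parent context
                            "preferred": preferred.content,
                            "rejected": rejected.content,
                        }
                    )
        stack.extend(kids)
    return pairs


# Small example tree: three candidate next steps under one parent.
root = Node(
    "question",
    0.0,
    [Node("A", 0.9), Node("B", 0.5), Node("C", 0.8)],
)
pairs = sibling_preference_pairs(root, contrastive_threshold=0.2)
```

In this example, A and C are too close in score to form a pair, so only the A/B and C/B comparisons produce preference pairs. Raising the threshold yields fewer but more clearly separated pairs.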