extract_rlhf_preference_dataset

extract_rlhf_preference_dataset(root: autogen.agentchat.contrib.reasoning_agent.ThinkNode, contrastive_threshold: float = 0.2) -> list[dict]

Extract and generate preference pairs for RLHF training by comparing sibling nodes.

Parameters:
root: The root node of the tree.
    Type: autogen.agentchat.contrib.reasoning_agent.ThinkNode

contrastive_threshold: A value in (0, 1). The minimum distance between two sibling nodes at which one can confidently be labeled positive (preferred) and the other negative.
    Type: float
    Default: 0.2

Returns:
list[dict]: List of preference pairs, where each pair contains two responses and indicates which one is preferred.
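
To illustrate how a threshold-based sibling comparison might work, here is a minimal, self-contained sketch. It does not use the library's implementation: `SketchNode`, `sketch_preference_pairs`, the `value` score, and the dictionary keys are all hypothetical names chosen for this example, and the real `ThinkNode` may score and traverse nodes differently.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for ThinkNode: text, a score in [0, 1], and children."""

    content: str
    value: float = 0.0
    children: list["SketchNode"] = field(default_factory=list)


def sketch_preference_pairs(
    root: SketchNode, contrastive_threshold: float = 0.2
) -> list[dict]:
    """Pair up siblings whose scores differ by more than the threshold."""
    pairs: list[dict] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Compare every ordered pair of siblings under this parent.
        for a in node.children:
            for b in node.children:
                if a is not b and a.value - b.value > contrastive_threshold:
                    pairs.append(
                        {
                            # Key names are assumptions made for this sketch.
                            "instruction": node.content,
                            "preferred_response": a.content,
                            "dispreferred_response": b.content,
                        }
                    )
        stack.extend(node.children)
    return pairs


root = SketchNode(
    "Solve 2x + 3 = 7",
    children=[
        SketchNode("Subtract 3 from both sides", value=0.9),
        SketchNode("Multiply both sides by 7", value=0.3),
        SketchNode("Divide both sides by 2 first", value=0.8),
    ],
)
pairs = sketch_preference_pairs(root, contrastive_threshold=0.2)
for pair in pairs:
    print(pair["preferred_response"], ">", pair["dispreferred_response"])
```

In this sketch the two high-scoring siblings (0.9 and 0.8) do not form a pair with each other because their gap is below the threshold. Raising `contrastive_threshold` yields fewer but more clearly separated pairs.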