extract_rlhf_preference_dataset

extract_rlhf_preference_dataset(root: ThinkNode, contrastive_threshold: float = 0.2) -> list[dict[str, Any]]

Extract and generate preference pairs for RLHF training by comparing sibling nodes.

Parameters:
Name: root
Description: The root node of the tree.
Type: ThinkNode

Name: contrastive_threshold
Description: A value in the open interval (0, 1). The minimum distance between two sibling nodes at which one can confidently be labeled positive and the other negative.
Type: float
Default: 0.2

Returns:
Type: list[dict[str, typing.Any]]
Description: List of preference pairs, where each pair contains two responses and indicates which one is preferred.
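
Example:
The following is a minimal sketch of the sibling-comparison idea described above, not the library's implementation. It does not import ThinkNode; instead it uses a hypothetical stand-in class `Node` with an assumed `score` attribute in [0, 1], and the function name `sketch_preference_pairs` and the output keys `context`, `preferred`, and `dispreferred` are illustrative assumptions as well. The real function may score nodes differently and return different keys.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for ThinkNode: a piece of text plus a score in [0, 1]."""

    content: str
    score: float = 0.0
    children: list["Node"] = field(default_factory=list)


def sketch_preference_pairs(
    root: Node, contrastive_threshold: float = 0.2
) -> list[dict[str, Any]]:
    """Walk the tree and pair up siblings whose scores differ by more than the threshold."""
    pairs: list[dict[str, Any]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Compare every unordered pair of children that share this parent.
        for a, b in combinations(node.children, 2):
            if abs(a.score - b.score) > contrastive_threshold:
                better, worse = (a, b) if a.score > b.score else (b, a)
                pairs.append(
                    {
                        "context": node.content,  # assumed key name
                        "preferred": better.content,  # assumed key name
                        "dispreferred": worse.content,  # assumed key name
                    }
                )
        stack.extend(node.children)
    return pairs


root = Node(
    "Q: What is 2 + 2?",
    children=[
        Node("Answer: 4", score=0.9),
        Node("Answer: 5", score=0.1),
        Node("Answer: four", score=0.8),
    ],
)
pairs = sketch_preference_pairs(root, contrastive_threshold=0.2)
for pair in pairs:
    print(pair["preferred"], ">", pair["dispreferred"])
```

In this sketch the two high-scoring siblings ("Answer: 4" and "Answer: four") are too close to form a pair, which is the role the threshold plays: only clearly separated siblings yield a training example.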