Standard-format-preference-dataset - a RLHFlow Collection

RLHFlow 's Collections

RLHFlow MATH Process Reward Model

Standard-format-preference-dataset

Mixture-of-preference-reward-modeling

RM-Bradley-Terry

PM-pair

RLHFLow Reward Models

Standard-format-preference-dataset

updated May 8, 2024

We collect the open-source datasets and process them into the standard format.

RLHFlow/UltraFeedback-preference-standard

Viewer • Updated Apr 27, 2024 • 340k • 82 • 9
RLHFlow/Helpsteer-preference-standard

Viewer • Updated Apr 27, 2024 • 37.1k • 54 • 4
RLHFlow/HH-RLHF-Helpful-standard

Viewer • Updated Apr 27, 2024 • 115k • 50 • 1
RLHFlow/Orca-distibalel-standard

Viewer • Updated Apr 28, 2024 • 6.93k • 41 • 1
RLHFlow/Capybara-distibalel-Filter-standard

Viewer • Updated Apr 28, 2024 • 14.8k • 41
RLHFlow/CodeUltraFeedback-standard

Viewer • Updated Apr 27, 2024 • 50.2k • 49 • 5
RLHFlow/UltraInteract-filtered-standard

Viewer • Updated Apr 28, 2024 • 162k • 45 • 2
RLHFlow/PKU-SafeRLHF-30K-standard

Viewer • Updated Apr 29, 2024 • 26.9k • 193 • 3
RLHFlow/Argilla-Math-DPO-standard

Viewer • Updated Apr 30, 2024 • 2.42k • 48 • 3
RLHFlow/Prometheus2-preference-standard

Viewer • Updated May 5, 2024 • 200k • 52 • 2
RLHFlow/SHP-standard

Viewer • Updated May 9, 2024 • 93.3k • 34
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Viewer • Updated May 8, 2024 • 42.3k • 41 • 2