We collect the open-source datasets and process them into the standard format.
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models
6
RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification
•
Updated
•
9.15k
•
76
RLHFlow/LLaMA3-iterative-DPO-final
Text Generation
•
Updated
•
2.51k
•
37
RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation
•
Updated
•
5.99k
•
26
RLHFlow/LLaMA3-SFT
Text Generation
•
Updated
•
4.71k
•
5
RLHFlow/DPA-v1-Mistral-7B
Text Generation
•
Updated
•
26
•
2
RLHFlow/RewardModel-Mistral-7B-for-DPA-v1
Text Classification
•
Updated
•
51
datasets
29
RLHFlow/iterative-prompt-v1-iter9-20K
Viewer
•
Updated
•
19.9k
•
3
RLHFlow/iterative-prompt-v1-iter8-20K
Viewer
•
Updated
•
20k
•
4
RLHFlow/iterative-prompt-v1-iter7-20K
Viewer
•
Updated
•
20k
•
3
RLHFlow/iterative-prompt-v1-iter6-20K
Viewer
•
Updated
•
20k
•
3
RLHFlow/iterative-prompt-v1-iter5-20K
Viewer
•
Updated
•
20k
•
13
RLHFlow/iterative-prompt-v1-iter4-20K
Viewer
•
Updated
•
20k
•
11
RLHFlow/pair-preference-dataset-700K
Viewer
•
Updated
•
699k
•
829
•
1
RLHFlow/test_generation_2k
Viewer
•
Updated
•
2k
•
242
RLHFlow/SHP-standard
Viewer
•
Updated
•
93.3k
•
232
RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard
Viewer
•
Updated
•
42.3k
•
230
•
2