RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification
Reward models trained with the RLHFlow codebase (https://github.com/RLHFlow/RLHF-Reward-Modeling/)
Note: Bradley-Terry reward model trained with the RLHFlow codebase
Note: Tech report covering the pairwise preference model
Note: Tech report for ArmoRM
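
For readers unfamiliar with the term, a Bradley-Terry reward model scores each response with a scalar reward and models the probability that one response is preferred over another as the sigmoid of the reward difference. The snippet below is a minimal sketch of that objective in plain Python. It is not taken from the RLHFlow codebase; the function names `bt_preference_probability` and `bt_loss` are hypothetical and chosen only for illustration.

```python
import math


def bt_preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    P(chosen > rejected) = sigmoid(reward_chosen - reward_rejected)
    """
    diff = reward_chosen - reward_rejected
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def bt_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed preference.

    Equals -log(sigmoid(diff)) = softplus(-diff), computed in a stable form.
    """
    x = -(reward_chosen - reward_rejected)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Illustrative rewards only; these are not outputs of any listed model.
    print(bt_preference_probability(2.0, 0.5))
    print(bt_loss(2.0, 0.5))
```

The loss shrinks as the chosen response's reward rises above the rejected one's, which is what pushes a reward model to rank preferred responses higher during training.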