Update README.md
README.md CHANGED

```diff
@@ -22,8 +22,8 @@ This is an 8B reward model used for PPO training, trained on the UltraFeedback dataset.
 For more details, read the paper:
 [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
-
-
+Note this model is finetuned from Llama 3.1, released under the Meta Llama 3.1 community license, included here under `llama_3_license.txt`.
+
 
 ## Performance
 
```