hamishivi committed on
Commit
8aa683c
1 Parent(s): 436b728

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -22,8 +22,8 @@ This is a 8B reward model used for PPO training trained on the UltraFeedback dat
  For more details, read the paper:
  [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
- **Built with Meta Llama 3.1!**
- Note that Llama 3.1 is released under the Meta Llama 3 community license, included here under `llama_3_license.txt`.
+ Note this model is finetuned from Llama 3.1, released under the Meta Llama 3.1 community license, included here under `llama_3_license.txt`.
+
 
  ## Performance