# REBEL-Llama-3-Armo-iter_3

This model was trained with REBEL, starting from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), using [ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the reward model and the [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.
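
For convenience, a minimal generation sketch with 🤗 Transformers is below. The repo id `Cornell-AGI/REBEL-Llama-3-Armo-iter_3` is an assumption inferred from the dataset's organization; substitute the actual model id if it differs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; adjust if the model is hosted under a different name.
model_id = "Cornell-AGI/REBEL-Llama-3-Armo-iter_3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The model inherits the Llama-3-Instruct chat template.
messages = [{"role": "user", "content": "Explain RLHF in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```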
The training code is available at https://github.com/ZhaolinGao/REBEL. We collect offline generations over the entire dataset, taking the best of 5 sampled responses as the chosen response and the worst as the rejected response ([Ultrafeedback-Llama-3-Armo-iter_3](https://huggingface.co/datasets/Cornell-AGI/Ultrafeedback-Llama-3-Armo-iter_3)).
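
For illustration, here is a schematic of that selection step, not the repo's actual data-collection script: `generate` and `score` are hypothetical stand-ins for the policy's sampler and a reward-model call (e.g., ArmoRM's preference score).

```python
from typing import Callable, Dict

def build_preference_pair(
    prompt: str,
    generate: Callable[[str], str],      # samples one response from the policy
    score: Callable[[str, str], float],  # reward score for a (prompt, response) pair
    n: int = 5,
) -> Dict[str, str]:
    """Sample n responses; keep the highest-scored as chosen, lowest as rejected."""
    responses = [generate(prompt) for _ in range(n)]
    ranked = sorted(responses, key=lambda r: score(prompt, r))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
```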
### Links to Other Models