Update README.md
README.md
CHANGED
@@ -2,11 +2,11 @@ This model is trained with Iterative DPO in OpenRLHF
 
 Datasets and Hyperparameters
 
-
-
-
-Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+- Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+- SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+- Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
 
+```
 Max Prompt Length: 2048
 Max Response Length: 2048
 best_of_n: 2 (2 samples for each prompt)