Create README.md
README.md
ADDED
@@ -0,0 +1,14 @@
+This model is trained with Iterative DPO in OpenRLHF
+
+Datasets and Hyperparameters
+
+```
+Reward Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+best_of_n: 2 (2 samples for each prompt)
+Learning Rate: 5e-7
+Beta: 0.1
+Scheduler: Cosine with Warmup and MinLR
+Rollout Batch Size: 20000
+```
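
The `best_of_n: 2` and `Beta: 0.1` settings can be illustrated with a small sketch. It assumes (the README does not say so explicitly) that the two samples per prompt are scored by the reward model and that the higher-scoring one becomes the "chosen" response and the lower-scoring one the "rejected" response. The names `build_preference_pair`, `dpo_loss`, and `reward_fn` are hypothetical and are not OpenRLHF APIs; the loss is the standard per-pair DPO objective written in plain Python.

```python
import math

BETA = 0.1  # "Beta" from the settings listed in the README


def build_preference_pair(prompt, responses, reward_fn):
    """Rank sampled responses by reward and return (chosen, rejected).

    reward_fn is a hypothetical stand-in for a reward model call.
    """
    if len(responses) < 2:
        raise ValueError("need at least two sampled responses per prompt")
    ranked = sorted(responses, key=lambda r: reward_fn(prompt, r))
    return ranked[-1], ranked[0]  # highest reward, lowest reward


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable form
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Toy reward for illustration only: longer answers score higher.
    toy_reward = lambda prompt, response: len(response)
    chosen, rejected = build_preference_pair(
        "q", ["short", "a longer answer"], toy_reward
    )
    print(chosen, "|", rejected)
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals `log(2)`; the loss falls as the policy favors the chosen response more strongly than the reference does.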
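
The "Cosine with Warmup and MinLR" scheduler can be sketched as a function of the training step. This is a generic formulation, not OpenRLHF's implementation: the peak of `5e-7` comes from the README, while `total_steps`, `warmup_steps`, and `min_lr` are hypothetical parameters whose actual values the README does not give.

```python
import math

PEAK_LR = 5e-7  # "Learning Rate" from the settings listed in the README


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    total_steps, warmup_steps and min_lr are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # clamp so the rate never drops below min_lr
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    # Hypothetical schedule: 100 steps, 10 warmup steps, floor at 10% of peak
    for step in (0, 9, 10, 55, 100):
        print(step, lr_at(step, total_steps=100, warmup_steps=10, min_lr=5e-8))
```

The minimum learning rate acts as a floor: past `total_steps` the function keeps returning `min_lr` rather than decaying to zero.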