This model was trained with iterative DPO using OpenRLHF.
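
For readers unfamiliar with the objective, here is a minimal sketch of the standard DPO loss for a single preference pair. It is written in plain Python for illustration and is not taken from OpenRLHF; the function name `dpo_loss` and its argument names are hypothetical, and the inputs are assumed to be summed token log-probabilities of each response under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are assumed to be summed log-probabilities of a full response.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is `log(2)`; it falls as the policy shifts probability toward the chosen response relative to the rejected one.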
Datasets and Hyperparameters
```
Reward Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
best_of_n: 2 (2 samples for each prompt)
Learning Rate: 5e-7
Beta: 0.1
Scheduler: Cosine with Warmup and MinLR
Rollout Batch Size: 20000
Training Batch Size: 256
Number of Iterations: 9
```
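
As a rough illustration of how these settings fit together, the sketch below builds a preference pair from the two sampled responses and computes a cosine learning-rate schedule with warmup and a minimum learning rate. It is an assumption-laden sketch, not OpenRLHF's implementation: `build_pair` and `cosine_lr` are hypothetical helpers, and the warmup length and minimum learning rate are placeholders because the card does not state them. The steps-per-iteration figure additionally assumes one pair per prompt and a single pass over each rollout batch.

```python
import math

PEAK_LR = 5e-7            # "Learning Rate" from the card
ROLLOUT_BATCH_SIZE = 20000
TRAIN_BATCH_SIZE = 256

# Assumption: one preference pair per prompt, one pass per iteration.
STEPS_PER_ITERATION = ROLLOUT_BATCH_SIZE // TRAIN_BATCH_SIZE

def build_pair(prompt, responses, rewards):
    """Use the highest-reward response as chosen and the lowest as rejected."""
    ranked = sorted(zip(rewards, responses), key=lambda item: item[0])
    return {"prompt": prompt, "chosen": ranked[-1][1], "rejected": ranked[0][1]}

def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=0, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    warmup_steps and min_lr are placeholders; the card gives no values.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# With best_of_n = 2 the pair is simply the better and worse of two samples.
pair = build_pair("example prompt", ["response A", "response B"], [0.2, 0.9])
```

With `best_of_n: 2`, each prompt contributes exactly one (chosen, rejected) pair, so under the stated assumptions a rollout batch of 20000 prompts gives `20000 // 256` full optimizer steps per iteration.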