chuyi777 committed on
Commit
0121069
1 Parent(s): 022e0c6

Create README.md

  1. README.md +14 -0
README.md ADDED
@@ -0,0 +1,14 @@
+ This model is trained with Iterative DPO in OpenRLHF
+
+ Datasets and Hyperparameters
+
+ ```
+ Reward Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+ SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+ Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+ best_of_n: 2 (2 samples for each prompt)
+ Learning Rate: 5e-7
+ Beta: 0.1
+ Scheduler: Cosine with Warmup and MinLR
+ Rollout Batch Size: 20000
+ ```
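
For readers unfamiliar with the settings listed in the README, the following is a minimal sketch of how they typically fit together in one Iterative DPO round. It is not OpenRLHF's code. The function names, `warmup_steps`, `total_steps`, and `min_lr_ratio` are hypothetical and do not come from the README; only the learning rate (5e-7), beta (0.1), and the two samples per prompt do.

```python
import math


def cosine_lr_with_warmup(step, total_steps, warmup_steps,
                          peak_lr=5e-7, min_lr_ratio=0.1):
    """Linear warmup to peak_lr, then cosine decay down to a floor.

    min_lr_ratio is an assumed value; the README only says "MinLR".
    """
    min_lr = peak_lr * min_lr_ratio
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


def make_preference_pair(prompt, responses, rewards):
    """Turn reward-model scores for sampled responses into a DPO pair.

    With two samples per prompt, the higher-scored response becomes
    "chosen" and the lower-scored one "rejected". Returns None on a tie.
    """
    if len(responses) != len(rewards) or len(responses) < 2:
        raise ValueError("need at least two responses, one reward each")
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    if rewards[best] == rewards[worst]:
        return None  # no preference signal for this prompt
    return {"prompt": prompt,
            "chosen": responses[best],
            "rejected": responses[worst]}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    pair = make_preference_pair("Hi?", ["Hello!", "Go away."], [0.9, 0.2])
    print(pair)
    print(cosine_lr_with_warmup(step=0, total_steps=100, warmup_steps=10))
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

In an iterative setup, these steps would repeat: sample responses from the current policy for a batch of prompts, score them with the reward model, build pairs, and run DPO updates before sampling again. How the rollout batch of 20000 prompts maps onto those rounds is not stated in the README.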