chuyi777 committed on
Commit
70715a4
1 Parent(s): 9bf875b

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -2,11 +2,11 @@ This model is trained with Iterative DPO in OpenRLHF
 
  Datasets and Hyperparameters
 
- ```
- Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
- SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
- Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+ - Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+ - SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+ - Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
 
+ ```
  Max Prompt Length: 2048
  Max Response Length: 2048
  best_of_n: 2 (2 samples for each prompt)
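
For readers unfamiliar with the `best_of_n: 2` setting, the following is a minimal sketch of how two sampled responses per prompt could be turned into a DPO preference pair by ranking them with a reward score. It is an illustration only, not the OpenRLHF implementation: `build_preference_pair` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for the reward model listed in the README.

```python
from typing import Callable, Dict, List

BEST_OF_N = 2  # from the README: 2 samples for each prompt


def build_preference_pair(
    prompt: str,
    responses: List[str],
    score: Callable[[str, str], float],
) -> Dict[str, str]:
    """Return the highest-scoring response as 'chosen' and the lowest as 'rejected'."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    # Sort ascending by reward, so the last item is the best and the first the worst.
    ranked = sorted(responses, key=lambda r: score(prompt, r))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a reward model: longer responses score higher.
    return float(len(response))


samples = ["Paris.", "The capital of France is Paris."]
assert len(samples) == BEST_OF_N
pair = build_preference_pair("What is the capital of France?", samples, toy_score)
print(pair["chosen"])
print(pair["rejected"])
```

With only two samples per prompt, ranking reduces to a single comparison: the higher-scored response becomes `chosen` and the other becomes `rejected`. In an actual run, `score` would call the reward model rather than measure length.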