OpenRLHF/Llama-3-8b-iter-dpo-179k

chuyi777 committed on Jul 14

Commit 70715a4 • 1 Parent(s): 9bf875b

Update README.md

Files changed (1)

README.md +4 -4


@@ -2,11 +2,11 @@ This model is trained with Iterative DPO in OpenRLHF
 Datasets and Hyperparameters
-```
-Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
-SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
-Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+- Reward Model:https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+- SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+- Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+```
 Max Prompt Length: 2048
 Max Response Length: 2048
 best_of_n: 2 (2 samples for each prompt)
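
For readers unfamiliar with the `best_of_n` setting, the following is a minimal sketch of how a preference pair is commonly built from `n` sampled responses in iterative DPO: score each sample with a reward model, then take the highest-scoring one as "chosen" and the lowest-scoring one as "rejected". This is an illustration under stated assumptions, not OpenRLHF's actual code. The names `build_preference_pair` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for the real reward model.

```python
from typing import Callable, Optional, Sequence, Tuple


def build_preference_pair(
    prompt: str,
    responses: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) for one prompt, or None if there is no usable pair.

    Hypothetical helper: `score_fn(prompt, response)` stands in for a reward model.
    """
    if len(responses) < 2:
        raise ValueError("need at least 2 sampled responses (best_of_n >= 2)")

    scores = [score_fn(prompt, r) for r in responses]
    best = max(range(len(responses)), key=scores.__getitem__)
    worst = min(range(len(responses)), key=scores.__getitem__)

    # Assumption: when all samples score the same there is no preference
    # signal, so the prompt is skipped rather than paired with itself.
    if scores[best] == scores[worst]:
        return None
    return responses[best], responses[worst]


def toy_score(prompt: str, response: str) -> float:
    """Toy stand-in for a reward model: longer responses score higher."""
    return float(len(response))


# With best_of_n = 2, each prompt yields at most one (chosen, rejected) pair.
pair = build_preference_pair("What is DPO?", ["short", "a longer answer"], toy_score)
```

With `best_of_n: 2`, the two samples for a prompt form the pair directly; larger values of `n` would widen the reward gap between the chosen and rejected responses at the cost of more generation.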