OpenRLHF/Llama-3-8b-iter-dpo-179k


chuyi777 committed on Jul 13

Commit 0121069 • 1 Parent(s): 022e0c6

Create README.md

Files changed (1)

README.md +14 -0


	@@ -0,0 +1,14 @@

+This model was trained with Iterative DPO in OpenRLHF.
+Datasets and Hyperparameters
+```
+Reward Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-rm-700k
+SFT Model: https://huggingface.co/OpenLLMAI/Llama-3-8b-sft-mixture
+Prompt Dataset: https://huggingface.co/datasets/OpenLLMAI/prompt-collection-v0.1
+best_of_n: 2 (2 samples for each prompt)
+Learning Rate: 5e-7
+Beta: 0.1
+Scheduler: Cosine with Warmup and MinLR
+Rollout Batch Size: 20000
+```
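
For readers unfamiliar with these settings, here is a minimal sketch of how they typically fit together. It is not taken from this commit or from the OpenRLHF code base. It assumes that with `best_of_n: 2` the reward model scores both samples and the higher-scored one becomes the chosen response, that the loss is the standard DPO objective with `Beta: 0.1`, and that the scheduler uses linear warmup followed by cosine decay to a floor. The function names, `warmup_steps`, `total_steps`, and `min_lr_ratio` are hypothetical; the README does not state those values.

```python
import math

BETA = 0.1      # "Beta" from the README
PEAK_LR = 5e-7  # "Learning Rate" from the README


def make_preference_pair(samples, rewards):
    """Return (chosen, rejected): the highest- and lowest-reward samples.

    Assumption: this is how the best_of_n samples become a DPO pair.
    """
    ranked = sorted(zip(rewards, samples), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine_lr(step, total_steps, warmup_steps,
              peak_lr=PEAK_LR, min_lr_ratio=0.1):
    """Linear warmup, then cosine decay from peak_lr down to a minimum LR.

    min_lr_ratio is a hypothetical value chosen for illustration.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    span = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / span, 1.0)
    min_lr = peak_lr * min_lr_ratio
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    chosen, rejected = make_preference_pair(["reply A", "reply B"], [0.2, 0.9])
    print("chosen:", chosen, "| rejected:", rejected)
    print("loss at zero margin:", dpo_loss(0.0, 0.0, 0.0, 0.0))
    print("lr at end of warmup:", cosine_lr(10, 100, 10))
```

When the policy and reference model assign identical log-probabilities, the margin is zero and the loss equals log 2. It falls as the policy favours the chosen response more strongly than the reference does.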