Update README.md

README.md

	@@ -1 +1,9 @@
-The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_dataset_mixture2_and_safe_pku

+The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_dataset_mixture2_and_safe_pku.
+```
+Cosine Scheduler
+Learning Rate: 9e-6
+Warmup Ratio: 0.03
+Batch Size: 256
+Epoch: 1
+```
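
For readers unfamiliar with how these settings interact, here is a minimal sketch of a cosine learning-rate schedule with warmup, using the values listed in the README (peak learning rate 9e-6, warmup ratio 0.03). It is an illustration under stated assumptions, not the OpenRLHF implementation: the function name `cosine_lr`, the linear warmup shape, the decay floor `min_lr=0.0`, and the step count of 1000 are all hypothetical, since the README does not specify them.

```python
import math


def cosine_lr(step, total_steps, base_lr=9e-6, warmup_ratio=0.03, min_lr=0.0):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    Assumptions (not stated in the README): warmup is linear from 0,
    and the cosine phase decays to `min_lr` by the final step.
    """
    # Number of optimizer steps spent warming up (3% of training here).
    warmup_steps = round(total_steps * warmup_ratio)

    # Warmup phase: ramp linearly from 0 up to base_lr.
    if step < warmup_steps:
        return base_lr * step / warmup_steps

    # Cosine phase: progress runs from 0.0 (end of warmup) to 1.0 (last step).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps in the single epoch
    for s in (0, 15, 30, 515, 1000):
        print(f"step {s:4d}: lr = {cosine_lr(s, total):.3e}")
```

With a batch size of 256 and one epoch, the real number of optimizer steps would be roughly the dataset size divided by 256; substitute that value for `total` to see the actual curve.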