Update README.md
README.md
@@ -1 +1,9 @@
-The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_dataset_mixture2_and_safe_pku
+The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_dataset_mixture2_and_safe_pku.
+
+```
+Cosine Scheduler
+Learning Rate: 9e-6
+Warmup Ratio: 0.03
+Batch Size: 256
+Epoch: 1
+```
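The hyperparameters added in this change name a cosine scheduler with a peak learning rate of 9e-6 and a warmup ratio of 0.03. As a rough illustration of what that combination usually means, the sketch below computes the learning rate at a given optimizer step. It is not taken from OpenRLHF: the function name `lr_at_step`, the linear warmup shape, the decay to zero, and the example `total_steps` value are all assumptions made for illustration, and the actual training run may have used a different floor or rounding rule.

```python
import math

PEAK_LR = 9e-6        # "Learning Rate" from the README
WARMUP_RATIO = 0.03   # "Warmup Ratio" from the README


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay.

    Assumes the schedule decays all the way to zero; the real run may
    have used a non-zero minimum learning rate.
    """
    # Number of warmup steps (rounded; at least 1 to avoid division by zero).
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from peak_lr down to 0.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the README does not state the step count
    for step in (0, 15, 30, 515, 1000):
        print(step, lr_at_step(step, total_steps))
```

With a batch size of 256 and a single epoch, the total number of optimizer steps would be roughly the dataset size divided by 256, so the warmup phase covers only about the first 3% of those steps.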