OpenRLHF/Llama-3-8b-rm-mixture

chuyi777 committed on Jul 17, 2024

Commit caf78aa (verified) · 1 parent: 595c2d5

Update README.md

Files changed (1): README.md (+2 -0)

@@ -1,5 +1,7 @@
 The Llama3-8b-based Reward Model was trained using OpenRLHF and a combination of datasets available at https://huggingface.co/datasets/OpenLLMAI/preference_dataset_mixture2_and_safe_pku.
+Base model: https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture
 ```
 Cosine Scheduler
 Learning Rate: 9e-6
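The README hunk lists a cosine scheduler with a learning rate of 9e-6. The sketch below shows how a plain cosine decay from that peak value behaves over a run. It is not OpenRLHF's scheduler code. The warmup length, the final learning-rate floor, and the total step count are hypothetical parameters, because the README does not state them.

```python
import math

PEAK_LR = 9e-6  # learning rate listed in the README


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              warmup_steps: int = 0, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr to min_lr, with optional linear warmup.

    Assumption: warmup_steps and min_lr are hypothetical knobs; the README
    only says "Cosine Scheduler" and gives the peak learning rate.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp toward peak_lr during warmup.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, (step - warmup_steps) / decay_steps))
    # Half-cosine: 1 at progress 0, 0 at progress 1.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical run length
    for s in (0, 250, 500, 750, 1000):
        print(s, f"{cosine_lr(s, total):.3e}")
```

With no warmup, the rate starts at 9e-6, reaches half of that at the midpoint, and falls to the floor at the last step. Training frameworks often add warmup and a nonzero floor, so the actual curve for this model may differ.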