weqweasdas
/

hh_rlhf_rm_open_llama_3b

Text Classification

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

weqweasdas commited on Feb 25, 2024

Commit

06bd94a

·

verified ·

1 Parent(s): e3c1d3f

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -8,7 +8,7 @@
 <!-- Provide a quick summary of what the model is/does. -->
-Thanks for your intersts in this reward model! We recommed you to use [weqweasdas/RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B) instead.
 In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).

 <!-- Provide a quick summary of what the model is/does. -->
+**Thanks for your intersts in this reward model! We recommed you to use [weqweasdas/RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B) instead.**
 In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).