ryota39/mluke-large-lite-reward

Text Classification

Generated from Trainer

Inference Endpoints


ryota39 committed on Jul 5, 2024

Commit e7c4964 (1 parent: bd7427e)

Update README.md

Files changed (1)

README.md +2 -0


@@ -21,6 +21,8 @@ should probably proofread and complete it, then remove this comment. -->
 - the probability (logits after passing softmax function) in last layer of this model can be used to quantify the preference from user input
 - fine-tuned [studio-ousia/mluke-large-lite](https://huggingface.co/studio-ousia/mluke-large-lite) via full parameter tuning using [open-preference-v0.3](https://huggingface.co/datasets/ryota39/open_preference-v0.3)
 - trained on bf16 format
+- Label 0 stands for rejected sentence
+- Label 1 stands for chosen sentence
 ## Metric
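
The label convention added in this commit (0 = rejected, 1 = chosen) can be illustrated with a minimal sketch. It assumes the classification head emits exactly two logits in that order, which the diff implies but does not state outright. The helper names `softmax`, `chosen_probability`, and `score_text` are hypothetical, and how the prompt and response should be formatted into a single input string is not specified in the README, so `score_text` simply passes the text through unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def chosen_probability(logits):
    """Probability of label 1 ("chosen"), assuming logits are ordered
    [rejected, chosen] as described in the README diff."""
    return softmax(logits)[1]


def score_text(text, model_id="ryota39/mluke-large-lite-reward"):
    """Hypothetical helper: load the model and score one input string.

    Assumption: the checkpoint loads as a two-class sequence classifier.
    Imports are local so the pure-Python helpers above work without
    torch or transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return chosen_probability(logits)
```

A higher value from `chosen_probability` would then indicate a stronger preference for the input, so two candidate responses could be ranked by comparing their scores.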