ryota39/luke-japanese-base-lite-reward

Under construction

Fine-tuning

This model was trained to classify whether an input text is a "chosen" sentence or a "rejected" sentence.
The class probabilities (the final-layer logits after softmax) can be used as a preference score for user input.
Fine-tuned from studio-ousia/mluke-large-lite with full-parameter tuning on open-preference-v0.3.
Trained in bf16 precision.
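
The softmax step described above can be sketched in a few lines. This is a minimal illustration, not the author's inference code: it takes the two final-layer logits as plain floats and returns the probability of the "chosen" class. The function name `preference_score` and the assumption that index 1 is the "chosen" label are hypothetical; check the model's own label mapping before relying on the index.

```python
import math


def preference_score(logits, chosen_index=1):
    """Return the softmax probability of the 'chosen' class.

    `logits` is a sequence of raw final-layer scores, one per class.
    `chosen_index` is an ASSUMPTION (hypothetical label order); verify it
    against the label mapping shipped with the model.
    """
    # Subtract the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[chosen_index] / sum(exps)


# Equal logits mean no preference either way.
neutral = preference_score([0.0, 0.0])
# A much larger logit on the assumed 'chosen' index gives a score near 1.
strong = preference_score([-2.0, 3.0])
```

A score near 1 would then indicate text that resembles the "chosen" sentences, and a score near 0 text that resembles the "rejected" ones, under the label-order assumption above.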

train loss	eval loss	accuracy	recall	precision	f1-score
0.1427	0.2009	0.9282	0.9383	0.9198	0.9290

accuracy	recall	precision	f1-score
0.9310	0.9199	0.9408	0.9302
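
As a quick consistency check, the reported f1-scores can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The sketch below uses only the numbers from the two tables above; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# First table: precision 0.9198, recall 0.9383
f1_first = round(f1_from(0.9198, 0.9383), 4)
# Second table: precision 0.9408, recall 0.9199
f1_second = round(f1_from(0.9408, 0.9199), 4)
```

Both values agree with the f1-score column to four decimal places (0.9290 and 0.9302).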