rasyosef/phi-1_5-dpo

Generated from Trainer

rasyosef committed on Jul 25

Commit 2e6ea9b (1 parent: 020f385)

Update README.md

Files changed (1)

README.md +7 -2

@@ -1,5 +1,5 @@
 ---
-base_model: rasyosef/phi-1_5-sft-openhermes-v2
 library_name: peft
 license: mit
 tags:
@@ -9,6 +9,11 @@ tags:
 model-index:
 - name: phi-1_5-openhermesv2-dpo-combinedv3
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -16,7 +21,7 @@ should probably proofread and complete it, then remove this comment. -->
 # phi-1_5-openhermesv2-dpo-combinedv3
-This model is a fine-tuned version of [rasyosef/phi-1_5-sft-openhermes-v2](https://huggingface.co/rasyosef/phi-1_5-sft-openhermes-v2) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.5013
 - Rewards/chosen: -1.0250

 ---
+base_model: rasyosef/phi-1_5-sft
 library_name: peft
 license: mit
 tags:
 model-index:
 - name: phi-1_5-openhermesv2-dpo-combinedv3
   results: []
+datasets:
+- HuggingFaceH4/ultrafeedback_binarized
+- argilla/distilabel-intel-orca-dpo-pairs
+- jondurbin/py-dpo-v0.1
+- argilla/distilabel-math-preference-dpo
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # phi-1_5-openhermesv2-dpo-combinedv3
+This model is a fine-tuned version of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.5013
 - Rewards/chosen: -1.0250
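The metadata change above (new `base_model`, added `datasets` list) can be checked with a small script. The following is a minimal sketch, not the Hub's own parser: `read_front_matter` is a hypothetical helper that handles only flat `key: value` pairs and simple `- item` lists, and the sample string contains only the keys that are fully visible in this diff (the `tags` and `model-index` entries are left out because the diff elides or nests them).

```python
# Sample front matter: only the keys fully visible in the new side of the diff.
NEW_FRONT_MATTER = """\
---
base_model: rasyosef/phi-1_5-sft
library_name: peft
license: mit
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
---
"""


def read_front_matter(text):
    """Hypothetical helper: read flat keys and simple lists between '---' lines.

    This is an illustration only; it is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter of the front matter
        if stripped.startswith("- "):
            # A list item belongs to the most recent key that had no inline value.
            if current_key is not None and isinstance(meta[current_key], list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # A key with no inline value starts an (initially empty) list.
            meta[current_key] = value.strip() or []
    return meta


meta = read_front_matter(NEW_FRONT_MATTER)
print(meta["base_model"])
print(meta["datasets"])
```

For real model cards, a full YAML parser is the safer choice, since front matter can contain nested structures such as `model-index`.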