---
base_model: rasyosef/phi-1_5-sft
library_name: peft
license: mit
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: phi-1_5-dpo
  results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
---

# phi-1_5-dpo

This model is a fine-tuned version of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft), trained with Direct Preference Optimization (DPO) on the preference datasets listed in the metadata above.

It achieves the following results on the evaluation set:
- Loss: 0.5013
- Rewards/chosen: -1.0250
- Rewards/rejected: -2.3893
- Rewards/accuracies: 0.7283
- Rewards/margins: 1.3643
- Logps/rejected: -162.0916
- Logps/chosen: -128.1033
- Logits/rejected: 5.3082
- Logits/chosen: 5.1890

## Model description

phi-1_5-dpo is a PEFT adapter on top of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft), aligned with DPO using the `trl` library.

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on four binarized preference datasets: [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs), [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1), and [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6899        | 0.1241 | 138  | 0.6769          | -0.0153        | -0.0504          | 0.625              | 0.0351          | -138.7025      | -118.0066    | 4.5710          | 4.4532        |
| 0.6309        | 0.2482 | 276  | 0.6035          | -0.2012        | -0.5586          | 0.7120             | 0.3575          | -143.7850      | -119.8655    | 4.5167          | 4.3940        |
| 0.5756        | 0.3723 | 414  | 0.5669          | -0.3693        | -0.9842          | 0.7174             | 0.6149          | -148.0405      | -121.5467    | 4.6242          | 4.5060        |
| 0.5715        | 0.4964 | 552  | 0.5446          | -0.4109        | -1.1855          | 0.7283             | 0.7745          | -150.0534      | -121.9633    | 4.7324          | 4.6143        |
| 0.5449        | 0.6205 | 690  | 0.5331          | -0.4666        | -1.3090          | 0.7446             | 0.8424          | -151.2884      | -122.5196    | 4.8229          | 4.7080        |
| 0.5536        | 0.7446 | 828  | 0.5136          | -0.4885        | -1.3825          | 0.7446             | 0.8940          | -152.0234      | -122.7389    | 4.8867          | 4.7737        |
| 0.5253        | 0.8687 | 966  | 0.5057          | -0.5613        | -1.5446          | 0.7554             | 0.9832          | -153.6442      | -123.4672    | 4.9287          | 4.8080        |
| 0.5249        | 0.9928 | 1104 | 0.5054          | -0.5101        | -1.4656          | 0.75               | 0.9555          | -152.8544      | -122.9549    | 4.8704          | 4.7521        |
| 0.4631        | 1.1169 | 1242 | 0.5067          | -0.6889        | -1.7678          | 0.75               | 1.0789          | -155.8768      | -124.7426    | 4.8470          | 4.7276        |
| 0.4524        | 1.2410 | 1380 | 0.5006          | -0.7467        | -1.9049          | 0.7446             | 1.1582          | -157.2474      | -125.3205    | 4.9447          | 4.8239        |
| 0.424         | 1.3651 | 1518 | 0.5036          | -0.7638        | -2.0144          | 0.7337             | 1.2505          | -158.3425      | -125.4923    | 4.9235          | 4.8002        |
| 0.4428        | 1.4892 | 1656 | 0.5004          | -0.7790        | -2.0132          | 0.7446             | 1.2342          | -158.3307      | -125.6437    | 4.9576          | 4.8375        |
| 0.4424        | 1.6133 | 1794 | 0.4944          | -0.8220        | -2.0517          | 0.7391             | 1.2297          | -158.7152      | -126.0739    | 4.9736          | 4.8553        |
| 0.4358        | 1.7374 | 1932 | 0.5022          | -0.8091        | -1.9993          | 0.7228             | 1.1902          | -158.1918      | -125.9447    | 5.0894          | 4.9702        |
| 0.4426        | 1.8615 | 2070 | 0.4992          | -0.8254        | -2.0308          | 0.7228             | 1.2054          | -158.5065      | -126.1077    | 5.0943          | 4.9780        |
| 0.4226        | 1.9856 | 2208 | 0.4971          | -0.8701        | -2.1434          | 0.7283             | 1.2733          | -159.6329      | -126.5553    | 5.1222          | 5.0011        |
| 0.3684        | 2.1097 | 2346 | 0.5032          | -0.9201        | -2.2281          | 0.7228             | 1.3081          | -160.4799      | -127.0545    | 5.2209          | 5.1031        |
| 0.3695        | 2.2338 | 2484 | 0.5022          | -0.9332        | -2.2651          | 0.7228             | 1.3319          | -160.8495      | -127.1860    | 5.2170          | 5.0977        |
| 0.3693        | 2.3579 | 2622 | 0.5022          | -0.9418        | -2.2839          | 0.7283             | 1.3421          | -161.0379      | -127.2717    | 5.2390          | 5.1169        |
| 0.3659        | 2.4820 | 2760 | 0.5037          | -0.9820        | -2.3392          | 0.7228             | 1.3572          | -161.5908      | -127.6742    | 5.2392          | 5.1148        |
| 0.3557        | 2.6061 | 2898 | 0.5031          | -1.0001        | -2.3531          | 0.7228             | 1.3529          | -161.7294      | -127.8552    | 5.2704          | 5.1488        |
| 0.3491        | 2.7302 | 3036 | 0.5053          | -1.0242        | -2.3803          | 0.7228             | 1.3562          | -162.0017      | -128.0954    | 5.2880          | 5.1693        |
| 0.3512        | 2.8543 | 3174 | 0.5036          | -1.0265        | -2.3833          | 0.7174             | 1.3568          | -162.0320      | -128.1190    | 5.2965          | 5.1768        |
| 0.3458        | 2.9784 | 3312 | 0.5013          | -1.0250        | -2.3893          | 0.7283             | 1.3643          | -162.0916      | -128.1033    | 5.3082          | 5.1890        |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
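
For reference, the reward and margin columns above come from the DPO objective. The sketch below is a minimal pure-Python illustration of that per-pair objective, not the actual `trl` implementation; it assumes `trl`'s default `beta=0.1`, since the beta used for this run is not recorded in this card.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair, from summed token log-probs.

    Returns (loss, reward_chosen, reward_rejected); the rewards correspond
    to the Rewards/chosen and Rewards/rejected columns in the table above,
    and their difference to Rewards/margins.
    """
    # Implicit rewards: scaled log-ratios of policy vs. reference model
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # loss = -log sigmoid(margin), written via log1p for numerical stability
    loss = math.log1p(math.exp(-margin))
    return loss, reward_chosen, reward_rejected
```

At a zero margin the loss is ln 2 ≈ 0.6931, which is why the validation loss starts near 0.69 in the first table row before the margin opens up.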