# dpo_with_se

This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.6194
- Rewards/chosen: -0.6699
- Rewards/rejected: -1.1107
- Rewards/accuracies: 0.6458
- Rewards/margins: 0.4407
- Logps/rejected: -422.9081
- Logps/chosen: -458.9963
- Logits/rejected: 0.0509
- Logits/chosen: 0.1892
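Since the framework list below includes PEFT, the released weights are presumably a LoRA-style adapter on top of the base model. The following is a minimal loading sketch, not part of the original card: it assumes the adapter is hosted under this repository's id (`ernestoBocini/Phi3-mini-DPO-Tuned`); adjust dtype and device placement to your hardware.

```python
# Hedged loading sketch (not from the original card). Assumes the PEFT adapter
# lives at ernestoBocini/Phi3-mini-DPO-Tuned and that your environment roughly
# matches the framework versions listed at the bottom of this card.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/Phi-3-mini-4k-instruct"
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ernestoBocini/Phi3-mini-DPO-Tuned")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Phi-3 ships a chat template, so apply_chat_template builds the prompt.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```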
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged sketch mapping them onto a TRL `DPOConfig` follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
- mixed_precision_training: Native AMP
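The card does not include the training script; the sketch below is a hedged reconstruction showing how the hyperparameters above would map onto TRL's `DPOConfig`/`DPOTrainer`. The dataset id, the LoRA configuration, and `beta` are assumptions (none are stated in the card), and argument names can shift between TRL releases.

```python
# Hedged reconstruction (not the author's script): maps the listed
# hyperparameters onto TRL's DPOTrainer. The dataset id, LoRA config, and
# beta are assumptions; the card does not state them.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

args = DPOConfig(
    output_dir="dpo_with_se",
    learning_rate=2e-5,
    per_device_train_batch_size=8,  # x 4 GPUs x 2 accumulation steps = 64 total
    per_device_eval_batch_size=8,   # x 4 GPUs = 32 total
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=100,               # HF Trainer lets warmup_steps override warmup_ratio
    seed=42,
    bf16=True,                      # "Native AMP"; fp16=True is equally plausible
    beta=0.1,                       # assumed: the card does not state beta (TRL default)
)

# Hypothetical preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("your/preference-dataset")

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,                            # `processing_class` in newer TRL
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed: card lists PEFT 0.11.x
)
trainer.train()
```

Launched across 4 GPUs with `accelerate launch`, the per-device settings above reproduce the reported total train/eval batch sizes of 64 and 32.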
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7121 | 0.0622 | 50 | 0.7078 | 1.9859 | 1.9118 | 0.5694 | 0.0741 | -392.6837 | -432.4385 | 0.1883 | 0.3317 |
0.672 | 0.1244 | 100 | 0.6718 | 0.4213 | 0.2008 | 0.5972 | 0.2204 | -409.7933 | -448.0844 | 0.1330 | 0.2722 |
0.6803 | 0.1866 | 150 | 0.6633 | 1.2004 | 0.9074 | 0.6215 | 0.2930 | -402.7275 | -440.2932 | 0.2565 | 0.3917 |
0.6816 | 0.2488 | 200 | 0.6535 | -0.2285 | -0.4811 | 0.5938 | 0.2526 | -416.6123 | -454.5817 | 0.1335 | 0.2706 |
0.6719 | 0.3109 | 250 | 0.6768 | -0.0803 | -0.2830 | 0.6007 | 0.2027 | -414.6320 | -453.1003 | 0.1071 | 0.2455 |
0.642 | 0.3731 | 300 | 0.6402 | 0.3405 | 0.0226 | 0.6146 | 0.3179 | -411.5756 | -448.8922 | 0.0864 | 0.2271 |
0.6675 | 0.4353 | 350 | 0.6472 | 0.7586 | 0.4677 | 0.6007 | 0.2909 | -407.1244 | -444.7109 | 0.1382 | 0.2779 |
0.6581 | 0.4975 | 400 | 0.6502 | -0.0310 | -0.3059 | 0.6181 | 0.2749 | -414.8607 | -452.6067 | 0.0326 | 0.1770 |
0.6155 | 0.5597 | 450 | 0.6416 | 0.0254 | -0.2895 | 0.625 | 0.3149 | -414.6964 | -452.0428 | 0.1102 | 0.2490 |
0.6438 | 0.6219 | 500 | 0.6383 | -0.2805 | -0.6002 | 0.625 | 0.3197 | -417.8031 | -455.1015 | 0.0799 | 0.2196 |
0.6069 | 0.6841 | 550 | 0.6360 | -0.6526 | -0.9456 | 0.6007 | 0.2930 | -421.2573 | -458.8233 | 0.1079 | 0.2462 |
0.6227 | 0.7463 | 600 | 0.6349 | -0.0705 | -0.3659 | 0.6215 | 0.2954 | -415.4609 | -453.0020 | 0.0381 | 0.1807 |
0.6473 | 0.8085 | 650 | 0.6331 | -0.3187 | -0.6771 | 0.6528 | 0.3584 | -418.5728 | -455.4844 | 0.1406 | 0.2776 |
0.6259 | 0.8706 | 700 | 0.6295 | -0.4256 | -0.7399 | 0.6111 | 0.3143 | -419.2006 | -456.5528 | 0.0986 | 0.2391 |
0.6572 | 0.9328 | 750 | 0.6389 | -0.5969 | -0.8936 | 0.6007 | 0.2967 | -420.7374 | -458.2657 | 0.0726 | 0.2120 |
0.63 | 0.9950 | 800 | 0.6310 | -0.2243 | -0.5516 | 0.6285 | 0.3274 | -417.3179 | -454.5398 | 0.1026 | 0.2406 |
0.4431 | 1.0572 | 850 | 0.6238 | -0.3325 | -0.7169 | 0.6632 | 0.3844 | -418.9702 | -455.6217 | 0.0604 | 0.1992 |
0.47 | 1.1194 | 900 | 0.6286 | -0.6589 | -1.1143 | 0.6597 | 0.4554 | -422.9441 | -458.8861 | -0.0269 | 0.1154 |
0.4436 | 1.1816 | 950 | 0.6252 | -0.6243 | -1.0270 | 0.6354 | 0.4027 | -422.0717 | -458.5404 | 0.0062 | 0.1465 |
0.4483 | 1.2438 | 1000 | 0.6238 | -0.6325 | -1.0514 | 0.6319 | 0.4189 | -422.3156 | -458.6222 | 0.0434 | 0.1813 |
0.4568 | 1.3060 | 1050 | 0.6297 | -0.9557 | -1.3457 | 0.6285 | 0.3900 | -425.2583 | -461.8539 | 0.1563 | 0.2901 |
0.4555 | 1.3682 | 1100 | 0.6311 | -0.5825 | -1.0012 | 0.6319 | 0.4188 | -421.8140 | -458.1216 | 0.0905 | 0.2271 |
0.4744 | 1.4303 | 1150 | 0.6248 | -0.5365 | -0.9374 | 0.6424 | 0.4008 | -421.1751 | -457.6623 | 0.0472 | 0.1861 |
0.4245 | 1.4925 | 1200 | 0.6255 | -0.6457 | -1.0579 | 0.6424 | 0.4122 | -422.3806 | -458.7540 | -0.0423 | 0.0997 |
0.4767 | 1.5547 | 1250 | 0.6294 | -0.7333 | -1.1519 | 0.6319 | 0.4185 | -423.3202 | -459.6304 | 0.1300 | 0.2652 |
0.4714 | 1.6169 | 1300 | 0.6253 | -0.8128 | -1.2388 | 0.6493 | 0.4261 | -424.1896 | -460.4245 | 0.0397 | 0.1788 |
0.4336 | 1.6791 | 1350 | 0.6229 | -0.7654 | -1.2064 | 0.6424 | 0.4410 | -423.8654 | -459.9506 | 0.1234 | 0.2587 |
0.4791 | 1.7413 | 1400 | 0.6216 | -0.7578 | -1.2069 | 0.6389 | 0.4492 | -423.8710 | -459.8747 | 0.0547 | 0.1931 |
0.439 | 1.8035 | 1450 | 0.6204 | -0.7469 | -1.1972 | 0.6493 | 0.4502 | -423.7731 | -459.7664 | 0.0661 | 0.2040 |
0.4419 | 1.8657 | 1500 | 0.6194 | -0.6699 | -1.1107 | 0.6458 | 0.4407 | -422.9081 | -458.9963 | 0.0509 | 0.1892 |
0.4593 | 1.9279 | 1550 | 0.6214 | -0.6895 | -1.1228 | 0.6528 | 0.4333 | -423.0291 | -459.1917 | 0.0628 | 0.2005 |
0.4444 | 1.9900 | 1600 | 0.6229 | -0.6827 | -1.1246 | 0.6667 | 0.4419 | -423.0472 | -459.1237 | 0.0863 | 0.2226 |
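A note for reading the reward columns, assuming TRL's standard DPO logging: each reward is the implicit DPO reward, i.e. the β-scaled log-probability ratio between the policy and the frozen reference model,

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right).
$$

`Rewards/chosen` and `Rewards/rejected` average this quantity over the chosen and rejected completions, `Rewards/margins` is the mean of their difference, and `Rewards/accuracies` is the fraction of pairs where the chosen completion receives the higher reward.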
### Framework versions
- PEFT 0.11.2.dev0
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1