
zephyr-dpop-qlora-uf-ours-uffull-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF and the generation/UFfull2 datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6950
  • Positive Losses: 0.5820
  • Dpo Losses: 0.6380
  • Rewards/chosen: 0.2290
  • Rewards/rejected: 0.0996
  • Rewards/accuracies: 0.7060
  • Rewards/margins: 0.1294
  • Rewards/margins Max: 0.5134
  • Rewards/margins Min: -0.1814
  • Rewards/margins Std: 0.2328
  • Logps/rejected: -255.8980
  • Logps/chosen: -261.5583
  • Logits/rejected: -2.6096
  • Logits/chosen: -2.6435
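
Because this is a QLoRA/PEFT adapter trained on top of alignment-handbook/zephyr-7b-sft-full, it can presumably be used by attaching the adapter to the base model with the peft library. A minimal loading sketch follows; the adapter id mirrors this card's Hub path, while the dtype, device placement, and generation settings are illustrative assumptions rather than tested values:

```python
# Minimal loading sketch: attach this QLoRA adapter to the zephyr-7b-sft-full base model.
# Adapter id mirrors this card's Hub path; dtype/device/generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "alignment-handbook/zephyr-7b-sft-full"
adapter_id = "just1nseo/zephyr-dpop-qlora-uf-ours-uffull-5e-6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Zephyr ships a chat template, so build the prompt with apply_chat_template.
messages = [{"role": "user", "content": "Summarize what DPO training does in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```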

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
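
For reference, these values map directly onto standard Hugging Face TrainingArguments fields. The sketch below expresses them with TRL's DPOConfig and spells out the effective batch-size arithmetic; the actual run used a DPOP-style objective with QLoRA adapters, so this only illustrates the hyperparameters above, not the exact trainer. The output_dir and anything not listed above are assumptions.

```python
# Sketch of the listed hyperparameters expressed as a TRL DPOConfig (a TrainingArguments subclass).
# The actual run used a DPOP-style objective with QLoRA adapters; output_dir is an assumption.
from trl import DPOConfig

config = DPOConfig(
    output_dir="zephyr-dpop-qlora-uf-ours-uffull-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Effective train batch size: 4 per device x 2 GPUs x 2 accumulation steps = 16,
# matching total_train_batch_size; likewise 8 x 2 GPUs = 16 for total_eval_batch_size.
```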

Training results

| Training Loss | Epoch | Step | Validation Loss | Positive Losses | Dpo Losses | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.6915 | 0.02 | 100 | 0.6917 | 0.0059 | 0.6910 | 0.0266 | 0.0222 | 0.6170 | 0.0043 | 0.0246 | -0.0134 | 0.0126 | -263.6297 | -281.7968 | -2.7663 | -2.8014 |
| 0.6797 | 0.05 | 200 | 0.6897 | 0.0702 | 0.6800 | 0.0886 | 0.0608 | 0.6570 | 0.0278 | 0.1378 | -0.0648 | 0.0675 | -259.7737 | -275.5939 | -2.7413 | -2.7759 |
| 0.6804 | 0.07 | 300 | 0.6845 | 0.0848 | 0.6724 | 0.1325 | 0.0877 | 0.6675 | 0.0448 | 0.2086 | -0.0924 | 0.1004 | -257.0813 | -271.2012 | -2.7504 | -2.7853 |
| 0.6951 | 0.1 | 400 | 0.6829 | 0.1179 | 0.6671 | 0.1575 | 0.1005 | 0.6715 | 0.0570 | 0.2589 | -0.1125 | 0.1237 | -255.7986 | -268.7028 | -2.6989 | -2.7337 |
| 0.6599 | 0.12 | 500 | 0.6868 | 0.1747 | 0.6620 | 0.1717 | 0.1030 | 0.6805 | 0.0688 | 0.2913 | -0.1240 | 0.1393 | -255.5571 | -267.2820 | -2.6656 | -2.7019 |
| 0.6899 | 0.14 | 600 | 0.6773 | 0.1322 | 0.6631 | 0.1930 | 0.1265 | 0.6805 | 0.0665 | 0.2912 | -0.1245 | 0.1385 | -253.2036 | -265.1512 | -2.6976 | -2.7346 |
| 0.6596 | 0.17 | 700 | 0.6841 | 0.2476 | 0.6579 | 0.1952 | 0.1160 | 0.6790 | 0.0792 | 0.3399 | -0.1420 | 0.1603 | -254.2511 | -264.9378 | -2.6481 | -2.6842 |
| 0.6618 | 0.19 | 800 | 0.7055 | 0.6819 | 0.6582 | 0.1938 | 0.1128 | 0.6725 | 0.0810 | 0.3642 | -0.1653 | 0.1763 | -254.5748 | -265.0780 | -2.6749 | -2.7097 |
| 0.6742 | 0.22 | 900 | 0.7031 | 0.6125 | 0.6568 | 0.1979 | 0.1141 | 0.6810 | 0.0839 | 0.3706 | -0.1651 | 0.1783 | -254.4471 | -264.6613 | -2.6218 | -2.6566 |
| 0.6751 | 0.24 | 1000 | 0.7010 | 0.6677 | 0.6601 | 0.2068 | 0.1295 | 0.6755 | 0.0773 | 0.3517 | -0.1632 | 0.1718 | -252.9047 | -263.7737 | -2.6192 | -2.6553 |
| 0.7098 | 0.26 | 1100 | 0.7131 | 0.8234 | 0.6548 | 0.1971 | 0.1068 | 0.6775 | 0.0903 | 0.3961 | -0.1800 | 0.1920 | -255.1729 | -264.7435 | -2.6144 | -2.6518 |
| 0.6678 | 0.29 | 1200 | 0.7126 | 0.8054 | 0.6533 | 0.2007 | 0.1068 | 0.6810 | 0.0938 | 0.4066 | -0.1769 | 0.1949 | -255.1695 | -264.3879 | -2.5888 | -2.6260 |
| 0.6611 | 0.31 | 1300 | 0.7072 | 0.7968 | 0.6584 | 0.2114 | 0.1291 | 0.6725 | 0.0823 | 0.3729 | -0.1733 | 0.1825 | -252.9392 | -263.3107 | -2.5893 | -2.6265 |
| 0.6852 | 0.34 | 1400 | 0.7117 | 0.8828 | 0.6578 | 0.2125 | 0.1283 | 0.6865 | 0.0842 | 0.3801 | -0.1702 | 0.1839 | -253.0243 | -263.2099 | -2.5908 | -2.6269 |
| 0.7148 | 0.36 | 1500 | 0.7147 | 0.8994 | 0.6537 | 0.2082 | 0.1146 | 0.6775 | 0.0936 | 0.4107 | -0.1826 | 0.1980 | -254.3940 | -263.6350 | -2.5606 | -2.5971 |
| 0.734 | 0.38 | 1600 | 0.7263 | 0.9562 | 0.6467 | 0.1975 | 0.0887 | 0.7005 | 0.1088 | 0.4496 | -0.1881 | 0.2128 | -256.9880 | -264.7073 | -2.5414 | -2.5748 |
| 0.68 | 0.41 | 1700 | 0.6886 | 0.4934 | 0.6531 | 0.2201 | 0.1281 | 0.6895 | 0.0920 | 0.3890 | -0.1655 | 0.1858 | -253.0398 | -262.4442 | -2.6144 | -2.6469 |
| 0.9221 | 0.43 | 1800 | 0.6972 | 0.5938 | 0.6479 | 0.2127 | 0.1083 | 0.6855 | 0.1044 | 0.4219 | -0.1737 | 0.2001 | -255.0207 | -263.1860 | -2.6572 | -2.6883 |
| 0.6965 | 0.45 | 1900 | 0.7029 | 0.5493 | 0.6415 | 0.2047 | 0.0857 | 0.6980 | 0.1190 | 0.4554 | -0.1734 | 0.2113 | -257.2836 | -263.9902 | -2.6385 | -2.6680 |
| 0.6754 | 0.48 | 2000 | 0.6736 | 0.2085 | 0.6476 | 0.2262 | 0.1217 | 0.6960 | 0.1045 | 0.4193 | -0.1652 | 0.1960 | -253.6813 | -261.8383 | -2.6573 | -2.6879 |
| 0.6527 | 0.5 | 2100 | 0.6734 | 0.1901 | 0.6479 | 0.2309 | 0.1262 | 0.6940 | 0.1046 | 0.4298 | -0.1721 | 0.2013 | -253.2316 | -261.3691 | -2.6274 | -2.6587 |
| 0.6693 | 0.53 | 2200 | 0.6811 | 0.3594 | 0.6470 | 0.2250 | 0.1186 | 0.6885 | 0.1064 | 0.4311 | -0.1714 | 0.2022 | -253.9932 | -261.9567 | -2.6328 | -2.6644 |
| 0.6652 | 0.55 | 2300 | 0.6946 | 0.5078 | 0.6431 | 0.2178 | 0.1017 | 0.6895 | 0.1161 | 0.4629 | -0.1818 | 0.2158 | -255.6816 | -262.6781 | -2.6122 | -2.6429 |
| 0.6511 | 0.57 | 2400 | 0.6755 | 0.2132 | 0.6463 | 0.2309 | 0.1228 | 0.6960 | 0.1081 | 0.4351 | -0.1715 | 0.2030 | -253.5698 | -261.3663 | -2.6075 | -2.6392 |
| 0.6512 | 0.6 | 2500 | 0.7102 | 0.5940 | 0.6370 | 0.2139 | 0.0822 | 0.6990 | 0.1318 | 0.5141 | -0.1918 | 0.2364 | -257.6378 | -263.0636 | -2.6184 | -2.6519 |
| 0.7342 | 0.62 | 2600 | 0.6884 | 0.3826 | 0.6413 | 0.2233 | 0.1023 | 0.7040 | 0.1210 | 0.4842 | -0.1791 | 0.2219 | -255.6233 | -262.1221 | -2.6165 | -2.6506 |
| 0.6754 | 0.65 | 2700 | 0.6847 | 0.3415 | 0.6419 | 0.2283 | 0.1092 | 0.7055 | 0.1192 | 0.4752 | -0.1765 | 0.2181 | -254.9368 | -261.6212 | -2.6158 | -2.6511 |
| 0.7445 | 0.67 | 2800 | 0.6769 | 0.2621 | 0.6445 | 0.2313 | 0.1188 | 0.7020 | 0.1125 | 0.4532 | -0.1690 | 0.2084 | -253.9747 | -261.3299 | -2.6176 | -2.6513 |
| 0.6656 | 0.69 | 2900 | 0.6867 | 0.4407 | 0.6412 | 0.2299 | 0.1090 | 0.7045 | 0.1208 | 0.4813 | -0.1757 | 0.2199 | -254.9489 | -261.4680 | -2.6212 | -2.6566 |
| 0.6641 | 0.72 | 3000 | 0.6918 | 0.5290 | 0.6395 | 0.2278 | 0.1026 | 0.7025 | 0.1252 | 0.4930 | -0.1780 | 0.2250 | -255.5911 | -261.6767 | -2.6344 | -2.6687 |
| 0.6752 | 0.74 | 3100 | 0.6963 | 0.6115 | 0.6398 | 0.2272 | 0.1021 | 0.7030 | 0.1252 | 0.5000 | -0.1806 | 0.2279 | -255.6473 | -261.7339 | -2.6282 | -2.6628 |
| 0.6417 | 0.77 | 3200 | 0.7057 | 0.7185 | 0.6364 | 0.2246 | 0.0908 | 0.7040 | 0.1338 | 0.5276 | -0.1863 | 0.2394 | -256.7738 | -261.9981 | -2.6277 | -2.6619 |
| 0.6436 | 0.79 | 3300 | 0.7146 | 0.8124 | 0.6342 | 0.2203 | 0.0808 | 0.7040 | 0.1395 | 0.5452 | -0.1905 | 0.2463 | -257.7732 | -262.4228 | -2.6190 | -2.6530 |
| 0.7092 | 0.81 | 3400 | 0.6972 | 0.6209 | 0.6389 | 0.2266 | 0.0993 | 0.7015 | 0.1273 | 0.5073 | -0.1826 | 0.2310 | -255.9223 | -261.7928 | -2.6091 | -2.6431 |
| 0.6491 | 0.84 | 3500 | 0.6972 | 0.6241 | 0.6390 | 0.2273 | 0.1003 | 0.7020 | 0.1270 | 0.5062 | -0.1824 | 0.2306 | -255.8255 | -261.7234 | -2.6038 | -2.6383 |
| 0.6879 | 0.86 | 3600 | 0.7091 | 0.7585 | 0.6353 | 0.2220 | 0.0856 | 0.7060 | 0.1364 | 0.5352 | -0.1870 | 0.2418 | -257.2982 | -262.2594 | -2.6103 | -2.6440 |
| 0.6129 | 0.89 | 3700 | 0.7033 | 0.6942 | 0.6366 | 0.2255 | 0.0924 | 0.7065 | 0.1331 | 0.5254 | -0.1849 | 0.2379 | -256.6156 | -261.9067 | -2.6075 | -2.6417 |
| 0.6578 | 0.91 | 3800 | 0.6956 | 0.5982 | 0.6385 | 0.2286 | 0.1002 | 0.7040 | 0.1284 | 0.5109 | -0.1818 | 0.2321 | -255.8333 | -261.5916 | -2.6073 | -2.6413 |
| 0.6535 | 0.93 | 3900 | 0.6949 | 0.5854 | 0.6383 | 0.2289 | 0.1000 | 0.7045 | 0.1288 | 0.5118 | -0.1813 | 0.2323 | -255.8504 | -261.5681 | -2.6069 | -2.6411 |
| 0.6876 | 0.96 | 4000 | 0.6951 | 0.5831 | 0.6380 | 0.2289 | 0.0994 | 0.7035 | 0.1295 | 0.5141 | -0.1813 | 0.2330 | -255.9116 | -261.5652 | -2.6055 | -2.6398 |
| 0.6531 | 0.98 | 4100 | 0.6952 | 0.5853 | 0.6381 | 0.2289 | 0.0995 | 0.7040 | 0.1294 | 0.5136 | -0.1815 | 0.2329 | -255.9032 | -261.5644 | -2.6099 | -2.6438 |
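
The reward columns above follow the usual DPO bookkeeping: per-completion rewards are the policy-versus-reference log-probability gap scaled by beta, margins are chosen minus rejected, and accuracies count positive margins. Given the "dpop" naming, Positive Losses presumably tracks a DPO-Positive style penalty on the chosen responses. The sketch below shows one plausible reading; beta, the penalty weight, and the exact way the two loss terms combine are assumptions, not values taken from this card:

```python
# Hedged sketch of how the table's reward/loss columns are typically computed in DPO-style training.
# beta, lambda_dpop, and the additive combination of the two loss terms are assumptions.
import torch
import torch.nn.functional as F

def dpo_style_metrics(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      beta=0.1, lambda_dpop=1.0):
    # Implicit DPO rewards: beta * (policy logp - reference logp) per completion.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    accuracies = (margins > 0).float().mean()                               # Rewards/accuracies

    dpo_loss = -F.logsigmoid(margins).mean()                                # "Dpo Losses"
    # DPO-Positive style penalty: keep the policy's chosen log-probs from falling below the reference.
    positive_loss = torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0).mean()  # "Positive Losses"
    total = dpo_loss + lambda_dpop * positive_loss                          # assumed combination ("Loss")
    return {
        "loss": total, "dpo_loss": dpo_loss, "positive_loss": positive_loss,
        "rewards/chosen": chosen_rewards.mean(), "rewards/rejected": rejected_rewards.mean(),
        "rewards/accuracies": accuracies, "rewards/margins": margins.mean(),
    }
```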

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2