---
license: apache-2.0
base_model: alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo
  results: []
---

# TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo

This model is a fine-tuned version of [alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2](https://huggingface.co/alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2) on an unspecified dataset (recorded as `None` in the training metadata).
It achieves the following results on the evaluation set:
- Loss: 0.1772
- Rewards/chosen: -0.9390
- Rewards/rejected: -4.1141
- Rewards/accuracies: 0.8385
- Rewards/margins: 3.1750
- Logps/rejected: -327.8484
- Logps/chosen: -280.3031
- Logits/rejected: -2.7526
- Logits/chosen: -2.6271

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6892 | 0.06 | 100 | 0.6904 | -0.0007 | -0.0068 | 0.4692 | 0.0061 | -286.7757 | -270.9199 | -2.7940 | -2.6576 |
| 0.6767 | 0.13 | 200 | 0.6754 | -0.0060 | -0.0430 | 0.6385 | 0.0370 | -287.1373 | -270.9724 | -2.7931 | -2.6568 |
| 0.6493 | 0.19 | 300 | 0.6431 | -0.0105 | -0.1151 | 0.7885 | 0.1046 | -287.8588 | -271.0174 | -2.7922 | -2.6561 |
| 0.5809 | 0.25 | 400 | 0.5879 | -0.0345 | -0.2649 | 0.8308 | 0.2304 | -289.3571 | -271.2578 | -2.7893 | -2.6534 |
| 0.4994 | 0.32 | 500 | 0.5043 | -0.0774 | -0.5296 | 0.8385 | 0.4522 | -292.0042 | -271.6873 | -2.7851 | -2.6499 |
| 0.4093 | 0.38 | 600 | 0.4360 | -0.1267 | -0.8043 | 0.8385 | 0.6776 | -294.7504 | -272.1800 | -2.7820 | -2.6476 |
| 0.3951 | 0.44 | 700 | 0.3844 | -0.1731 | -1.0600 | 0.8423 | 0.8870 | -297.3079 | -272.6434 | -2.7796 | -2.6459 |
| 0.3307 | 0.51 | 800 | 0.3413 | -0.2208 | -1.3252 | 0.8346 | 1.1044 | -299.9597 | -273.1208 | -2.7764 | -2.6434 |
| 0.3035 | 0.57 | 900 | 0.3095 | -0.2914 | -1.5963 | 0.8308 | 1.3049 | -302.6710 | -273.8272 | -2.7734 | -2.6410 |
| 0.2565 | 0.63 | 1000 | 0.2856 | -0.3318 | -1.8163 | 0.8385 | 1.4845 | -304.8706 | -274.2305 | -2.7712 | -2.6397 |
| 0.2409 | 0.7 | 1100 | 0.2676 | -0.3754 | -2.0199 | 0.8385 | 1.6445 | -306.9071 | -274.6673 | -2.7691 | -2.6380 |
| 0.2341 | 0.76 | 1200 | 0.2515 | -0.4233 | -2.2275 | 0.8385 | 1.8042 | -308.9832 | -275.1463 | -2.7675 | -2.6371 |
| 0.2584 | 0.82 | 1300 | 0.2393 | -0.4799 | -2.4301 | 0.8385 | 1.9501 | -311.0082 | -275.7123 | -2.7653 | -2.6355 |
| 0.2171 | 0.89 | 1400 | 0.2294 | -0.5274 | -2.6087 | 0.8385 | 2.0812 | -312.7944 | -276.1873 | -2.7635 | -2.6342 |
| 0.1638 | 0.95 | 1500 | 0.2206 | -0.5748 | -2.7894 | 0.8385 | 2.2146 | -314.6021 | -276.6611 | -2.7623 | -2.6336 |
| 0.2334 | 1.02 | 1600 | 0.2147 | -0.6108 | -2.9348 | 0.8385 | 2.3240 | -316.0559 | -277.0210 | -2.7603 | -2.6319 |
| 0.2178 | 1.08 | 1700 | 0.2086 | -0.6523 | -3.0743 | 0.8385 | 2.4220 | -317.4505 | -277.4355 | -2.7597 | -2.6314 |
| 0.1704 | 1.14 | 1800 | 0.2037 | -0.6819 | -3.1955 | 0.8385 | 2.5136 | -318.6626 | -277.7317 | -2.7590 | -2.6309 |
| 0.1683 | 1.21 | 1900 | 0.1996 | -0.7152 | -3.3176 | 0.8385 | 2.6024 | -319.8835 | -278.0646 | -2.7587 | -2.6313 |
| 0.271 | 1.27 | 2000 | 0.1959 | -0.7447 | -3.4272 | 0.8385 | 2.6825 | -320.9794 | -278.3595 | -2.7576 | -2.6305 |
| 0.127 | 1.33 | 2100 | 0.1930 | -0.7665 | -3.5137 | 0.8385 | 2.7472 | -321.8449 | -278.5782 | -2.7571 | -2.6302 |
| 0.2107 | 1.4 | 2200 | 0.1905 | -0.7830 | -3.5883 | 0.8385 | 2.8053 | -322.5906 | -278.7429 | -2.7572 | -2.6305 |
| 0.1977 | 1.46 | 2300 | 0.1883 | -0.7986 | -3.6574 | 0.8385 | 2.8588 | -323.2822 | -278.8991 | -2.7566 | -2.6300 |
| 0.1655 | 1.52 | 2400 | 0.1872 | -0.8203 | -3.7149 | 0.8385 | 2.8946 | -323.8572 | -279.1161 | -2.7553 | -2.6289 |
| 0.1776 | 1.59 | 2500 | 0.1850 | -0.8439 | -3.7881 | 0.8385 | 2.9442 | -324.5885 | -279.3518 | -2.7548 | -2.6285 |
| 0.1372 | 1.65 | 2600 | 0.1850 | -0.8548 | -3.8280 | 0.8385 | 2.9732 | -324.9880 | -279.4609 | -2.7544 | -2.6282 |
| 0.15 | 1.71 | 2700 | 0.1836 | -0.8734 | -3.8792 | 0.8385 | 3.0059 | -325.5001 | -279.6465 | -2.7543 | -2.6283 |
| 0.1338 | 1.78 | 2800 | 0.1823 | -0.8736 | -3.9132 | 0.8385 | 3.0396 | -325.8393 | -279.6486 | -2.7541 | -2.6282 |
| 0.1507 | 1.84 | 2900 | 0.1811 | -0.8932 | -3.9558 | 0.8385 | 3.0626 | -326.2653 | -279.8444 | -2.7533 | -2.6273 |
| 0.1615 | 1.9 | 3000 | 0.1811 | -0.8986 | -3.9790 | 0.8385 | 3.0804 | -326.4981 | -279.8992 | -2.7533 | -2.6275 |
| 0.1656 | 1.97 | 3100 | 0.1800 | -0.9039 | -4.0052 | 0.8385 | 3.1012 | -326.7594 | -279.9523 | -2.7528 | -2.6270 |
| 0.1398 | 2.03 | 3200 | 0.1797 | -0.9123 | -4.0258 | 0.8385 | 3.1135 | -326.9660 | -280.0360 | -2.7534 | -2.6278 |
| 0.1929 | 2.09 | 3300 | 0.1792 | -0.9098 | -4.0380 | 0.8385 | 3.1282 | -327.0879 | -280.0112 | -2.7524 | -2.6269 |
| 0.1616 | 2.16 | 3400 | 0.1787 | -0.9249 | -4.0622 | 0.8385 | 3.1374 | -327.3301 | -280.1616 | -2.7519 | -2.6263 |
| 0.1664 | 2.22 | 3500 | 0.1790 | -0.9246 | -4.0716 | 0.8385 | 3.1470 | -327.4239 | -280.1592 | -2.7524 | -2.6269 |
| 0.2085 | 2.28 | 3600 | 0.1787 | -0.9301 | -4.0835 | 0.8385 | 3.1534 | -327.5426 | -280.2136 | -2.7532 | -2.6279 |
| 0.1565 | 2.35 | 3700 | 0.1782 | -0.9301 | -4.0909 | 0.8385 | 3.1608 | -327.6164 | -280.2137 | -2.7521 | -2.6265 |
| 0.153 | 2.41 | 3800 | 0.1778 | -0.9281 | -4.0947 | 0.8385 | 3.1666 | -327.6550 | -280.1937 | -2.7522 | -2.6268 |
| 0.1787 | 2.47 | 3900 | 0.1783 | -0.9319 | -4.0918 | 0.8385 | 3.1599 | -327.6259 | -280.2316 | -2.7520 | -2.6266 |
| 0.172 | 2.54 | 4000 | 0.1780 | -0.9338 | -4.1035 | 0.8385 | 3.1697 | -327.7429 | -280.2505 | -2.7526 | -2.6273 |
| 0.2643 | 2.6 | 4100 | 0.1771 | -0.9229 | -4.0969 | 0.8385 | 3.1739 | -327.6764 | -280.1422 | -2.7521 | -2.6267 |
| 0.1619 | 2.66 | 4200 | 0.1776 | -0.9326 | -4.1083 | 0.8385 | 3.1757 | -327.7909 | -280.2390 | -2.7523 | -2.6270 |
| 0.2413 | 2.73 | 4300 | 0.1778 | -0.9292 | -4.1024 | 0.8385 | 3.1732 | -327.7315 | -280.2050 | -2.7529 | -2.6277 |
| 0.1187 | 2.79 | 4400 | 0.1778 | -0.9343 | -4.1068 | 0.8385 | 3.1725 | -327.7758 | -280.2554 | -2.7521 | -2.6267 |
| 0.1439 | 2.86 | 4500 | 0.1776 | -0.9368 | -4.1118 | 0.8385 | 3.1750 | -327.8253 | -280.2808 | -2.7517 | -2.6263 |
| 0.1116 | 2.92 | 4600 | 0.1773 | -0.9302 | -4.1079 | 0.8385 | 3.1777 | -327.7867 | -280.2152 | -2.7526 | -2.6272 |
| 0.18 | 2.98 | 4700 | 0.1772 | -0.9290 | -4.1048 | 0.8385 | 3.1758 | -327.7554 | -280.2029 | -2.7526 | -2.6271 |

### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0
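
### Checking the reward metrics

The reward columns reported in this card appear to follow the usual DPO logging convention, in which `Rewards/margins` is the mean chosen reward minus the mean rejected reward. The sketch below is a minimal consistency check under that assumption. The helper names `reward_margin` and `max_margin_error` are hypothetical and not part of any library, and the numbers are copied from the evaluation summary and three rows of the training results table.

```python
def reward_margin(chosen: float, rejected: float) -> float:
    """Mean reward margin, assuming margin = mean chosen - mean rejected."""
    return chosen - rejected


# (label, Rewards/chosen, Rewards/rejected, reported Rewards/margins)
# Values copied from this card; labels are the step numbers or "final".
records = [
    ("100", -0.0007, -0.0068, 0.0061),
    ("500", -0.0774, -0.5296, 0.4522),
    ("4700", -0.9290, -4.1048, 3.1758),
    ("final", -0.9390, -4.1141, 3.1750),
]


def max_margin_error(rows) -> float:
    """Largest absolute gap between the recomputed and the reported margin."""
    return max(abs(reward_margin(c, r) - m) for _, c, r, m in rows)


if __name__ == "__main__":
    for label, chosen, rejected, reported in records:
        recomputed = reward_margin(chosen, rejected)
        print(f"{label:>6}: recomputed={recomputed:.4f} reported={reported:.4f}")
    # The reported values are rounded to four decimals, so small gaps are expected.
    print(f"max abs error: {max_margin_error(records):.4f}")
```

A gap on the order of the fourth decimal place is consistent with rounding in the logged values. A larger gap would suggest the columns were logged under a different convention than the one assumed here.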