---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- data/zephyr_uf_rlced_conifer_ref
model-index:
- name: zephyr-7b-uf-rlced-conifer-group-dpo-2e
  results: []
---

# zephyr-7b-uf-rlced-conifer-group-dpo-2e

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the data/zephyr_uf_rlced_conifer_ref dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2410
- Rewards/chosen: -3.4514
- Rewards/rejected: -8.7503
- Rewards/accuracies: 0.8778
- Rewards/margins: 5.2989
- Logps/rejected: -1278.7679
- Logps/chosen: -737.6100
- Logits/rejected: 3.0512
- Logits/chosen: 0.9415
- Alpha0: 0.1957
- Alpha1: 0.8043
- Task Loss1: 0.1724
- Task Excess Loss1: 0.0378
- Excess Loss: 0.0340
- Task Loss0: 0.5295
- Task Excess Loss0: 0.0879

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Alpha0 | Alpha1 | Task Loss1 | Task Excess Loss1 | Excess Loss | Task Loss0 | Task Excess Loss0 |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:------:|:------:|:----------:|:-----------------:|:-----------:|:----------:|:-----------------:|
| 0.3541 | 0.1388 | 100 | 0.4194 | -1.3743 | -2.6267 | 0.8102 | 1.2524 | -666.4093 | -529.9026 | -2.7580 | -2.7843 | 0.8214 | 0.1786 | 0.3373 | 0.1973 | 0.1899 | 0.6883 | 0.2655 |
| 0.2214 | 0.2776 | 200 | 0.3480 | -1.2450 | -2.9488 | 0.8412 | 1.7038 | -698.6146 | -516.9692 | 0.1216 | -0.2174 | 0.8786 | 0.1214 | 0.2866 | 0.1517 | 0.1250 | 0.5355 | 0.0929 |
| 0.2284 | 0.4164 | 300 | 0.3271 | -1.7298 | -3.6279 | 0.8515 | 1.8981 | -766.5247 | -565.4502 | 1.3769 | 0.5823 | 0.6417 | 0.3583 | 0.2721 | 0.1383 | 0.1130 | 0.5406 | 0.0794 |
| 0.1837 | 0.5552 | 400 | 0.3040 | -1.7232 | -4.0037 | 0.8553 | 2.2805 | -804.1021 | -564.7872 | 1.8300 | 0.7862 | 0.7891 | 0.2109 | 0.2517 | 0.1159 | 0.0949 | 0.5490 | 0.0796 |
| 0.1749 | 0.6940 | 500 | 0.2966 | -1.7976 | -4.1927 | 0.8637 | 2.3951 | -823.0039 | -572.2305 | 1.7164 | 0.5785 | 0.8057 | 0.1943 | 0.2448 | 0.1097 | 0.0856 | 0.5124 | 0.0570 |
| 0.1823 | 0.8328 | 600 | 0.3030 | -1.7187 | -3.9261 | 0.8647 | 2.2074 | -796.3432 | -564.3366 | 2.4921 | 1.3988 | 0.9053 | 0.0947 | 0.2541 | 0.1193 | 0.0922 | 0.5047 | 0.0596 |
| 0.1766 | 0.9715 | 700 | 0.2895 | -1.6400 | -4.2369 | 0.8647 | 2.5969 | -827.4293 | -556.4711 | 1.6749 | 0.1680 | 0.9622 | 0.0378 | 0.2417 | 0.1057 | 0.0812 | 0.5020 | 0.0532 |
| 0.1131 | 1.1103 | 800 | 0.2646 | -2.7794 | -6.7040 | 0.8647 | 3.9245 | -1074.1326 | -670.4117 | 2.3249 | 0.3844 | 0.0325 | 0.9675 | 0.1990 | 0.0653 | 0.0567 | 0.5372 | 0.0871 |
| 0.1006 | 1.2491 | 900 | 0.2490 | -3.6465 | -8.6692 | 0.8712 | 5.0227 | -1270.6554 | -757.1147 | 3.3211 | 1.0777 | 0.4760 | 0.5240 | 0.1852 | 0.0492 | 0.0420 | 0.5341 | 0.0967 |
| 0.0951 | 1.3879 | 1000 | 0.2470 | -3.0354 | -7.7369 | 0.8797 | 4.7015 | -1177.4214 | -696.0082 | 3.1614 | 0.9199 | 0.0150 | 0.9850 | 0.1756 | 0.0450 | 0.0382 | 0.5249 | 0.0834 |
| 0.0885 | 1.5267 | 1100 | 0.2435 | -3.4543 | -8.4740 | 0.8731 | 5.0197 | -1251.1321 | -737.8961 | 3.4589 | 1.3892 | 0.0151 | 0.9849 | 0.1747 | 0.0421 | 0.0368 | 0.5310 | 0.0887 |
| 0.1003 | 1.6655 | 1200 | 0.2416 | -3.3615 | -8.4285 | 0.875 | 5.0670 | -1246.5889 | -728.6184 | 2.9341 | 0.9100 | 0.0721 | 0.9279 | 0.1730 | 0.0396 | 0.0352 | 0.5285 | 0.0863 |
| 0.0865 | 1.8043 | 1300 | 0.2412 | -3.3114 | -8.4737 | 0.8769 | 5.1623 | -1251.1091 | -723.6140 | 2.9432 | 0.8628 | 0.0755 | 0.9245 | 0.1734 | 0.0388 | 0.0343 | 0.5272 | 0.0847 |
| 0.0893 | 1.9431 | 1400 | 0.2410 | -3.4515 | -8.7505 | 0.8769 | 5.2990 | -1278.7848 | -737.6204 | 3.0507 | 0.9407 | 0.6369 | 0.3631 | 0.1726 | 0.0379 | 0.0341 | 0.5306 | 0.0889 |

### Framework versions

- Transformers 4.44.1
- Pytorch 2.1.2+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
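
### Checking the reported figures

Several of the numbers reported in this card are derived from others, so they can be cross-checked with a little arithmetic. The sketch below is not part of the training code. It recomputes the total batch sizes from the per-device settings and confirms that the reported reward margin equals the chosen reward minus the rejected reward. The helper name `effective_batch_size` is made up for illustration. The last check only records that the reported `Alpha0` and `Alpha1` values sum to 1; this card does not document what those weights mean.

```python
# Values copied from the hyperparameter list and evaluation summary in this card.
train_batch_size = 8              # per-device training batch size
eval_batch_size = 8               # per-device evaluation batch size
num_devices = 8
gradient_accumulation_steps = 4


def effective_batch_size(per_device: int, devices: int, accumulation: int = 1) -> int:
    """Hypothetical helper: samples consumed per optimizer (or eval) step."""
    return per_device * devices * accumulation


total_train = effective_batch_size(train_batch_size, num_devices, gradient_accumulation_steps)
total_eval = effective_batch_size(eval_batch_size, num_devices)

# Final evaluation metrics from the summary list.
rewards_chosen = -3.4514
rewards_rejected = -8.7503
reported_margin = 5.2989
alpha0, alpha1 = 0.1957, 0.8043

# The margin is the mean of (chosen - rejected), which by linearity equals
# mean(chosen) - mean(rejected).
margin = rewards_chosen - rewards_rejected

print(f"total_train_batch_size = {total_train}")
print(f"total_eval_batch_size  = {total_eval}")
print(f"recomputed margin      = {margin:.4f} (reported {reported_margin})")
print(f"alpha0 + alpha1        = {alpha0 + alpha1:.4f}")
```

The recomputed batch sizes match the `total_train_batch_size` and `total_eval_batch_size` entries, and the margin agrees with `Rewards/margins` to four decimal places.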