---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - data/zephyr_uf_rlced_conifer_ref
model-index:
  - name: zephyr-7b-uf-rlced-conifer-group-dpo-2e
    results: []
---

zephyr-7b-uf-rlced-conifer-group-dpo-2e

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the data/zephyr_uf_rlced_conifer_ref dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2410
  • Rewards/chosen: -3.4514
  • Rewards/rejected: -8.7503
  • Rewards/accuracies: 0.8778
  • Rewards/margins: 5.2989
  • Logps/rejected: -1278.7679
  • Logps/chosen: -737.6100
  • Logits/rejected: 3.0512
  • Logits/chosen: 0.9415
  • Alpha0: 0.1957
  • Alpha1: 0.8043
  • Task Loss1: 0.1724
  • Task Excess Loss1: 0.0378
  • Excess Loss: 0.0340
  • Task Loss0: 0.5295
  • Task Excess Loss0: 0.0879
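
For reference, the Rewards/chosen, Rewards/rejected, and Rewards/margins values above are implicit DPO rewards. Assuming the per-pair objective is standard DPO (the Alpha and Task/Excess Loss columns come from the group-weighted objective used for this run and are not part of vanilla DPO), the implicit reward for a response y to a prompt x and the pairwise loss are

$$
r_\theta(x, y) = \beta \left[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right],
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr),
$$

where $\pi_{\mathrm{ref}}$ is the frozen SFT reference model and $y_w$, $y_l$ are the chosen and rejected responses; Rewards/margins is the average of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over the evaluation set.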

Model description

More information needed

Intended uses & limitations

More information needed
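
In the absence of author-provided usage notes, the sketch below shows one plausible way to query the model as a chat assistant with transformers. The repository id is inferred from the model name and should be treated as an assumption, as should the generation settings.

```python
# Hypothetical usage sketch; the repository id is assumed from the model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NicholasCorrado/zephyr-7b-uf-rlced-conifer-group-dpo-2e"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is a Zephyr-style chat model, so use its chat template.
messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```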

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 256
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
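
These settings give an effective batch size of 8 per device × 8 GPUs × 4 gradient-accumulation steps = 256. The snippet below is a minimal, hypothetical reconstruction of such a run with TRL's DPOTrainer; it is not the exact training script used for this model, and the DPO beta, precision, sequence lengths, and the group-DPO weighting (Alpha0/Alpha1) are not specified in this card and are marked as assumptions.

```python
# Hypothetical reconstruction of the listed hyperparameters with TRL's DPOTrainer.
# Not the exact training script; beta, bf16, and the group-DPO weighting used for
# this model are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

args = DPOConfig(
    output_dir="zephyr-7b-uf-rlced-conifer-group-dpo-2e",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 8 per device x 8 GPUs x 4 steps = 256
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; typical for this model family
    beta=0.01,                       # assumption; not reported in the card
)

# Placeholder for the local data/zephyr_uf_rlced_conifer_ref preference dataset.
dataset = load_dataset("json", data_files={"train": "train.jsonl"})

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # TRL builds the frozen reference copy when None
    args=args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```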

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Alpha0 | Alpha1 | Task Loss1 | Task Excess Loss1 | Excess Loss | Task Loss0 | Task Excess Loss0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3541 | 0.1388 | 100 | 0.4194 | -1.3743 | -2.6267 | 0.8102 | 1.2524 | -666.4093 | -529.9026 | -2.7580 | -2.7843 | 0.8214 | 0.1786 | 0.3373 | 0.1973 | 0.1899 | 0.6883 | 0.2655 |
| 0.2214 | 0.2776 | 200 | 0.3480 | -1.2450 | -2.9488 | 0.8412 | 1.7038 | -698.6146 | -516.9692 | 0.1216 | -0.2174 | 0.8786 | 0.1214 | 0.2866 | 0.1517 | 0.1250 | 0.5355 | 0.0929 |
| 0.2284 | 0.4164 | 300 | 0.3271 | -1.7298 | -3.6279 | 0.8515 | 1.8981 | -766.5247 | -565.4502 | 1.3769 | 0.5823 | 0.6417 | 0.3583 | 0.2721 | 0.1383 | 0.1130 | 0.5406 | 0.0794 |
| 0.1837 | 0.5552 | 400 | 0.3040 | -1.7232 | -4.0037 | 0.8553 | 2.2805 | -804.1021 | -564.7872 | 1.8300 | 0.7862 | 0.7891 | 0.2109 | 0.2517 | 0.1159 | 0.0949 | 0.5490 | 0.0796 |
| 0.1749 | 0.6940 | 500 | 0.2966 | -1.7976 | -4.1927 | 0.8637 | 2.3951 | -823.0039 | -572.2305 | 1.7164 | 0.5785 | 0.8057 | 0.1943 | 0.2448 | 0.1097 | 0.0856 | 0.5124 | 0.0570 |
| 0.1823 | 0.8328 | 600 | 0.3030 | -1.7187 | -3.9261 | 0.8647 | 2.2074 | -796.3432 | -564.3366 | 2.4921 | 1.3988 | 0.9053 | 0.0947 | 0.2541 | 0.1193 | 0.0922 | 0.5047 | 0.0596 |
| 0.1766 | 0.9715 | 700 | 0.2895 | -1.6400 | -4.2369 | 0.8647 | 2.5969 | -827.4293 | -556.4711 | 1.6749 | 0.1680 | 0.9622 | 0.0378 | 0.2417 | 0.1057 | 0.0812 | 0.5020 | 0.0532 |
| 0.1131 | 1.1103 | 800 | 0.2646 | -2.7794 | -6.7040 | 0.8647 | 3.9245 | -1074.1326 | -670.4117 | 2.3249 | 0.3844 | 0.0325 | 0.9675 | 0.1990 | 0.0653 | 0.0567 | 0.5372 | 0.0871 |
| 0.1006 | 1.2491 | 900 | 0.2490 | -3.6465 | -8.6692 | 0.8712 | 5.0227 | -1270.6554 | -757.1147 | 3.3211 | 1.0777 | 0.4760 | 0.5240 | 0.1852 | 0.0492 | 0.0420 | 0.5341 | 0.0967 |
| 0.0951 | 1.3879 | 1000 | 0.2470 | -3.0354 | -7.7369 | 0.8797 | 4.7015 | -1177.4214 | -696.0082 | 3.1614 | 0.9199 | 0.0150 | 0.9850 | 0.1756 | 0.0450 | 0.0382 | 0.5249 | 0.0834 |
| 0.0885 | 1.5267 | 1100 | 0.2435 | -3.4543 | -8.4740 | 0.8731 | 5.0197 | -1251.1321 | -737.8961 | 3.4589 | 1.3892 | 0.0151 | 0.9849 | 0.1747 | 0.0421 | 0.0368 | 0.5310 | 0.0887 |
| 0.1003 | 1.6655 | 1200 | 0.2416 | -3.3615 | -8.4285 | 0.875 | 5.0670 | -1246.5889 | -728.6184 | 2.9341 | 0.9100 | 0.0721 | 0.9279 | 0.1730 | 0.0396 | 0.0352 | 0.5285 | 0.0863 |
| 0.0865 | 1.8043 | 1300 | 0.2412 | -3.3114 | -8.4737 | 0.8769 | 5.1623 | -1251.1091 | -723.6140 | 2.9432 | 0.8628 | 0.0755 | 0.9245 | 0.1734 | 0.0388 | 0.0343 | 0.5272 | 0.0847 |
| 0.0893 | 1.9431 | 1400 | 0.2410 | -3.4515 | -8.7505 | 0.8769 | 5.2990 | -1278.7848 | -737.6204 | 3.0507 | 0.9407 | 0.6369 | 0.3631 | 0.1726 | 0.0379 | 0.0341 | 0.5306 | 0.0889 |

Framework versions

  • Transformers 4.44.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1