---
library_name: transformers
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- data/zephyr_uf_rlced_conifer_ref
model-index:
- name: zephyr-7b-uf-rlced-conifer-group-dpo-2e
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# zephyr-7b-uf-rlced-conifer-group-dpo-2e
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the data/zephyr_uf_rlced_conifer_ref dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2410
- Rewards/chosen: -3.4514
- Rewards/rejected: -8.7503
- Rewards/accuracies: 0.8778
- Rewards/margins: 5.2989
- Logps/rejected: -1278.7679
- Logps/chosen: -737.6100
- Logits/rejected: 3.0512
- Logits/chosen: 0.9415
- Alpha0: 0.1957
- Alpha1: 0.8043
- Task Loss1: 0.1724
- Task Excess Loss1: 0.0378
- Excess Loss: 0.0340
- Task Loss0: 0.5295
- Task Excess Loss0: 0.0879
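
The reported numbers can be sanity-checked against each other. The sketch below assumes that the reward margin is the mean of (chosen reward minus rejected reward), so it should equal the difference of the two reported means up to rounding. It also checks that `Alpha0` and `Alpha1` sum to one in the reported values. The card does not say what the alpha values mean, so the code treats them only as a pair of numbers. The dictionary keys and the helper name `implied_margin` are illustrative and do not come from the training code.

```python
import math

# Final evaluation metrics copied from the list above.
metrics = {
    "rewards/chosen": -3.4514,
    "rewards/rejected": -8.7503,
    "rewards/margins": 5.2989,
    "alpha0": 0.1957,
    "alpha1": 0.8043,
}


def implied_margin(chosen: float, rejected: float) -> float:
    """Margin implied by the mean chosen and mean rejected rewards."""
    return chosen - rejected


margin = implied_margin(metrics["rewards/chosen"], metrics["rewards/rejected"])
alpha_sum = metrics["alpha0"] + metrics["alpha1"]

# A tolerance of 1e-3 allows for the four-decimal rounding in the report.
margin_matches = math.isclose(margin, metrics["rewards/margins"], abs_tol=1e-3)
alphas_sum_to_one = math.isclose(alpha_sum, 1.0, abs_tol=1e-3)

print(f"implied margin = {margin:.4f}, matches report: {margin_matches}")
print(f"alpha0 + alpha1 = {alpha_sum:.4f}, sums to one: {alphas_sum_to_one}")
```
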
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
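
As a rough illustration of how these settings fit together, the sketch below recomputes the total batch sizes from the per-device values and models a linear-warmup, half-cosine learning-rate curve using the listed peak rate and warmup ratio. It is a standalone approximation and not the Trainer's own scheduler code. The function names are hypothetical, and `total_steps=1000` is an arbitrary illustrative value, since the card does not state the total number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-07,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list: 8 per device, 8 devices, 4 accumulation steps.
train_total = effective_batch_size(8, 8, 4)
# Evaluation uses no gradient accumulation.
eval_total = effective_batch_size(8, 8)
print(train_total, eval_total)

# Illustrative schedule over a hypothetical 1000-step run.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{cosine_lr(step, total_steps=1000):.3e}")
```

The batch-size arithmetic reproduces the `total_train_batch_size` of 256 and `total_eval_batch_size` of 64 listed above.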
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Alpha0 | Alpha1 | Task Loss1 | Task Excess Loss1 | Excess Loss | Task Loss0 | Task Excess Loss0 |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:------:|:------:|:----------:|:-----------------:|:-----------:|:----------:|:-----------------:|
| 0.3541 | 0.1388 | 100 | 0.4194 | -1.3743 | -2.6267 | 0.8102 | 1.2524 | -666.4093 | -529.9026 | -2.7580 | -2.7843 | 0.8214 | 0.1786 | 0.3373 | 0.1973 | 0.1899 | 0.6883 | 0.2655 |
| 0.2214 | 0.2776 | 200 | 0.3480 | -1.2450 | -2.9488 | 0.8412 | 1.7038 | -698.6146 | -516.9692 | 0.1216 | -0.2174 | 0.8786 | 0.1214 | 0.2866 | 0.1517 | 0.1250 | 0.5355 | 0.0929 |
| 0.2284 | 0.4164 | 300 | 0.3271 | -1.7298 | -3.6279 | 0.8515 | 1.8981 | -766.5247 | -565.4502 | 1.3769 | 0.5823 | 0.6417 | 0.3583 | 0.2721 | 0.1383 | 0.1130 | 0.5406 | 0.0794 |
| 0.1837 | 0.5552 | 400 | 0.3040 | -1.7232 | -4.0037 | 0.8553 | 2.2805 | -804.1021 | -564.7872 | 1.8300 | 0.7862 | 0.7891 | 0.2109 | 0.2517 | 0.1159 | 0.0949 | 0.5490 | 0.0796 |
| 0.1749 | 0.6940 | 500 | 0.2966 | -1.7976 | -4.1927 | 0.8637 | 2.3951 | -823.0039 | -572.2305 | 1.7164 | 0.5785 | 0.8057 | 0.1943 | 0.2448 | 0.1097 | 0.0856 | 0.5124 | 0.0570 |
| 0.1823 | 0.8328 | 600 | 0.3030 | -1.7187 | -3.9261 | 0.8647 | 2.2074 | -796.3432 | -564.3366 | 2.4921 | 1.3988 | 0.9053 | 0.0947 | 0.2541 | 0.1193 | 0.0922 | 0.5047 | 0.0596 |
| 0.1766 | 0.9715 | 700 | 0.2895 | -1.6400 | -4.2369 | 0.8647 | 2.5969 | -827.4293 | -556.4711 | 1.6749 | 0.1680 | 0.9622 | 0.0378 | 0.2417 | 0.1057 | 0.0812 | 0.5020 | 0.0532 |
| 0.1131 | 1.1103 | 800 | 0.2646 | -2.7794 | -6.7040 | 0.8647 | 3.9245 | -1074.1326 | -670.4117 | 2.3249 | 0.3844 | 0.0325 | 0.9675 | 0.1990 | 0.0653 | 0.0567 | 0.5372 | 0.0871 |
| 0.1006 | 1.2491 | 900 | 0.2490 | -3.6465 | -8.6692 | 0.8712 | 5.0227 | -1270.6554 | -757.1147 | 3.3211 | 1.0777 | 0.4760 | 0.5240 | 0.1852 | 0.0492 | 0.0420 | 0.5341 | 0.0967 |
| 0.0951 | 1.3879 | 1000 | 0.2470 | -3.0354 | -7.7369 | 0.8797 | 4.7015 | -1177.4214 | -696.0082 | 3.1614 | 0.9199 | 0.0150 | 0.9850 | 0.1756 | 0.0450 | 0.0382 | 0.5249 | 0.0834 |
| 0.0885 | 1.5267 | 1100 | 0.2435 | -3.4543 | -8.4740 | 0.8731 | 5.0197 | -1251.1321 | -737.8961 | 3.4589 | 1.3892 | 0.0151 | 0.9849 | 0.1747 | 0.0421 | 0.0368 | 0.5310 | 0.0887 |
| 0.1003 | 1.6655 | 1200 | 0.2416 | -3.3615 | -8.4285 | 0.8750 | 5.0670 | -1246.5889 | -728.6184 | 2.9341 | 0.9100 | 0.0721 | 0.9279 | 0.1730 | 0.0396 | 0.0352 | 0.5285 | 0.0863 |
| 0.0865 | 1.8043 | 1300 | 0.2412 | -3.3114 | -8.4737 | 0.8769 | 5.1623 | -1251.1091 | -723.6140 | 2.9432 | 0.8628 | 0.0755 | 0.9245 | 0.1734 | 0.0388 | 0.0343 | 0.5272 | 0.0847 |
| 0.0893 | 1.9431 | 1400 | 0.2410 | -3.4515 | -8.7505 | 0.8769 | 5.2990 | -1278.7848 | -737.6204 | 3.0507 | 0.9407 | 0.6369 | 0.3631 | 0.1726 | 0.0379 | 0.0341 | 0.5306 | 0.0889 |
### Framework versions
- Transformers 4.44.1
- Pytorch 2.1.2+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1