eurus-dpo-qlora-uf-ours-5e-6
This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF dataset. It achieves the following results on the evaluation set:
- Loss: 6.1425
- Rewards/chosen: -23.7027
- Rewards/rejected: -32.8691
- Rewards/accuracies: 0.6260
- Rewards/margins: 9.1664
- Rewards/margins Max: 58.9042
- Rewards/margins Min: -33.2590
- Rewards/margins Std: 29.8583
- Logps/rejected: -3544.4312
- Logps/chosen: -2645.1541
- Logits/rejected: -0.9100
- Logits/chosen: -1.0759
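These metric names match those logged by TRL's `DPOTrainer`. In DPO the "reward" of a response is implicit: β × (policy log-prob − reference log-prob). "Rewards/margins" is the chosen reward minus the rejected reward, which holds for the numbers above: −23.7027 − (−32.8691) = 9.1664. The sketch below shows how such aggregates could be computed from per-example sequence log-probs. It rests on several assumptions. The card does not state β, so `BETA = 0.1` is hypothetical. The loss is assumed to be the standard sigmoid DPO loss. The Max/Min/Std statistics are taken as plain statistics over per-example margins, using population standard deviation.

```python
import math
from statistics import mean, pstdev

# Hypothetical: the card does not report the DPO beta used for this run.
BETA = 0.1


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_eval_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Aggregate DPO-style metrics from per-example summed log-probabilities.

    Each argument is a list with one sequence log-prob per evaluation example,
    from the fine-tuned policy or the frozen reference (SFT) model.
    """
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        # Sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin).
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # Fraction of pairs where the chosen response earns the higher reward.
        "rewards/accuracies": mean(1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)),
        "rewards/margins": mean(margins),
        "rewards/margins_max": max(margins),
        "rewards/margins_min": min(margins),
        # Population std is an assumption; the card does not say which is used.
        "rewards/margins_std": pstdev(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }


# Two toy pairs: the first is ranked correctly, the second is not.
m = dpo_eval_metrics([-10.0, -20.0], [-15.0, -18.0], [-12.0, -20.0], [-12.0, -20.0])
print(m["rewards/accuracies"])  # 0.5
```

A large spread between "Rewards/margins Max" and "Min" can coexist with a positive mean margin. That is why the mean margin and the accuracy (0.6260 here) tell different stories.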
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
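The totals follow from per-device batch size × number of devices × gradient-accumulation steps. That gives 4 × 2 × 2 = 16 for training and 8 × 2 = 16 for evaluation, with no accumulation at eval time. The sketch below also models a linear-warmup cosine schedule of the kind the Hugging Face Trainer applies for `lr_scheduler_type: cosine`, with warmup steps set to ceil(total steps × warmup_ratio). The card does not report the total number of optimizer steps. `ESTIMATED_TOTAL_STEPS` is a rough estimate derived from the results table, where step 1000 is logged at epoch 2.82.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-6
TRAIN_BATCH_PER_DEVICE = 4
EVAL_BATCH_PER_DEVICE = 8
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 3

# Rough estimate, not reported in the card: step 1000 falls at epoch 2.82,
# i.e. about 355 optimizer steps per epoch.
ESTIMATED_TOTAL_STEPS = round(NUM_EPOCHS * 1000 / 2.82)


def effective_batch_size(per_device: int, devices: int, accum_steps: int = 1) -> int:
    # Number of examples contributing to one optimizer step.
    return per_device * devices * accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = LEARNING_RATE,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_PER_DEVICE, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 16
print(effective_batch_size(EVAL_BATCH_PER_DEVICE, NUM_DEVICES))  # 16
print(ESTIMATED_TOTAL_STEPS)  # 1064
```

Under this estimate, warmup would cover roughly the first 107 steps. The learning rate would then peak at 5e-06 and decay toward zero by the end of epoch 3.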
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.4256 | 0.28 | 100 | 0.8163 | -1.8022 | -1.9583 | 0.5610 | 0.1561 | 2.2049 | -1.8191 | 1.3259 | -453.3455 | -455.0959 | -1.9771 | -2.0751 |
0.1591 | 0.56 | 200 | 1.2122 | -5.0976 | -6.6216 | 0.6050 | 1.5239 | 9.9971 | -4.8753 | 4.8268 | -919.6762 | -784.6454 | -1.3460 | -1.4469 |
0.1126 | 0.85 | 300 | 1.7230 | -6.1628 | -8.5878 | 0.6090 | 2.4250 | 18.9102 | -8.2202 | 8.7236 | -1116.3019 | -891.1599 | -1.2133 | -1.3142 |
0.074 | 1.13 | 400 | 2.0005 | -8.7127 | -11.9396 | 0.6220 | 3.2269 | 20.1537 | -9.9867 | 9.6878 | -1451.4778 | -1146.1495 | -1.3244 | -1.4370 |
0.0551 | 1.41 | 500 | 2.6568 | -10.4325 | -15.1571 | 0.6260 | 4.7246 | 28.6045 | -13.6975 | 13.8040 | -1773.2283 | -1318.1323 | -1.2958 | -1.4257 |
0.169 | 1.69 | 600 | 3.7089 | -14.9797 | -20.5965 | 0.6160 | 5.6168 | 36.0405 | -19.8931 | 18.0728 | -2317.1677 | -1772.8466 | -1.0370 | -1.1529 |
0.0661 | 1.97 | 700 | 4.1957 | -15.9319 | -22.6457 | 0.6220 | 6.7138 | 41.9072 | -22.6906 | 20.9609 | -2522.0879 | -1868.0721 | -1.1163 | -1.2633 |
0.0044 | 2.25 | 800 | 5.9108 | -22.7617 | -31.4584 | 0.6230 | 8.6967 | 56.6380 | -31.9336 | 28.6036 | -3403.3569 | -2551.0461 | -0.9371 | -1.0936 |
0.011 | 2.54 | 900 | 5.9213 | -23.0839 | -32.0567 | 0.6230 | 8.9728 | 56.9548 | -32.0980 | 28.8598 | -3463.1873 | -2583.2671 | -0.9208 | -1.0846 |
0.0138 | 2.82 | 1000 | 6.0584 | -23.3438 | -32.4235 | 0.6280 | 9.0798 | 58.3224 | -32.8664 | 29.5381 | -3499.8743 | -2609.2573 | -0.9160 | -1.0810 |
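Two patterns can be read directly off the table. First, every margin equals the chosen reward minus the rejected reward, up to 4-decimal rounding. Second, validation loss rises at every evaluation, from 0.8163 at step 100 to 6.0584 at step 1000, while reward accuracy stays between 0.561 and 0.628. The short check below re-derives both patterns from the table's own values. It only restates the reported numbers and says nothing about which checkpoint was published.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied from the training results table above.
ROWS = [
    (100, 0.8163, -1.8022, -1.9583, 0.1561),
    (200, 1.2122, -5.0976, -6.6216, 1.5239),
    (300, 1.7230, -6.1628, -8.5878, 2.4250),
    (400, 2.0005, -8.7127, -11.9396, 3.2269),
    (500, 2.6568, -10.4325, -15.1571, 4.7246),
    (600, 3.7089, -14.9797, -20.5965, 5.6168),
    (700, 4.1957, -15.9319, -22.6457, 6.7138),
    (800, 5.9108, -22.7617, -31.4584, 8.6967),
    (900, 5.9213, -23.0839, -32.0567, 8.9728),
    (1000, 6.0584, -23.3438, -32.4235, 9.0798),
]


def margins_consistent(rows, tol=5e-4):
    # Margin should equal chosen - rejected up to the table's rounding.
    return all(abs((chosen - rejected) - margin) <= tol
               for _, _, chosen, rejected, margin in rows)


def best_eval_step(rows):
    # Step with the lowest validation loss.
    return min(rows, key=lambda row: row[1])[0]


def val_loss_strictly_increasing(rows):
    losses = [row[1] for row in rows]
    return all(a < b for a, b in zip(losses, losses[1:]))


print(margins_consistent(ROWS))  # True
print(best_eval_step(ROWS))  # 100
print(val_loss_strictly_increasing(ROWS))  # True
```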
Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
Model tree for just1nseo/eurus-dpo-qlora-uf-ours-5e-6
- Base model: openbmb/Eurus-7b-sft