---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e5_rate_03_beta_DPO
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# v1_1000_STEPS_1e5_rate_03_beta_DPO
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0612
- Rewards/chosen: -22.4821
- Rewards/rejected: -21.9166
- Rewards/accuracies: 0.4198
- Rewards/margins: -0.5655
- Logps/rejected: -89.9348
- Logps/chosen: -90.1933
- Logits/rejected: -4.4171
- Logits/chosen: -4.4169
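
A minimal loading sketch for the checkpoint is shown below. The repository id is a placeholder, since the final upload path is not stated in this card; the model otherwise loads like any other Mistral-7B-Instruct-v0.1 derivative.

```python
# Minimal loading sketch. The repo id below is hypothetical; substitute the
# actual path under which this checkpoint is published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/v1_1000_STEPS_1e5_rate_03_beta_DPO"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model ships a chat template, so prompts can be formatted with it.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```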
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of how they map onto a DPO training run follows the list):
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
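
The training script itself is not included in this card. The sketch below shows how the hyperparameters above could map onto TRL's `DPOTrainer`, assuming a TRL release contemporary with the framework versions listed at the bottom of this card. The preference dataset is unknown, so its name is a placeholder, and `beta=0.3` is only inferred from the "03_beta" part of the model name.

```python
# Hedged sketch of a DPO run matching the listed hyperparameters.
# Dataset path and beta=0.3 (read off the model name) are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: the actual preference dataset is not named in this card.
# It must provide "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("my-org/preference-data")  # hypothetical

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e5_rate_03_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence below
    logging_steps=50,
    bf16=True,
    # Default AdamW optimizer: betas=(0.9, 0.999), epsilon=1e-8, as listed above.
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # TRL creates a frozen reference copy
    args=args,
    beta=0.3,                        # assumption, inferred from the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],    # placeholder split name
    tokenizer=tokenizer,
    max_length=1024,                 # sequence lengths not stated in the card
    max_prompt_length=512,
)
trainer.train()
```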
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.025 | 0.05 | 50 | 2.0989 | -9.2701 | -9.3262 | 0.4418 | 0.0561 | -47.9670 | -46.1535 | -4.0702 | -4.0700 |
| 3.1266 | 0.1 | 100 | 3.2379 | -16.6921 | -16.6056 | 0.4637 | -0.0864 | -72.2316 | -70.8932 | -3.1523 | -3.1523 |
| 2.9672 | 0.15 | 150 | 2.9589 | -15.0108 | -14.8189 | 0.4571 | -0.1919 | -66.2757 | -65.2890 | -4.5807 | -4.5807 |
| 3.7281 | 0.2 | 200 | 2.9926 | -15.2425 | -14.9338 | 0.4462 | -0.3087 | -66.6590 | -66.0614 | -4.9577 | -4.9577 |
| 2.825 | 0.24 | 250 | 2.9153 | -14.7019 | -14.3934 | 0.4505 | -0.3085 | -64.8577 | -64.2594 | -5.0246 | -5.0246 |
| 3.9813 | 0.29 | 300 | 2.9308 | -14.8129 | -14.5166 | 0.4352 | -0.2962 | -65.2682 | -64.6292 | -4.5446 | -4.5446 |
| 3.9125 | 0.34 | 350 | 2.9798 | -15.2390 | -14.9581 | 0.4418 | -0.2809 | -66.7398 | -66.0496 | -4.0186 | -4.0186 |
| 5.475 | 0.39 | 400 | 2.8595 | -14.7993 | -14.4606 | 0.4462 | -0.3387 | -65.0815 | -64.5839 | -5.5881 | -5.5881 |
| 4.925 | 0.44 | 450 | 2.8461 | -14.9405 | -14.6310 | 0.4505 | -0.3095 | -65.6497 | -65.0547 | -5.7266 | -5.7266 |
| 4.0656 | 0.49 | 500 | 2.8676 | -14.8313 | -14.5335 | 0.4396 | -0.2979 | -65.3244 | -64.6909 | -5.3771 | -5.3771 |
| 4.3688 | 0.54 | 550 | 2.8408 | -14.7379 | -14.4086 | 0.4352 | -0.3293 | -64.9083 | -64.3793 | -5.5129 | -5.5129 |
| 2.3281 | 0.59 | 600 | 2.8091 | -14.4630 | -14.1427 | 0.4374 | -0.3202 | -64.0219 | -63.4629 | -5.0091 | -5.0091 |
| 4.2781 | 0.64 | 650 | 2.6868 | -14.5132 | -14.0888 | 0.4264 | -0.4244 | -63.8422 | -63.6305 | -4.5169 | -4.5170 |
| 4.1469 | 0.68 | 700 | 2.4108 | -17.3614 | -17.1379 | 0.4264 | -0.2235 | -74.0058 | -73.1244 | -3.4213 | -3.4211 |
| 2.2094 | 0.73 | 750 | 2.3138 | -17.0230 | -16.5801 | 0.4110 | -0.4430 | -72.1465 | -71.9965 | -4.4044 | -4.4043 |
| 1.5219 | 0.78 | 800 | 2.3857 | -19.1901 | -18.7328 | 0.4396 | -0.4573 | -79.3222 | -79.2200 | -4.0721 | -4.0720 |
| 3.2406 | 0.83 | 850 | 2.1160 | -21.0445 | -20.4125 | 0.3758 | -0.6320 | -84.9211 | -85.4013 | -4.1028 | -4.1026 |
| 1.8844 | 0.88 | 900 | 2.1362 | -22.7368 | -22.2138 | 0.4220 | -0.5229 | -90.9257 | -91.0423 | -4.4034 | -4.4033 |
| 2.7984 | 0.93 | 950 | 2.0654 | -22.4923 | -21.9278 | 0.4198 | -0.5645 | -89.9723 | -90.2274 | -4.4118 | -4.4116 |
| 2.7203 | 0.98 | 1000 | 2.0612 | -22.4821 | -21.9166 | 0.4198 | -0.5655 | -89.9348 | -90.1933 | -4.4171 | -4.4169 |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2