---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e6_rate_05_beta_DPO
  results: []
---
# v1_1000_STEPS_1e6_rate_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1310
- Rewards/chosen: -2.3907
- Rewards/rejected: -3.3587
- Rewards/accuracies: 0.5319
- Rewards/margins: 0.9681
- Logps/rejected: -23.5970
- Logps/chosen: -20.0344
- Logits/rejected: -3.2860
- Logits/chosen: -3.2861
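
For context (these definitions come from the standard DPO objective as implemented in trl's `DPOTrainer`, not from this card), `Rewards/chosen` and `Rewards/rejected` are the β-scaled log-probability ratios of the policy against the frozen reference model, and `Rewards/margins` is their difference:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right),
\qquad
\text{margins} = r_\theta(x, y_w) - r_\theta(x, y_l)
$$

$$
\mathcal{L}_{\text{DPO}}(x, y_w, y_l) = -\log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big)
$$

where $y_w$ is the chosen and $y_l$ the rejected completion. The evaluation numbers above are consistent with this: $-2.3907 - (-3.3587) \approx 0.968$.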
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative trl configuration sketch follows the list):
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
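
As an illustration only (the training script is not included in this card), the hyperparameters above map onto a trl `DPOTrainer` setup roughly as sketched below. The dataset name is a placeholder, `beta=0.5` is inferred from the `05_beta` suffix in the model name rather than stated anywhere, and the exact trl API surface varies by version, so treat this as a hedged sketch rather than the actual training code.

```python
# Illustrative sketch only -- dataset, beta value, and trl API details are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical preference dataset with "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your-preference-dataset", split="train")

# These arguments mirror the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_05_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective batch size 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.5,  # assumed from the "05_beta" model name; not stated in the card
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```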
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7136 | 0.05 | 50 | 0.6682 | -0.1718 | -0.2901 | 0.5473 | 0.1184 | -17.4598 | -15.5966 | -3.3833 | -3.3834 |
| 0.8377 | 0.1 | 100 | 0.8534 | -1.2874 | -1.8482 | 0.5495 | 0.5608 | -20.5758 | -17.8278 | -3.3665 | -3.3666 |
| 1.5418 | 0.15 | 150 | 1.2106 | -3.7074 | -3.9590 | 0.5055 | 0.2516 | -24.7976 | -22.6679 | -3.3872 | -3.3874 |
| 0.9966 | 0.2 | 200 | 1.3074 | -2.7550 | -3.0485 | 0.5099 | 0.2935 | -22.9766 | -20.7630 | -3.3239 | -3.3240 |
| 1.631 | 0.24 | 250 | 1.1695 | -2.1801 | -2.7422 | 0.5231 | 0.5621 | -22.3639 | -19.6133 | -3.2748 | -3.2750 |
| 1.4651 | 0.29 | 300 | 1.2408 | -2.1404 | -2.6522 | 0.5033 | 0.5118 | -22.1839 | -19.5338 | -3.3806 | -3.3808 |
| 1.9294 | 0.34 | 350 | 1.2181 | -1.8900 | -2.3214 | 0.5121 | 0.4313 | -21.5223 | -19.0331 | -3.3884 | -3.3885 |
| 1.6417 | 0.39 | 400 | 1.1754 | -1.9580 | -2.4289 | 0.4967 | 0.4710 | -21.7374 | -19.1690 | -3.4056 | -3.4057 |
| 1.0114 | 0.44 | 450 | 1.2146 | -2.0096 | -2.4935 | 0.4879 | 0.4839 | -21.8665 | -19.2723 | -3.3460 | -3.3461 |
| 1.0581 | 0.49 | 500 | 1.2539 | -2.5636 | -3.1382 | 0.5077 | 0.5746 | -23.1559 | -20.3803 | -3.3437 | -3.3439 |
| 1.3239 | 0.54 | 550 | 1.1739 | -2.1012 | -2.8810 | 0.5253 | 0.7798 | -22.6415 | -19.4555 | -3.3313 | -3.3314 |
| 1.2819 | 0.59 | 600 | 1.1770 | -2.3179 | -3.1791 | 0.5407 | 0.8612 | -23.2377 | -19.8889 | -3.3037 | -3.3038 |
| 0.9194 | 0.64 | 650 | 1.1859 | -2.0739 | -2.9235 | 0.5407 | 0.8496 | -22.7266 | -19.4008 | -3.2953 | -3.2955 |
| 1.0744 | 0.68 | 700 | 1.1623 | -2.2911 | -3.1685 | 0.5187 | 0.8773 | -23.2165 | -19.8353 | -3.2851 | -3.2853 |
| 1.3268 | 0.73 | 750 | 1.1441 | -2.3481 | -3.2869 | 0.5231 | 0.9388 | -23.4534 | -19.9493 | -3.2891 | -3.2892 |
| 1.1064 | 0.78 | 800 | 1.1339 | -2.3526 | -3.3046 | 0.5275 | 0.9520 | -23.4888 | -19.9583 | -3.2881 | -3.2882 |
| 1.0456 | 0.83 | 850 | 1.1330 | -2.3878 | -3.3498 | 0.5275 | 0.9620 | -23.5791 | -20.0286 | -3.2864 | -3.2865 |
| 1.4001 | 0.88 | 900 | 1.1333 | -2.3931 | -3.3565 | 0.5275 | 0.9634 | -23.5926 | -20.0393 | -3.2860 | -3.2861 |
| 1.1629 | 0.93 | 950 | 1.1330 | -2.3904 | -3.3570 | 0.5275 | 0.9666 | -23.5936 | -20.0339 | -3.2860 | -3.2861 |
| 0.9777 | 0.98 | 1000 | 1.1310 | -2.3907 | -3.3587 | 0.5319 | 0.9681 | -23.5970 | -20.0344 | -3.2860 | -3.2861 |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
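
For completeness, a minimal loading sketch against the framework versions listed above. The repository id below is a placeholder, since the card does not state where the fine-tuned weights are published, and `device_map="auto"` additionally requires `accelerate`.

```python
# Minimal loading sketch; the repository id is a placeholder for wherever the weights live.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/v1_1000_STEPS_1e6_rate_05_beta_DPO"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)

# Mistral-Instruct models use the [INST] ... [/INST] chat format; apply_chat_template handles it.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```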