license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: v1_1000_STEPS_1e5_rate_03_beta_DPO
    results: []

v1_1000_STEPS_1e5_rate_03_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unspecified dataset. At the final checkpoint (step 1000), it achieves the following results on the evaluation set:

  • Loss: 2.0612
  • Rewards/chosen: -22.4821
  • Rewards/rejected: -21.9166
  • Rewards/accuracies: 0.4198
  • Rewards/margins: -0.5655
  • Logps/rejected: -89.9348
  • Logps/chosen: -90.1933
  • Logits/rejected: -4.4171
  • Logits/chosen: -4.4169
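
The reward figures above are linked by simple arithmetic. As a minimal sketch, assuming the usual DPO logging convention in which the margin is the chosen reward minus the rejected reward and the accuracy is the fraction of evaluation pairs where the chosen reward is higher, the reported numbers can be cross-checked like this (variable names are illustrative only):

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -22.4821
rewards_rejected = -21.9166
reported_margin = -0.5655
reward_accuracy = 0.4198

# Assumed convention: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

# Fraction of evaluation pairs where the chosen reward did NOT exceed
# the rejected reward (assumed meaning of 1 - accuracy).
fraction_not_favoring_chosen = 1.0 - reward_accuracy

print(f"computed margin: {computed_margin:.4f} (reported: {reported_margin:.4f})")
print(f"pairs not favoring chosen: {fraction_not_favoring_chosen:.4f}")
```

Under that reading, a negative margin together with an accuracy below 0.5 suggests that the fine-tuned model, on average, scores rejected completions at least as highly as chosen ones on this evaluation set.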

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
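
To make the relationship between these values concrete, here is a small sketch in plain Python. It assumes a single training device (so the total batch size is just the per-device batch size times the accumulation steps) and assumes the common cosine-with-linear-warmup shape, decaying to zero over half a cosine period. The function names are hypothetical and are not taken from the training script.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

# Assumption: one device, so there is no data-parallel multiplier.
NUM_DEVICES = 1


def effective_batch_size() -> int:
    """Examples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

With these settings the learning rate peaks at 1e-05 at step 100, passes through half that value at step 550, and reaches zero at step 1000.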

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.025 | 0.05 | 50 | 2.0989 | -9.2701 | -9.3262 | 0.4418 | 0.0561 | -47.9670 | -46.1535 | -4.0702 | -4.0700 |
| 3.1266 | 0.1 | 100 | 3.2379 | -16.6921 | -16.6056 | 0.4637 | -0.0864 | -72.2316 | -70.8932 | -3.1523 | -3.1523 |
| 2.9672 | 0.15 | 150 | 2.9589 | -15.0108 | -14.8189 | 0.4571 | -0.1919 | -66.2757 | -65.2890 | -4.5807 | -4.5807 |
| 3.7281 | 0.2 | 200 | 2.9926 | -15.2425 | -14.9338 | 0.4462 | -0.3087 | -66.6590 | -66.0614 | -4.9577 | -4.9577 |
| 2.825 | 0.24 | 250 | 2.9153 | -14.7019 | -14.3934 | 0.4505 | -0.3085 | -64.8577 | -64.2594 | -5.0246 | -5.0246 |
| 3.9813 | 0.29 | 300 | 2.9308 | -14.8129 | -14.5166 | 0.4352 | -0.2962 | -65.2682 | -64.6292 | -4.5446 | -4.5446 |
| 3.9125 | 0.34 | 350 | 2.9798 | -15.2390 | -14.9581 | 0.4418 | -0.2809 | -66.7398 | -66.0496 | -4.0186 | -4.0186 |
| 5.475 | 0.39 | 400 | 2.8595 | -14.7993 | -14.4606 | 0.4462 | -0.3387 | -65.0815 | -64.5839 | -5.5881 | -5.5881 |
| 4.925 | 0.44 | 450 | 2.8461 | -14.9405 | -14.6310 | 0.4505 | -0.3095 | -65.6497 | -65.0547 | -5.7266 | -5.7266 |
| 4.0656 | 0.49 | 500 | 2.8676 | -14.8313 | -14.5335 | 0.4396 | -0.2979 | -65.3244 | -64.6909 | -5.3771 | -5.3771 |
| 4.3688 | 0.54 | 550 | 2.8408 | -14.7379 | -14.4086 | 0.4352 | -0.3293 | -64.9083 | -64.3793 | -5.5129 | -5.5129 |
| 2.3281 | 0.59 | 600 | 2.8091 | -14.4630 | -14.1427 | 0.4374 | -0.3202 | -64.0219 | -63.4629 | -5.0091 | -5.0091 |
| 4.2781 | 0.64 | 650 | 2.6868 | -14.5132 | -14.0888 | 0.4264 | -0.4244 | -63.8422 | -63.6305 | -4.5169 | -4.5170 |
| 4.1469 | 0.68 | 700 | 2.4108 | -17.3614 | -17.1379 | 0.4264 | -0.2235 | -74.0058 | -73.1244 | -3.4213 | -3.4211 |
| 2.2094 | 0.73 | 750 | 2.3138 | -17.0230 | -16.5801 | 0.4110 | -0.4430 | -72.1465 | -71.9965 | -4.4044 | -4.4043 |
| 1.5219 | 0.78 | 800 | 2.3857 | -19.1901 | -18.7328 | 0.4396 | -0.4573 | -79.3222 | -79.2200 | -4.0721 | -4.0720 |
| 3.2406 | 0.83 | 850 | 2.1160 | -21.0445 | -20.4125 | 0.3758 | -0.6320 | -84.9211 | -85.4013 | -4.1028 | -4.1026 |
| 1.8844 | 0.88 | 900 | 2.1362 | -22.7368 | -22.2138 | 0.4220 | -0.5229 | -90.9257 | -91.0423 | -4.4034 | -4.4033 |
| 2.7984 | 0.93 | 950 | 2.0654 | -22.4923 | -21.9278 | 0.4198 | -0.5645 | -89.9723 | -90.2274 | -4.4118 | -4.4116 |
| 2.7203 | 0.98 | 1000 | 2.0612 | -22.4821 | -21.9166 | 0.4198 | -0.5655 | -89.9348 | -90.1933 | -4.4171 | -4.4169 |
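
The evaluation columns above can be summarized programmatically. The sketch below copies the step, validation loss, reward accuracy, and reward margin from each row and reports which checkpoint is best under each criterion. The tuple layout and helper name are hypothetical conveniences, not part of the training output.

```python
# (step, validation_loss, rewards_accuracy, rewards_margin), copied from the table above.
ROWS = [
    (50, 2.0989, 0.4418, 0.0561),
    (100, 3.2379, 0.4637, -0.0864),
    (150, 2.9589, 0.4571, -0.1919),
    (200, 2.9926, 0.4462, -0.3087),
    (250, 2.9153, 0.4505, -0.3085),
    (300, 2.9308, 0.4352, -0.2962),
    (350, 2.9798, 0.4418, -0.2809),
    (400, 2.8595, 0.4462, -0.3387),
    (450, 2.8461, 0.4505, -0.3095),
    (500, 2.8676, 0.4396, -0.2979),
    (550, 2.8408, 0.4352, -0.3293),
    (600, 2.8091, 0.4374, -0.3202),
    (650, 2.6868, 0.4264, -0.4244),
    (700, 2.4108, 0.4264, -0.2235),
    (750, 2.3138, 0.4110, -0.4430),
    (800, 2.3857, 0.4396, -0.4573),
    (850, 2.1160, 0.3758, -0.6320),
    (900, 2.1362, 0.4220, -0.5229),
    (950, 2.0654, 0.4198, -0.5645),
    (1000, 2.0612, 0.4198, -0.5655),
]


def best_step(column: int, largest: bool) -> int:
    """Return the step whose value in `column` is largest (or smallest)."""
    chooser = max if largest else min
    return chooser(ROWS, key=lambda row: row[column])[0]


lowest_loss_step = best_step(1, largest=False)
highest_accuracy_step = best_step(2, largest=True)
highest_margin_step = best_step(3, largest=True)
positive_margin_steps = [row[0] for row in ROWS if row[3] > 0]

print("lowest validation loss at step", lowest_loss_step)
print("highest reward accuracy at step", highest_accuracy_step)
print("highest reward margin at step", highest_margin_step)
print("steps with a positive margin:", positive_margin_steps)
```

On these numbers the criteria disagree: validation loss is lowest at the final step, while reward accuracy and margin are highest in the first 100 steps, so the choice of checkpoint depends on which metric matters for the intended use.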

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
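
When trying to reproduce the run, it can help to compare a local environment against the versions listed above. The following is a rough sketch using only the standard library. The distribution names (in particular `torch` for PyTorch) are assumptions about how the packages are published, and the helper functions are hypothetical. Local build suffixes such as `+cu117` are ignored in the comparison.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions copied from the list above; keys are assumed PyPI distribution names.
EXPECTED: Dict[str, str] = {
    "transformers": "4.39.1",
    "torch": "2.0.0+cu117",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def base_version(version: str) -> str:
    """Drop a local build suffix, e.g. '2.0.0+cu117' -> '2.0.0'."""
    return version.split("+", 1)[0]


def find_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package that is missing or differs to (expected, installed)."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or base_version(have) != base_version(want):
            mismatches[name] = (want, have)
    return mismatches


def installed_versions(names) -> Dict[str, Optional[str]]:
    """Look up installed versions; None means the package was not found."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    report = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in report.items():
        print(f"{name}: expected {want}, found {have}")
    if not report:
        print("all listed packages match")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the environment recorded in this card.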