---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: v1_1000_STEPS_1e6_rate_05_beta_DPO
    results: []
---

# v1_1000_STEPS_1e6_rate_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.1310
- Rewards/chosen: -2.3907
- Rewards/rejected: -3.3587
- Rewards/accuracies: 0.5319
- Rewards/margins: 0.9681
- Logps/rejected: -23.5970
- Logps/chosen: -20.0344
- Logits/rejected: -3.2860
- Logits/chosen: -3.2861
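
The reward metrics are the implicit DPO rewards that TRL logs: β times the log-probability ratio between the trained policy and the frozen reference model on each response, so Rewards/margins is Rewards/chosen minus Rewards/rejected (−2.3907 − (−3.3587) ≈ 0.9681). Assuming β = 0.5, as the `05_beta` fragment of the run name suggests, the loss being minimized is the standard DPO objective

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)
$$

where $y_w$ and $y_l$ are the chosen and rejected responses and $\pi_{\mathrm{ref}}$ is the base model.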

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
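
The hyperparameters above map onto a standard TRL `DPOTrainer` run. The following is a minimal sketch, not the original training script: the preference dataset (here the placeholder `your/preference-dataset`), its split names, the precision settings, and `beta=0.5` (inferred from the `05_beta` fragment of the run name) are all assumptions.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Placeholder: any preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your/preference-dataset")

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_05_beta_DPO",
    per_device_train_batch_size=2,   # train_batch_size: 2
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total_train_batch_size: 4
    learning_rate=1e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the eval cadence in the table below
    bf16=True,                       # assumed; the card does not state the precision
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.5,                        # assumed from "05_beta" in the run name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```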

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7136 | 0.05 | 50 | 0.6682 | -0.1718 | -0.2901 | 0.5473 | 0.1184 | -17.4598 | -15.5966 | -3.3833 | -3.3834 |
| 0.8377 | 0.1 | 100 | 0.8534 | -1.2874 | -1.8482 | 0.5495 | 0.5608 | -20.5758 | -17.8278 | -3.3665 | -3.3666 |
| 1.5418 | 0.15 | 150 | 1.2106 | -3.7074 | -3.9590 | 0.5055 | 0.2516 | -24.7976 | -22.6679 | -3.3872 | -3.3874 |
| 0.9966 | 0.2 | 200 | 1.3074 | -2.7550 | -3.0485 | 0.5099 | 0.2935 | -22.9766 | -20.7630 | -3.3239 | -3.3240 |
| 1.631 | 0.24 | 250 | 1.1695 | -2.1801 | -2.7422 | 0.5231 | 0.5621 | -22.3639 | -19.6133 | -3.2748 | -3.2750 |
| 1.4651 | 0.29 | 300 | 1.2408 | -2.1404 | -2.6522 | 0.5033 | 0.5118 | -22.1839 | -19.5338 | -3.3806 | -3.3808 |
| 1.9294 | 0.34 | 350 | 1.2181 | -1.8900 | -2.3214 | 0.5121 | 0.4313 | -21.5223 | -19.0331 | -3.3884 | -3.3885 |
| 1.6417 | 0.39 | 400 | 1.1754 | -1.9580 | -2.4289 | 0.4967 | 0.4710 | -21.7374 | -19.1690 | -3.4056 | -3.4057 |
| 1.0114 | 0.44 | 450 | 1.2146 | -2.0096 | -2.4935 | 0.4879 | 0.4839 | -21.8665 | -19.2723 | -3.3460 | -3.3461 |
| 1.0581 | 0.49 | 500 | 1.2539 | -2.5636 | -3.1382 | 0.5077 | 0.5746 | -23.1559 | -20.3803 | -3.3437 | -3.3439 |
| 1.3239 | 0.54 | 550 | 1.1739 | -2.1012 | -2.8810 | 0.5253 | 0.7798 | -22.6415 | -19.4555 | -3.3313 | -3.3314 |
| 1.2819 | 0.59 | 600 | 1.1770 | -2.3179 | -3.1791 | 0.5407 | 0.8612 | -23.2377 | -19.8889 | -3.3037 | -3.3038 |
| 0.9194 | 0.64 | 650 | 1.1859 | -2.0739 | -2.9235 | 0.5407 | 0.8496 | -22.7266 | -19.4008 | -3.2953 | -3.2955 |
| 1.0744 | 0.68 | 700 | 1.1623 | -2.2911 | -3.1685 | 0.5187 | 0.8773 | -23.2165 | -19.8353 | -3.2851 | -3.2853 |
| 1.3268 | 0.73 | 750 | 1.1441 | -2.3481 | -3.2869 | 0.5231 | 0.9388 | -23.4534 | -19.9493 | -3.2891 | -3.2892 |
| 1.1064 | 0.78 | 800 | 1.1339 | -2.3526 | -3.3046 | 0.5275 | 0.9520 | -23.4888 | -19.9583 | -3.2881 | -3.2882 |
| 1.0456 | 0.83 | 850 | 1.1330 | -2.3878 | -3.3498 | 0.5275 | 0.9620 | -23.5791 | -20.0286 | -3.2864 | -3.2865 |
| 1.4001 | 0.88 | 900 | 1.1333 | -2.3931 | -3.3565 | 0.5275 | 0.9634 | -23.5926 | -20.0393 | -3.2860 | -3.2861 |
| 1.1629 | 0.93 | 950 | 1.1330 | -2.3904 | -3.3570 | 0.5275 | 0.9666 | -23.5936 | -20.0339 | -3.2860 | -3.2861 |
| 0.9777 | 0.98 | 1000 | 1.1310 | -2.3907 | -3.3587 | 0.5319 | 0.9681 | -23.5970 | -20.0344 | -3.2860 | -3.2861 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
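
A minimal usage sketch, assuming the weights are published on the Hub under the run name (the repo id below is an assumption) and that the base model's chat template is retained:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_1000_STEPS_1e6_rate_05_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```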