---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: v1_1000_STEPS_1e7_rate_01_beta_DPO
    results: []
---

v1_1000_STEPS_1e7_rate_01_beta_DPO

This model is a DPO fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1; the preference dataset used is not documented. It achieves the following results on the evaluation set (the metrics are defined after the list):

  • Loss: 0.6730
  • Rewards/chosen: -0.0669
  • Rewards/rejected: -0.1113
  • Rewards/accuracies: 0.5890
  • Rewards/margins: 0.0445
  • Logps/rejected: -17.9930
  • Logps/chosen: -15.9218
  • Logits/rejected: -3.3417
  • Logits/chosen: -3.3418
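
For context, these metrics follow TRL's DPO conventions: the implicit reward for a completion y given prompt x is the beta-scaled log-probability ratio between the policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. The beta value of 0.1 is inferred from the "01_beta" in the model name, not stated explicitly on the card.

$$r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}, \qquad \mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]$$

Under these definitions, Rewards/margins is the mean of r(x, y_w) - r(x, y_l) over the evaluation set, and Rewards/accuracies is the fraction of pairs for which the chosen reward exceeds the rejected reward.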

Model description

More information needed

Intended uses & limitations

More information needed
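
Since no usage notes are provided, here is a minimal inference sketch. The repo id is inferred from the card (user tsavage68 plus the model name) and may differ; the chat template shipped with Mistral-7B-Instruct is assumed to carry over to this fine-tune.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_1000_STEPS_1e7_rate_01_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Mistral-Instruct models use an [INST]-style chat format; the tokenizer's
# chat template handles the wrapping.
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```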

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction of the training call follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
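
As a reconstruction only (the actual training script is not published and the dataset is unknown), the run plausibly looked like the following TRL call. It assumes a trl 0.8-era DPOTrainer API, which matches the Transformers 4.39 version listed below, and a placeholder preference dataset with prompt/chosen/rejected columns.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: the card does not name the dataset; DPO expects
# "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("path/to/preference_dataset")

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e7_rate_01_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence below
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # trl clones a frozen reference model when None
    args=training_args,
    beta=0.1,              # inferred from "01_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```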

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6944 | 0.05 | 50 | 0.6930 | -0.0001 | -0.0004 | 0.4791 | 0.0003 | -16.8836 | -15.2543 | -3.3540 | -3.3541 |
| 0.6896 | 0.1 | 100 | 0.6907 | -0.0026 | -0.0076 | 0.5670 | 0.0050 | -16.9551 | -15.2788 | -3.3527 | -3.3528 |
| 0.6879 | 0.15 | 150 | 0.6878 | -0.0076 | -0.0188 | 0.5736 | 0.0112 | -17.0680 | -15.3294 | -3.3516 | -3.3517 |
| 0.6836 | 0.2 | 200 | 0.6849 | -0.0190 | -0.0363 | 0.5670 | 0.0173 | -17.2422 | -15.4426 | -3.3479 | -3.3480 |
| 0.6804 | 0.24 | 250 | 0.6825 | -0.0285 | -0.0510 | 0.5868 | 0.0226 | -17.3899 | -15.5377 | -3.3456 | -3.3457 |
| 0.6753 | 0.29 | 300 | 0.6802 | -0.0411 | -0.0689 | 0.5890 | 0.0277 | -17.5681 | -15.6645 | -3.3452 | -3.3453 |
| 0.6908 | 0.34 | 350 | 0.6788 | -0.0382 | -0.0690 | 0.5956 | 0.0307 | -17.5691 | -15.6352 | -3.3447 | -3.3448 |
| 0.6881 | 0.39 | 400 | 0.6773 | -0.0391 | -0.0735 | 0.5934 | 0.0344 | -17.6147 | -15.6439 | -3.3446 | -3.3447 |
| 0.6519 | 0.44 | 450 | 0.6757 | -0.0500 | -0.0881 | 0.5912 | 0.0381 | -17.7606 | -15.7528 | -3.3434 | -3.3435 |
| 0.6871 | 0.49 | 500 | 0.6751 | -0.0504 | -0.0897 | 0.5978 | 0.0394 | -17.7768 | -15.7565 | -3.3425 | -3.3426 |
| 0.6495 | 0.54 | 550 | 0.6737 | -0.0598 | -0.1025 | 0.5934 | 0.0427 | -17.9043 | -15.8506 | -3.3424 | -3.3425 |
| 0.6756 | 0.59 | 600 | 0.6738 | -0.0611 | -0.1038 | 0.5912 | 0.0427 | -17.9179 | -15.8641 | -3.3420 | -3.3421 |
| 0.6584 | 0.64 | 650 | 0.6735 | -0.0625 | -0.1058 | 0.5890 | 0.0434 | -17.9379 | -15.8778 | -3.3422 | -3.3423 |
| 0.6747 | 0.68 | 700 | 0.6734 | -0.0652 | -0.1089 | 0.5824 | 0.0437 | -17.9690 | -15.9052 | -3.3417 | -3.3418 |
| 0.6735 | 0.73 | 750 | 0.6733 | -0.0662 | -0.1102 | 0.5670 | 0.0440 | -17.9819 | -15.9150 | -3.3417 | -3.3418 |
| 0.6573 | 0.78 | 800 | 0.6732 | -0.0671 | -0.1112 | 0.5868 | 0.0442 | -17.9917 | -15.9236 | -3.3417 | -3.3418 |
| 0.6768 | 0.83 | 850 | 0.6732 | -0.0671 | -0.1112 | 0.5934 | 0.0441 | -17.9912 | -15.9238 | -3.3417 | -3.3418 |
| 0.6745 | 0.88 | 900 | 0.6733 | -0.0671 | -0.1110 | 0.5780 | 0.0439 | -17.9897 | -15.9243 | -3.3416 | -3.3418 |
| 0.6751 | 0.93 | 950 | 0.6730 | -0.0668 | -0.1114 | 0.5868 | 0.0446 | -17.9934 | -15.9211 | -3.3417 | -3.3418 |
| 0.6645 | 0.98 | 1000 | 0.6730 | -0.0669 | -0.1113 | 0.5890 | 0.0445 | -17.9930 | -15.9218 | -3.3417 | -3.3418 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
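
To reproduce this environment, the pinned versions above can be installed with pip. Note that the trl version is not recorded on the card, and the CUDA 11.7 wheel index for PyTorch is an assumption based on the "+cu117" build tag:

```bash
pip install transformers==4.39.1 datasets==2.18.0 tokenizers==0.15.2 trl
pip install torch==2.0.0 --index-url https://download.pytorch.org/whl/cu117
```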