---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
  - generated_from_trainer
model-index:
  - name: Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe
    results: []
---

Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):

  • Loss: 0.0040
  • Rewards/chosen: 0.1378
  • Rewards/rejected: -29.0317
  • Rewards/accuracies: 0.9983
  • Rewards/margins: 29.1695
  • Logps/rejected: -714.5497
  • Logps/chosen: -254.4278
  • Logits/rejected: -3.3257
  • Logits/chosen: -3.4722
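For orientation, here is a minimal sketch of loading and prompting this checkpoint with the transformers version listed at the end of this card. The repository id is an assumption inferred from the model name and uploader; adjust it if the checkpoint lives elsewhere.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id (uploader + model name above); change if needed.
model_id = "yihang7/Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference on a recent GPU
    device_map="auto",           # requires the `accelerate` package
)

# Mistral-Instruct uses the [INST] ... [/INST] format; the tokenizer's
# chat template applies it for us.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```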

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
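The card does not include the training script, but as a rough guide, here is how the values above map onto transformers.TrainingArguments. The 8-GPU launch (via torchrun or accelerate) and bf16 precision are assumptions, and the Adam settings shown are simply the library defaults made explicit.

```python
from transformers import TrainingArguments

# Per-device batch sizes; across 8 GPUs these yield the totals reported
# above (8 x 8 = 64 for training, 4 x 8 = 32 for evaluation).
training_args = TrainingArguments(
    output_dir="Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,     # transformers default, listed explicitly above
    adam_beta2=0.999,   # transformers default
    adam_epsilon=1e-8,  # transformers default
    bf16=True,          # assumption: not stated in the card
)
```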

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1608 | 0.03 | 100 | 0.1654 | 1.2374 | -2.6089 | 0.9571 | 3.8463 | -450.3222 | -243.4314 | -3.2204 | -3.2045 |
| 0.1349 | 0.07 | 200 | 0.0961 | 0.9406 | -6.3451 | 0.9756 | 7.2857 | -487.6837 | -246.3994 | -3.1898 | -3.2216 |
| 0.1065 | 0.1 | 300 | 0.1015 | -0.2203 | -9.2710 | 0.9840 | 9.0507 | -516.9434 | -258.0089 | -3.1999 | -3.2283 |
| 0.0876 | 0.14 | 400 | 0.0597 | -1.4412 | -13.6992 | 0.9865 | 12.2580 | -561.2250 | -270.2174 | -3.2066 | -3.2753 |
| 0.304 | 0.17 | 500 | 0.0874 | -0.2677 | -17.2497 | 0.9891 | 16.9821 | -596.7302 | -258.4822 | -3.2093 | -3.2601 |
| 0.1206 | 0.2 | 600 | 0.0686 | -0.4252 | -15.6514 | 0.9891 | 15.2262 | -580.7473 | -260.0578 | -3.1689 | -3.2024 |
| 0.0176 | 0.24 | 700 | 0.0630 | -0.7082 | -17.5291 | 0.9933 | 16.8209 | -599.5242 | -262.8876 | -3.2305 | -3.2958 |
| 0.0461 | 0.27 | 800 | 0.0341 | -1.2542 | -21.2558 | 0.9933 | 20.0016 | -636.7914 | -268.3477 | -3.3936 | -3.5158 |
| 0.0185 | 0.31 | 900 | 0.0291 | 0.3781 | -17.2475 | 0.9966 | 17.6256 | -596.7079 | -252.0242 | -3.3745 | -3.4941 |
| 0.0219 | 0.34 | 1000 | 0.0248 | -0.1014 | -19.6177 | 0.9958 | 19.5163 | -620.4097 | -256.8191 | -3.3236 | -3.4703 |
| 0.0193 | 0.37 | 1100 | 0.0476 | 0.2441 | -22.8685 | 0.9949 | 23.1126 | -652.9178 | -253.3648 | -3.3700 | -3.5127 |
| 0.0153 | 0.41 | 1200 | 0.0344 | 0.2337 | -21.0722 | 0.9958 | 21.3059 | -634.9553 | -253.4690 | -3.3281 | -3.4433 |
| 0.1011 | 0.44 | 1300 | 0.0320 | 0.3865 | -19.5099 | 0.9941 | 19.8964 | -619.3322 | -251.9406 | -3.2086 | -3.2943 |
| 0.0085 | 0.48 | 1400 | 0.0164 | -0.3604 | -24.6053 | 0.9958 | 24.2449 | -670.2856 | -259.4097 | -3.3688 | -3.5055 |
| 0.0057 | 0.51 | 1500 | 0.0115 | -0.8584 | -33.7853 | 0.9966 | 32.9269 | -762.0861 | -264.3898 | -3.2986 | -3.4455 |
| 0.0082 | 0.54 | 1600 | 0.0525 | -0.3661 | -22.4426 | 0.9975 | 22.0765 | -648.6592 | -259.4668 | -3.3372 | -3.4816 |
| 0.0128 | 0.58 | 1700 | 0.0514 | -0.4253 | -24.3063 | 0.9958 | 23.8810 | -667.2958 | -260.0584 | -3.3102 | -3.4488 |
| 0.0018 | 0.61 | 1800 | 0.0356 | -0.3563 | -24.1492 | 0.9966 | 23.7929 | -665.7247 | -259.3687 | -3.2894 | -3.4159 |
| 0.0105 | 0.65 | 1900 | 0.0381 | -0.9566 | -33.8957 | 0.9958 | 32.9391 | -763.1902 | -265.3718 | -3.3840 | -3.5348 |
| 0.006 | 0.68 | 2000 | 0.0072 | -0.1403 | -26.2483 | 0.9975 | 26.1080 | -686.7160 | -257.2083 | -3.3371 | -3.4805 |
| 0.0026 | 0.71 | 2100 | 0.0102 | -0.1870 | -29.0470 | 0.9966 | 28.8600 | -714.7033 | -257.6760 | -3.3557 | -3.4974 |
| 0.0038 | 0.75 | 2200 | 0.0078 | -0.4803 | -29.8773 | 0.9966 | 29.3970 | -723.0064 | -260.6087 | -3.3551 | -3.5046 |
| 0.0011 | 0.78 | 2300 | 0.0075 | -0.4771 | -28.4348 | 0.9966 | 27.9577 | -708.5814 | -260.5770 | -3.3459 | -3.4948 |
| 0.0033 | 0.82 | 2400 | 0.0047 | -0.1998 | -28.0030 | 0.9983 | 27.8032 | -704.2631 | -257.8039 | -3.3489 | -3.4950 |
| 0.0051 | 0.85 | 2500 | 0.0048 | -0.2771 | -29.2358 | 0.9992 | 28.9587 | -716.5906 | -258.5765 | -3.3025 | -3.4428 |
| 0.0074 | 0.88 | 2600 | 0.0044 | -0.2089 | -29.6486 | 0.9975 | 29.4396 | -720.7189 | -257.8950 | -3.3320 | -3.4805 |
| 0.0032 | 0.92 | 2700 | 0.0041 | -0.1675 | -30.1791 | 0.9975 | 30.0116 | -726.0242 | -257.4810 | -3.3308 | -3.4822 |
| 0.0023 | 0.95 | 2800 | 0.0038 | 0.0604 | -29.3907 | 0.9983 | 29.4511 | -718.1400 | -255.2013 | -3.3267 | -3.4751 |
| 0.003 | 0.99 | 2900 | 0.0040 | 0.1446 | -28.9793 | 0.9983 | 29.1239 | -714.0264 | -254.3596 | -3.3257 | -3.4723 |
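
For readers unfamiliar with the reward columns: the model name indicates DPO training, and in standard DPO the logged "rewards" are beta-scaled log-probability ratios between the policy and a frozen reference model. Below is a minimal sketch of how such metrics are computed; beta = 0.1 is an assumption, as the card does not state the value used.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute the DPO loss and the implicit rewards logged in the table above.

    Each argument is a tensor of summed per-sequence log-probabilities.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected   # Rewards/margins
    accuracy = (margins > 0).float().mean()       # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()          # Validation Loss
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), accuracy

# Toy example with magnitudes similar to the table's final row:
p_c = torch.tensor([-250.0]); p_r = torch.tensor([-720.0])
r_c = torch.tensor([-252.0]); r_r = torch.tensor([-430.0])
print(dpo_metrics(p_c, p_r, r_c, r_r))  # large positive margin, loss near zero
```

Read against the table, the steadily widening margin (about 3.8 at step 100 to about 29 at step 2900) with accuracy near 1.0 is exactly what drives the loss toward 0.004: the negative log-sigmoid of a large positive margin is close to zero.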

Framework versions

  • Transformers 4.35.0
  • PyTorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1