---
license: apache-2.0
base_model: mosaicml/mpt-7b-instruct
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mpt_1000_STEPS_1e7_rate_01_beta_DPO
  results: []
---

# mpt_1000_STEPS_1e7_rate_01_beta_DPO

This model is a fine-tuned version of [mosaicml/mpt-7b-instruct](https://huggingface.co/mosaicml/mpt-7b-instruct) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6923
- Rewards/chosen: -0.0044
- Rewards/rejected: -0.0063
- Rewards/accuracies: 0.5187
- Rewards/margins: 0.0018
- Logps/rejected: -21.6204
- Logps/chosen: -20.8367
- Logits/rejected: 14.2289
- Logits/chosen: 14.2314

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reproduction sketch using these values appears at the end of this card):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6934 | 0.05 | 50 | 0.6935 | -0.0003 | 0.0002 | 0.5033 | -0.0005 | -21.5550 | -20.7952 | 14.2545 | 14.2571 |
| 0.6945 | 0.1 | 100 | 0.6936 | 0.0001 | 0.0009 | 0.4857 | -0.0009 | -21.5484 | -20.7916 | 14.2577 | 14.2603 |
| 0.6957 | 0.15 | 150 | 0.6933 | -0.0003 | -0.0000 | 0.4484 | -0.0002 | -21.5578 | -20.7949 | 14.2444 | 14.2470 |
| 0.6921 | 0.2 | 200 | 0.6933 | 0.0035 | 0.0038 | 0.5011 | -0.0002 | -21.5199 | -20.7570 | 14.2457 | 14.2483 |
| 0.6948 | 0.24 | 250 | 0.6935 | -0.0006 | -0.0001 | 0.4923 | -0.0005 | -21.5583 | -20.7983 | 14.2481 | 14.2507 |
| 0.6924 | 0.29 | 300 | 0.6922 | 0.0008 | -0.0011 | 0.5407 | 0.0019 | -21.5685 | -20.7839 | 14.2444 | 14.2470 |
| 0.6916 | 0.34 | 350 | 0.6930 | -0.0007 | -0.0012 | 0.5055 | 0.0005 | -21.5698 | -20.7993 | 14.2356 | 14.2383 |
| 0.6904 | 0.39 | 400 | 0.6928 | -0.0043 | -0.0051 | 0.5077 | 0.0009 | -21.6086 | -20.8348 | 14.2367 | 14.2393 |
| 0.6904 | 0.44 | 450 | 0.6923 | -0.0042 | -0.0060 | 0.5582 | 0.0018 | -21.6177 | -20.8341 | 14.2352 | 14.2378 |
| 0.6904 | 0.49 | 500 | 0.6929 | -0.0049 | -0.0057 | 0.5297 | 0.0008 | -21.6144 | -20.8416 | 14.2367 | 14.2393 |
| 0.6893 | 0.54 | 550 | 0.6924 | -0.0032 | -0.0048 | 0.5385 | 0.0016 | -21.6055 | -20.8238 | 14.2319 | 14.2345 |
| 0.6903 | 0.59 | 600 | 0.6923 | -0.0045 | -0.0063 | 0.5055 | 0.0018 | -21.6203 | -20.8373 | 14.2321 | 14.2347 |
| 0.6907 | 0.64 | 650 | 0.6923 | -0.0038 | -0.0057 | 0.5121 | 0.0019 | -21.6141 | -20.8299 | 14.2239 | 14.2265 |
| 0.6913 | 0.68 | 700 | 0.6926 | -0.0045 | -0.0058 | 0.5231 | 0.0014 | -21.6159 | -20.8372 | 14.2301 | 14.2327 |
| 0.6909 | 0.73 | 750 | 0.6917 | -0.0036 | -0.0067 | 0.5451 | 0.0031 | -21.6244 | -20.8281 | 14.2134 | 14.2160 |
| 0.6876 | 0.78 | 800 | 0.6928 | -0.0046 | -0.0056 | 0.5187 | 0.0009 | -21.6130 | -20.8387 | 14.2215 | 14.2241 |
| 0.6985 | 0.83 | 850 | 0.6920 | -0.0040 | -0.0065 | 0.5560 | 0.0025 | -21.6226 | -20.8319 | 14.2307 | 14.2334 |
| 0.6912 | 0.88 | 900 | 0.6925 | -0.0036 | -0.0051 | 0.5209 | 0.0014 | -21.6082 | -20.8285 | 14.2279 | 14.2304 |
| 0.6931 | 0.93 | 950 | 0.6923 | -0.0044 | -0.0063 | 0.5187 | 0.0018 | -21.6204 | -20.8367 | 14.2289 | 14.2314 |
| 0.6914 | 0.98 | 1000 | 0.6923 | -0.0044 | -0.0063 | 0.5187 | 0.0018 | -21.6204 | -20.8367 | 14.2289 | 14.2314 |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
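
### Training setup sketch

The original training script is not included in this card. The snippet below is a minimal, hypothetical sketch of how the values listed under "Training hyperparameters" map onto TRL's `DPOTrainer`. The inline preference dataset is a placeholder (the actual training data is unknown), `beta=0.1` is only inferred from the `01_beta` suffix in the run name, and the sequence-length limits are assumptions since the card does not record them.

```python
# Hypothetical reproduction sketch -- NOT the original training script.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mosaicml/mpt-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # MPT's tokenizer has no pad token by default

# Placeholder preference data; a real run needs "prompt"/"chosen"/"rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

# Values copied from the "Training hyperparameters" section above.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="mpt_1000_STEPS_1e7_rate_01_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size = 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # with None, TRL keeps a frozen copy of the model as the reference
    args=training_args,
    beta=0.1,                # assumption: inferred from the "01_beta" run-name suffix
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,          # assumption: not recorded in this card
    max_prompt_length=256,   # assumption: not recorded in this card
)
trainer.train()
```

Note that with `ref_model=None`, TRL handles the frozen DPO reference policy internally, which matches the single-model setup implied by this card.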