---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6707
- Rewards/chosen: -0.2860
- Rewards/rejected: -0.3548
- Rewards/accuracies: 0.5983
- Rewards/margins: 0.0687
- Logps/rejected: -367.6676
- Logps/chosen: -351.0971
- Logits/rejected: -2.5801
- Logits/chosen: -2.5726

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6932        | 0.08  | 100  | 0.6930          | -0.0030        | -0.0033          | 0.5220             | 0.0003          | -332.5208      | -322.7949    | -2.4978         | -2.4908       |
| 0.6921        | 0.16  | 200  | 0.6927          | -0.0232        | -0.0243          | 0.5183             | 0.0011          | -334.6197      | -324.8167    | -2.4970         | -2.4900       |
| 0.6913        | 0.24  | 300  | 0.6919          | -0.0414        | -0.0441          | 0.5340             | 0.0027          | -336.6059      | -326.6393    | -2.4967         | -2.4895       |
| 0.6893        | 0.32  | 400  | 0.6891          | -0.0791        | -0.0883          | 0.5547             | 0.0093          | -341.0244      | -330.4017    | -2.5023         | -2.4953       |
| 0.6724        | 0.4   | 500  | 0.6844          | -0.2018        | -0.2253          | 0.5530             | 0.0235          | -354.7256      | -342.6785    | -2.5100         | -2.5029       |
| 0.6849        | 0.48  | 600  | 0.6805          | -0.3366        | -0.3770          | 0.5597             | 0.0404          | -369.8958      | -356.1591    | -2.5412         | -2.5347       |
| 0.6503        | 0.56  | 700  | 0.6774          | -0.4376        | -0.4919          | 0.5630             | 0.0543          | -381.3843      | -366.2523    | -2.5492         | -2.5431       |
| 0.6841        | 0.64  | 800  | 0.6735          | -0.3183        | -0.3788          | 0.5913             | 0.0605          | -370.0676      | -354.3206    | -2.5662         | -2.5592       |
| 0.6773        | 0.72  | 900  | 0.6724          | -0.3986        | -0.4678          | 0.5887             | 0.0692          | -378.9693      | -362.3546    | -2.5774         | -2.5706       |
| 0.657         | 0.8   | 1000 | 0.6711          | -0.2774        | -0.3440          | 0.5997             | 0.0666          | -366.5909      | -350.2372    | -2.5784         | -2.5708       |
| 0.6577        | 0.88  | 1100 | 0.6706          | -0.2934        | -0.3628          | 0.5993             | 0.0693          | -368.4680      | -351.8376    | -2.5805         | -2.5729       |
| 0.6444        | 0.96  | 1200 | 0.6708          | -0.2860        | -0.3547          | 0.5993             | 0.0687          | -367.6592      | -351.0949    | -2.5801         | -2.5725       |

### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.14.6
- Tokenizers 0.15.0
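
### Reading the reward metrics

The reward metrics reported in this card are related by simple arithmetic, which the sketch below illustrates. It assumes the standard sigmoid DPO objective, where the logged margin is the chosen reward minus the rejected reward and the per-example loss is `-log(sigmoid(margin))`. The card does not state which loss variant was used, so treat this as an assumption. The function names `reward_margin` and `dpo_sigmoid_loss` are hypothetical helpers, not part of any library.

```python
import math


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin between the implicit rewards of the chosen and rejected responses."""
    return chosen - rejected


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-example sigmoid DPO loss, -log(sigmoid(margin)), in a numerically stable form.

    Assumes the standard sigmoid variant with no label smoothing.
    """
    # -log(sigmoid(m)) == softplus(-m) == max(-m, 0) + log1p(exp(-|m|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Final evaluation rewards reported in this card.
final_margin = reward_margin(-0.2860, -0.3548)
print(round(final_margin, 4))  # 0.0688

# With zero margin (policy identical to the reference), the loss is ln(2).
print(round(dpo_sigmoid_loss(0.0), 4))  # 0.6931
```

The computed margin of 0.0688 differs from the reported 0.0687 only because the logged rewards are themselves rounded. The zero-margin loss of ln(2) ≈ 0.6931 is consistent with the training loss of 0.6932 at step 100. The reported evaluation loss is a mean over examples, so it will not in general equal `dpo_sigmoid_loss` applied to the mean margin.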