---
license: apache-2.0
library_name: peft
tags:
- generated_from_trainer
base_model: mnoukhov/pythia410m-sft-tldr
model-index:
- name: pythia410m-dpo-tldr
  results: []
---

# pythia410m-dpo-tldr

This model is a fine-tuned version of [mnoukhov/pythia410m-sft-tldr](https://huggingface.co/mnoukhov/pythia410m-sft-tldr) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5395
- Rewards/chosen: -1.3883
- Rewards/rejected: -1.9858
- Rewards/accuracies: 0.7226
- Rewards/margins: 0.5975
- Logps/rejected: -98.0320
- Logps/chosen: -98.0320
- Logps/ref Rejected: -63.5119
- Logps/ref Chosen: -70.2656

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0

### Training results

| Training Loss | Epoch | Step | Logps/chosen | Logps/ref Chosen | Logps/ref Rejected | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:------------:|:----------------:|:------------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.5961        | 0.2   | 291  | -93.0907     | -70.2656         | -63.5119           | -93.0907       | 0.5659          | 0.7036             | -1.1413        | 0.4667          | -1.6079          |
| 0.5574        | 0.4   | 582  | -102.6558    | -70.2656         | -63.5119           | -102.6558      | 0.5405          | 0.7216             | -1.6195        | 0.6178          | -2.2373          |
| 0.5418        | 0.6   | 873  | -100.0813    | -70.2656         | -63.5119           | -100.0813      | 0.5373          | 0.7226             | -1.4908        | 0.6283          | -2.1191          |
| 0.5339        | 0.8   | 1164 | -98.0320     | -70.2656         | -63.5119           | -98.0320       | 0.5395          | 0.7226             | -1.3883        | 0.5975          | -1.9858          |

### Framework versions

- PEFT 0.10.0
- Transformers 4.38.2
- Pytorch 2.1.2+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
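
### Sanity-checking the reported rewards

The reward columns in the training results table should be related by a simple identity: the mean margin is the mean chosen reward minus the mean rejected reward, up to rounding. The sketch below copies the four evaluation rows from the table and checks that identity. The names `ROWS`, `margin`, and `check_margins` are illustrative helpers, not part of any library, and the tolerance is an assumption chosen to absorb four-decimal rounding.

```python
# Evaluation rewards copied from the training results table:
# (step, rewards_chosen, rewards_rejected, reported_margin)
ROWS = [
    (291, -1.1413, -1.6079, 0.4667),
    (582, -1.6195, -2.2373, 0.6178),
    (873, -1.4908, -2.1191, 0.6283),
    (1164, -1.3883, -1.9858, 0.5975),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin: chosen reward minus rejected reward."""
    return chosen - rejected


def check_margins(rows, tol: float = 2e-4):
    """Return the steps whose reported margin differs from
    chosen - rejected by more than `tol` (assumed rounding slack)."""
    return [
        step
        for step, chosen, rejected, reported in rows
        if abs(margin(chosen, rejected) - reported) > tol
    ]


if __name__ == "__main__":
    mismatches = check_margins(ROWS)
    print("steps with inconsistent margins:", mismatches)
```

An empty list means every row is self-consistent. A non-empty list would point to a transcription or column-alignment problem in the table rather than to anything about the model itself.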