---
base_model: rasyosef/phi-1_5-sft
library_name: peft
license: mit
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: phi-1_5-dpo
  results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
- argilla/distilabel-intel-orca-dpo-pairs
- jondurbin/py-dpo-v0.1
- argilla/distilabel-math-preference-dpo
---

# phi-1_5-dpo

This model is a fine-tuned version of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft), trained with Direct Preference Optimization (DPO) on the preference datasets listed in the metadata above.

It achieves the following results on the evaluation set:
- Loss: 0.5013
- Rewards/chosen: -1.0250
- Rewards/rejected: -2.3893
- Rewards/accuracies: 0.7283
- Rewards/margins: 1.3643
- Logps/rejected: -162.0916
- Logps/chosen: -128.1033
- Logits/rejected: 5.3082
- Logits/chosen: 5.1890

## Model description

phi-1_5-dpo is a PEFT adapter on top of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft), aligned with DPO using the `trl` library.

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on four binarized preference datasets: [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs), [jondurbin/py-dpo-v0.1](https://huggingface.co/datasets/jondurbin/py-dpo-v0.1), and [argilla/distilabel-math-preference-dpo](https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6899        | 0.1241 | 138  | 0.6769          | -0.0153        | -0.0504          | 0.625              | 0.0351          | -138.7025      | -118.0066    | 4.5710          | 4.4532        |
| 0.6309        | 0.2482 | 276  | 0.6035          | -0.2012        | -0.5586          | 0.7120             | 0.3575          | -143.7850      | -119.8655    | 4.5167          | 4.3940        |
| 0.5756        | 0.3723 | 414  | 0.5669          | -0.3693        | -0.9842          | 0.7174             | 0.6149          | -148.0405      | -121.5467    | 4.6242          | 4.5060        |
| 0.5715        | 0.4964 | 552  | 0.5446          | -0.4109        | -1.1855          | 0.7283             | 0.7745          | -150.0534      | -121.9633    | 4.7324          | 4.6143        |
| 0.5449        | 0.6205 | 690  | 0.5331          | -0.4666        | -1.3090          | 0.7446             | 0.8424          | -151.2884      | -122.5196    | 4.8229          | 4.7080        |
| 0.5536        | 0.7446 | 828  | 0.5136          | -0.4885        | -1.3825          | 0.7446             | 0.8940          | -152.0234      | -122.7389    | 4.8867          | 4.7737        |
| 0.5253        | 0.8687 | 966  | 0.5057          | -0.5613        | -1.5446          | 0.7554             | 0.9832          | -153.6442      | -123.4672    | 4.9287          | 4.8080        |
| 0.5249        | 0.9928 | 1104 | 0.5054          | -0.5101        | -1.4656          | 0.75               | 0.9555          | -152.8544      | -122.9549    | 4.8704          | 4.7521        |
| 0.4631        | 1.1169 | 1242 | 0.5067          | -0.6889        | -1.7678          | 0.75               | 1.0789          | -155.8768      | -124.7426    | 4.8470          | 4.7276        |
| 0.4524        | 1.2410 | 1380 | 0.5006          | -0.7467        | -1.9049          | 0.7446             | 1.1582          | -157.2474      | -125.3205    | 4.9447          | 4.8239        |
| 0.424         | 1.3651 | 1518 | 0.5036          | -0.7638        | -2.0144          | 0.7337             | 1.2505          | -158.3425      | -125.4923    | 4.9235          | 4.8002        |
| 0.4428        | 1.4892 | 1656 | 0.5004          | -0.7790        | -2.0132          | 0.7446             | 1.2342          | -158.3307      | -125.6437    | 4.9576          | 4.8375        |
| 0.4424        | 1.6133 | 1794 | 0.4944          | -0.8220        | -2.0517          | 0.7391             | 1.2297          | -158.7152      | -126.0739    | 4.9736          | 4.8553        |
| 0.4358        | 1.7374 | 1932 | 0.5022          | -0.8091        | -1.9993          | 0.7228             | 1.1902          | -158.1918      | -125.9447    | 5.0894          | 4.9702        |
| 0.4426        | 1.8615 | 2070 | 0.4992          | -0.8254        | -2.0308          | 0.7228             | 1.2054          | -158.5065      | -126.1077    | 5.0943          | 4.9780        |
| 0.4226        | 1.9856 | 2208 | 0.4971          | -0.8701        | -2.1434          | 0.7283             | 1.2733          | -159.6329      | -126.5553    | 5.1222          | 5.0011        |
| 0.3684        | 2.1097 | 2346 | 0.5032          | -0.9201        | -2.2281          | 0.7228             | 1.3081          | -160.4799      | -127.0545    | 5.2209          | 5.1031        |
| 0.3695        | 2.2338 | 2484 | 0.5022          | -0.9332        | -2.2651          | 0.7228             | 1.3319          | -160.8495      | -127.1860    | 5.2170          | 5.0977        |
| 0.3693        | 2.3579 | 2622 | 0.5022          | -0.9418        | -2.2839          | 0.7283             | 1.3421          | -161.0379      | -127.2717    | 5.2390          | 5.1169        |
| 0.3659        | 2.4820 | 2760 | 0.5037          | -0.9820        | -2.3392          | 0.7228             | 1.3572          | -161.5908      | -127.6742    | 5.2392          | 5.1148        |
| 0.3557        | 2.6061 | 2898 | 0.5031          | -1.0001        | -2.3531          | 0.7228             | 1.3529          | -161.7294      | -127.8552    | 5.2704          | 5.1488        |
| 0.3491        | 2.7302 | 3036 | 0.5053          | -1.0242        | -2.3803          | 0.7228             | 1.3562          | -162.0017      | -128.0954    | 5.2880          | 5.1693        |
| 0.3512        | 2.8543 | 3174 | 0.5036          | -1.0265        | -2.3833          | 0.7174             | 1.3568          | -162.0320      | -128.1190    | 5.2965          | 5.1768        |
| 0.3458        | 2.9784 | 3312 | 0.5013          | -1.0250        | -2.3893          | 0.7283             | 1.3643          | -162.0916      | -128.1033    | 5.3082          | 5.1890        |

### Framework versions

- PEFT 0.11.1
- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
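
For reference, the reward and margin columns above come from the DPO objective. The sketch below is a minimal pure-Python illustration of that per-pair objective, not the actual `trl` implementation; it assumes `trl`'s default `beta=0.1`, since the beta used for this run is not recorded in this card.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair, from summed token log-probs.

    Returns (loss, reward_chosen, reward_rejected); the rewards correspond
    to the Rewards/chosen and Rewards/rejected columns in the table above,
    and their difference to Rewards/margins.
    """
    # Implicit rewards: scaled log-ratios of policy vs. reference model
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # loss = -log sigmoid(margin), written via log1p for numerical stability
    loss = math.log1p(math.exp(-margin))
    return loss, reward_chosen, reward_rejected
```

At a zero margin the loss is ln 2 ≈ 0.6931, which is why the validation loss starts near 0.69 in the first table row before the margin opens up.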