---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e5_rate_03_beta_DPO
  results: []
---


# v1_1000_STEPS_1e5_rate_03_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), trained with Direct Preference Optimization (DPO) via TRL on an unknown dataset.
At the final checkpoint (step 1000), it achieves the following results on the evaluation set:
- Loss: 2.0612
- Rewards/chosen: -22.4821
- Rewards/rejected: -21.9166
- Rewards/accuracies: 0.4198
- Rewards/margins: -0.5655
- Logps/rejected: -89.9348
- Logps/chosen: -90.1933
- Logits/rejected: -4.4171
- Logits/chosen: -4.4169
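
The metric names above follow the naming logged by TRL's `DPOTrainer`. The following is a minimal sketch of how such statistics can be computed from summed per-completion log-probabilities. It assumes the default sigmoid DPO loss and beta = 0.3. Beta is inferred from `03_beta` in the model name and is not listed among the hyperparameters. The helper `dpo_metrics` and the toy numbers are hypothetical and are not the trainer's own code.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.3):
    """Hypothetical helper: sigmoid-DPO statistics for a batch of preference pairs.

    Each argument holds one summed completion log-probability per pair.
    beta=0.3 is an assumption read from the model name, not a logged value.
    """
    n = len(policy_chosen)
    # Implicit rewards: beta * (policy log-prob - reference log-prob)
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Per-pair sigmoid DPO loss: -log(sigmoid(margin)) == softplus(-margin)
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "rewards/margins": sum(margins) / n,
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }

# Toy, made-up log-probabilities for two preference pairs
metrics = dpo_metrics(
    policy_chosen=[-40.0, -55.0],
    policy_rejected=[-42.0, -50.0],
    ref_chosen=[-41.0, -52.0],
    ref_rejected=[-41.5, -51.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The per-pair loss is convex in the margin, so the averaged loss can never fall below -log(sigmoid(mean margin)). Under the sigmoid-loss assumption, this is consistent with the reported evaluation loss (2.0612) being well above -log(sigmoid(-0.5655)), which is about 1.02. The likely explanation is that individual margins spread widely around their mean.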

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
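
The batch-size and scheduler settings above can be made concrete with a short plain-Python sketch. It assumes training ran on a single device, because the card reports only the product `total_train_batch_size: 4`. The learning-rate function is intended to mirror what `transformers`' `get_cosine_schedule_with_warmup` computes with its default `num_cycles=0.5`. It is reimplemented here rather than imported, and both function names are hypothetical.

```python
import math

def effective_batch_size(per_device=2, grad_accum=2, num_devices=1):
    # num_devices=1 is an assumption; the card only reports the product (4)
    return per_device * grad_accum * num_devices

def cosine_lr_with_warmup(step, base_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print("total_train_batch_size:", effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {cosine_lr_with_warmup(step):.2e}")
```

The learning rate peaks at 1e-05 at step 100 and falls to half that value at step 550, midway through the decay. It reaches zero at step 1000. Under the same single-device assumption, 1,000 steps at 4 pairs per step amounts to 4,000 preference pairs seen during training.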

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.025         | 0.05  | 50   | 2.0989          | -9.2701        | -9.3262          | 0.4418             | 0.0561          | -47.9670       | -46.1535     | -4.0702         | -4.0700       |
| 3.1266        | 0.1   | 100  | 3.2379          | -16.6921       | -16.6056         | 0.4637             | -0.0864         | -72.2316       | -70.8932     | -3.1523         | -3.1523       |
| 2.9672        | 0.15  | 150  | 2.9589          | -15.0108       | -14.8189         | 0.4571             | -0.1919         | -66.2757       | -65.2890     | -4.5807         | -4.5807       |
| 3.7281        | 0.2   | 200  | 2.9926          | -15.2425       | -14.9338         | 0.4462             | -0.3087         | -66.6590       | -66.0614     | -4.9577         | -4.9577       |
| 2.825         | 0.24  | 250  | 2.9153          | -14.7019       | -14.3934         | 0.4505             | -0.3085         | -64.8577       | -64.2594     | -5.0246         | -5.0246       |
| 3.9813        | 0.29  | 300  | 2.9308          | -14.8129       | -14.5166         | 0.4352             | -0.2962         | -65.2682       | -64.6292     | -4.5446         | -4.5446       |
| 3.9125        | 0.34  | 350  | 2.9798          | -15.2390       | -14.9581         | 0.4418             | -0.2809         | -66.7398       | -66.0496     | -4.0186         | -4.0186       |
| 5.475         | 0.39  | 400  | 2.8595          | -14.7993       | -14.4606         | 0.4462             | -0.3387         | -65.0815       | -64.5839     | -5.5881         | -5.5881       |
| 4.925         | 0.44  | 450  | 2.8461          | -14.9405       | -14.6310         | 0.4505             | -0.3095         | -65.6497       | -65.0547     | -5.7266         | -5.7266       |
| 4.0656        | 0.49  | 500  | 2.8676          | -14.8313       | -14.5335         | 0.4396             | -0.2979         | -65.3244       | -64.6909     | -5.3771         | -5.3771       |
| 4.3688        | 0.54  | 550  | 2.8408          | -14.7379       | -14.4086         | 0.4352             | -0.3293         | -64.9083       | -64.3793     | -5.5129         | -5.5129       |
| 2.3281        | 0.59  | 600  | 2.8091          | -14.4630       | -14.1427         | 0.4374             | -0.3202         | -64.0219       | -63.4629     | -5.0091         | -5.0091       |
| 4.2781        | 0.64  | 650  | 2.6868          | -14.5132       | -14.0888         | 0.4264             | -0.4244         | -63.8422       | -63.6305     | -4.5169         | -4.5170       |
| 4.1469        | 0.68  | 700  | 2.4108          | -17.3614       | -17.1379         | 0.4264             | -0.2235         | -74.0058       | -73.1244     | -3.4213         | -3.4211       |
| 2.2094        | 0.73  | 750  | 2.3138          | -17.0230       | -16.5801         | 0.4110             | -0.4430         | -72.1465       | -71.9965     | -4.4044         | -4.4043       |
| 1.5219        | 0.78  | 800  | 2.3857          | -19.1901       | -18.7328         | 0.4396             | -0.4573         | -79.3222       | -79.2200     | -4.0721         | -4.0720       |
| 3.2406        | 0.83  | 850  | 2.1160          | -21.0445       | -20.4125         | 0.3758             | -0.6320         | -84.9211       | -85.4013     | -4.1028         | -4.1026       |
| 1.8844        | 0.88  | 900  | 2.1362          | -22.7368       | -22.2138         | 0.4220             | -0.5229         | -90.9257       | -91.0423     | -4.4034         | -4.4033       |
| 2.7984        | 0.93  | 950  | 2.0654          | -22.4923       | -21.9278         | 0.4198             | -0.5645         | -89.9723       | -90.2274     | -4.4118         | -4.4116       |
| 2.7203        | 0.98  | 1000 | 2.0612          | -22.4821       | -21.9166         | 0.4198             | -0.5655         | -89.9348       | -90.1933     | -4.4171         | -4.4169       |
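
The Rewards and Logps columns can be cross-checked against each other. The sketch below assumes that the columns follow TRL's `DPOTrainer` conventions: reward = beta * (policy log-prob - reference log-prob), and Logps holds the policy's log-probabilities. It also assumes beta = 0.3, inferred from the model name. The check confirms that Rewards/margins equals Rewards/chosen minus Rewards/rejected. It then back-solves the mean reference log-probabilities, which the table does not report. The rows are copied from the table above; the helper `check_row` is hypothetical.

```python
# Rows copied from the training-results table:
# step: (rewards_chosen, rewards_rejected, rewards_margins, logps_chosen, logps_rejected)
ROWS = {
    50: (-9.2701, -9.3262, 0.0561, -46.1535, -47.9670),
    500: (-14.8313, -14.5335, -0.2979, -64.6909, -65.3244),
    850: (-21.0445, -20.4125, -0.6320, -85.4013, -84.9211),
    1000: (-22.4821, -21.9166, -0.5655, -90.1933, -89.9348),
}

BETA = 0.3  # assumption: inferred from "03_beta" in the model name

def check_row(r_c, r_r, margin, lp_c, lp_r, beta=BETA):
    """Hypothetical helper: margin consistency and implied reference log-probs."""
    margin_gap = abs((r_c - r_r) - margin)
    # reward = beta * (policy_logp - ref_logp)  =>  ref_logp = policy_logp - reward / beta
    ref_c = lp_c - r_c / beta
    ref_r = lp_r - r_r / beta
    return margin_gap, ref_c, ref_r

for step, row in ROWS.items():
    gap, ref_c, ref_r = check_row(*row)
    print(f"step {step:4d}: margin gap {gap:.4f}, "
          f"implied ref logps chosen {ref_c:.3f}, rejected {ref_r:.3f}")
```

For these four rows, the implied reference values stay nearly constant, at about -15.25 for chosen and -16.88 for rejected. A frozen reference model scored on a fixed evaluation set should behave this way. With any other beta, the back-solved values would drift from row to row, so the table is consistent with the beta = 0.3 reading of the model name.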


### Framework versions

- Transformers 4.39.1
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2