---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e7_rate_01_beta_DPO
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# v1_1000_STEPS_1e7_rate_01_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1), trained with Direct Preference Optimization (DPO) via TRL on an unknown dataset.
It achieves the following results on the evaluation set at the final checkpoint (step 1000):
- Loss: 0.6730
- Rewards/chosen: -0.0669
- Rewards/rejected: -0.1113
- Rewards/accuracies: 0.5890
- Rewards/margins: 0.0445
- Logps/rejected: -17.9930
- Logps/chosen: -15.9218
- Logits/rejected: -3.3417
- Logits/chosen: -3.3418
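
As a rough sanity check on how these metrics relate, the sketch below recomputes the reward margin and compares the reported loss with the sigmoid DPO loss. It assumes the default sigmoid loss in TRL, where the logged rewards are already scaled by beta, so the per-pair loss is `-log(sigmoid(chosen - rejected))`. The function name `sigmoid_dpo_loss` is illustrative and not part of any library.

```python
import math

# Evaluation metrics copied from the list above.
rewards_chosen = -0.0669
rewards_rejected = -0.1113
reported_margin = 0.0445
reported_loss = 0.6730

# The margin is the gap between the chosen and rejected rewards.
# Small differences from the reported value come from rounding of averages.
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(reward_margin: float) -> float:
    """Per-pair sigmoid DPO loss (assumed form): log(1 + exp(-margin))."""
    return math.log1p(math.exp(-reward_margin))


# Loss when the policy does not separate chosen from rejected at all (ln 2).
loss_at_zero_margin = sigmoid_dpo_loss(0.0)

# Loss evaluated at the mean margin. Because the loss is convex in the margin,
# the mean of per-pair losses cannot be lower than this value.
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"recomputed margin:   {margin:.4f} (reported {reported_margin})")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f}")
print(f"reported eval loss:  {reported_loss:.4f}")
```

Under these assumptions the reported loss sits between the loss at the mean margin and ln 2, which suggests the policy has moved only slightly away from the reference model.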

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
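
To make the interaction of these settings concrete, here is a minimal sketch that derives the effective batch size and the learning rate at a given step. It assumes a single device (which is what `total_train_batch_size: 4` implies for these values) and a half-cycle cosine decay to zero after linear warmup, which is how the cosine schedule in Transformers behaves by default. The helper names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-07
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000
NUM_DEVICES = 1  # assumption: not stated in the card, implied by the total of 4


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step):.3e}")
```

The learning rate peaks at step 100 and decays toward zero by step 1000, so later optimizer steps change the weights progressively less.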

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6944        | 0.05  | 50   | 0.6930          | -0.0001        | -0.0004          | 0.4791             | 0.0003          | -16.8836       | -15.2543     | -3.3540         | -3.3541       |
| 0.6896        | 0.1   | 100  | 0.6907          | -0.0026        | -0.0076          | 0.5670             | 0.0050          | -16.9551       | -15.2788     | -3.3527         | -3.3528       |
| 0.6879        | 0.15  | 150  | 0.6878          | -0.0076        | -0.0188          | 0.5736             | 0.0112          | -17.0680       | -15.3294     | -3.3516         | -3.3517       |
| 0.6836        | 0.2   | 200  | 0.6849          | -0.0190        | -0.0363          | 0.5670             | 0.0173          | -17.2422       | -15.4426     | -3.3479         | -3.3480       |
| 0.6804        | 0.24  | 250  | 0.6825          | -0.0285        | -0.0510          | 0.5868             | 0.0226          | -17.3899       | -15.5377     | -3.3456         | -3.3457       |
| 0.6753        | 0.29  | 300  | 0.6802          | -0.0411        | -0.0689          | 0.5890             | 0.0277          | -17.5681       | -15.6645     | -3.3452         | -3.3453       |
| 0.6908        | 0.34  | 350  | 0.6788          | -0.0382        | -0.0690          | 0.5956             | 0.0307          | -17.5691       | -15.6352     | -3.3447         | -3.3448       |
| 0.6881        | 0.39  | 400  | 0.6773          | -0.0391        | -0.0735          | 0.5934             | 0.0344          | -17.6147       | -15.6439     | -3.3446         | -3.3447       |
| 0.6519        | 0.44  | 450  | 0.6757          | -0.0500        | -0.0881          | 0.5912             | 0.0381          | -17.7606       | -15.7528     | -3.3434         | -3.3435       |
| 0.6871        | 0.49  | 500  | 0.6751          | -0.0504        | -0.0897          | 0.5978             | 0.0394          | -17.7768       | -15.7565     | -3.3425         | -3.3426       |
| 0.6495        | 0.54  | 550  | 0.6737          | -0.0598        | -0.1025          | 0.5934             | 0.0427          | -17.9043       | -15.8506     | -3.3424         | -3.3425       |
| 0.6756        | 0.59  | 600  | 0.6738          | -0.0611        | -0.1038          | 0.5912             | 0.0427          | -17.9179       | -15.8641     | -3.3420         | -3.3421       |
| 0.6584        | 0.64  | 650  | 0.6735          | -0.0625        | -0.1058          | 0.5890             | 0.0434          | -17.9379       | -15.8778     | -3.3422         | -3.3423       |
| 0.6747        | 0.68  | 700  | 0.6734          | -0.0652        | -0.1089          | 0.5824             | 0.0437          | -17.9690       | -15.9052     | -3.3417         | -3.3418       |
| 0.6735        | 0.73  | 750  | 0.6733          | -0.0662        | -0.1102          | 0.5670             | 0.0440          | -17.9819       | -15.9150     | -3.3417         | -3.3418       |
| 0.6573        | 0.78  | 800  | 0.6732          | -0.0671        | -0.1112          | 0.5868             | 0.0442          | -17.9917       | -15.9236     | -3.3417         | -3.3418       |
| 0.6768        | 0.83  | 850  | 0.6732          | -0.0671        | -0.1112          | 0.5934             | 0.0441          | -17.9912       | -15.9238     | -3.3417         | -3.3418       |
| 0.6745        | 0.88  | 900  | 0.6733          | -0.0671        | -0.1110          | 0.5780             | 0.0439          | -17.9897       | -15.9243     | -3.3416         | -3.3418       |
| 0.6751        | 0.93  | 950  | 0.6730          | -0.0668        | -0.1114          | 0.5868             | 0.0446          | -17.9934       | -15.9211     | -3.3417         | -3.3418       |
| 0.6645        | 0.98  | 1000 | 0.6730          | -0.0669        | -0.1113          | 0.5890             | 0.0445          | -17.9930       | -15.9218     | -3.3417         | -3.3418       |
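
The training set is not documented, but the Epoch and Step columns allow a rough size estimate. The sketch below assumes that each step consumes `total_train_batch_size` (4) preference pairs and that the logged epoch is rounded to two decimals; both are assumptions, so the result is an estimate rather than a documented figure.

```python
TOTAL_TRAIN_BATCH_SIZE = 4  # from the hyperparameters section


def estimate_train_pairs(step: int, epoch: float) -> float:
    """Point estimate of training-set size from one (step, epoch) row."""
    return step * TOTAL_TRAIN_BATCH_SIZE / epoch


def estimate_bounds(step: int, epoch: float, half_width: float = 0.005):
    """Bounds on the estimate, assuming the epoch value was rounded to 2 decimals."""
    seen = step * TOTAL_TRAIN_BATCH_SIZE
    return seen / (epoch + half_width), seen / (epoch - half_width)


# Final row of the table above: step 1000 at epoch 0.98.
final_step, final_epoch = 1000, 0.98
estimate = estimate_train_pairs(final_step, final_epoch)
low, high = estimate_bounds(final_step, final_epoch)

print(f"estimated training pairs: about {estimate:.0f}")
print(f"plausible range: {low:.0f} to {high:.0f}")
```

On these assumptions the run covers just under one pass over a training set of roughly four thousand pairs.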


### Framework versions

- Transformers 4.39.1
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2