---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- generated_from_trainer
model-index:
- name: Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe

This model is a version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set (the reward metrics are explained briefly after the list):
- Loss: 0.0040
- Rewards/chosen: 0.1378
- Rewards/rejected: -29.0317
- Rewards/accuracies: 0.9983
- Rewards/margins: 29.1695
- Logps/rejected: -714.5497
- Logps/chosen: -254.4278
- Logits/rejected: -3.3257
- Logits/chosen: -3.4722
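
These metric names follow the convention used by DPO trainers such as TRL's `DPOTrainer`. Assuming the standard DPO objective with temperature β (the value used for this run is not recorded in this card), the implicit reward of a completion y for a prompt x and the per-pair loss are

$$
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})\bigr).
$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards of the chosen and rejected completions, `Rewards/margins` is the mean of their difference, and `Rewards/accuracies` is the fraction of pairs for which the chosen reward exceeds the rejected one; `Logps/*` and `Logits/*` are the corresponding mean log-probabilities and logits of the policy model on those completions.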

## Model description

More information needed

## Intended uses & limitations

More information needed
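
Pending fuller documentation from the authors, the snippet below is a minimal, unofficial sketch of how the model could be loaded for inference with 🤗 Transformers, assuming it keeps the base Mistral-7B-Instruct chat template; the repo id and generation settings are placeholders to adapt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the full repo id of this model on the Hub.
model_id = "Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native dtype
    device_map="auto",    # requires `accelerate`
)

messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings are illustrative; tune them for your use case.
output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```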

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
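
As a rough, unofficial reconstruction, these values correspond to a 🤗 Transformers `TrainingArguments` configuration along the following lines; the per-device batch sizes (8 train / 4 eval) across 8 GPUs yield the listed totals of 64 and 32. DPO-specific settings (β, datasets, reference model) and the training precision are not recorded in this card and are therefore omitted.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters; the output
# directory is an assumption, and DPO-specific options are not shown here.
training_args = TrainingArguments(
    output_dir="Mistral-7B-Instruct-v0.1-dpo-full-1-epoch-hydrox-safe",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs -> total train batch size 64
    per_device_eval_batch_size=4,   # x 8 GPUs -> total eval batch size 32
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```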

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1608        | 0.03  | 100  | 0.1654          | 1.2374         | -2.6089          | 0.9571             | 3.8463          | -450.3222      | -243.4314    | -3.2204         | -3.2045       |
| 0.1349        | 0.07  | 200  | 0.0961          | 0.9406         | -6.3451          | 0.9756             | 7.2857          | -487.6837      | -246.3994    | -3.1898         | -3.2216       |
| 0.1065        | 0.1   | 300  | 0.1015          | -0.2203        | -9.2710          | 0.9840             | 9.0507          | -516.9434      | -258.0089    | -3.1999         | -3.2283       |
| 0.0876        | 0.14  | 400  | 0.0597          | -1.4412        | -13.6992         | 0.9865             | 12.2580         | -561.2250      | -270.2174    | -3.2066         | -3.2753       |
| 0.304         | 0.17  | 500  | 0.0874          | -0.2677        | -17.2497         | 0.9891             | 16.9821         | -596.7302      | -258.4822    | -3.2093         | -3.2601       |
| 0.1206        | 0.2   | 600  | 0.0686          | -0.4252        | -15.6514         | 0.9891             | 15.2262         | -580.7473      | -260.0578    | -3.1689         | -3.2024       |
| 0.0176        | 0.24  | 700  | 0.0630          | -0.7082        | -17.5291         | 0.9933             | 16.8209         | -599.5242      | -262.8876    | -3.2305         | -3.2958       |
| 0.0461        | 0.27  | 800  | 0.0341          | -1.2542        | -21.2558         | 0.9933             | 20.0016         | -636.7914      | -268.3477    | -3.3936         | -3.5158       |
| 0.0185        | 0.31  | 900  | 0.0291          | 0.3781         | -17.2475         | 0.9966             | 17.6256         | -596.7079      | -252.0242    | -3.3745         | -3.4941       |
| 0.0219        | 0.34  | 1000 | 0.0248          | -0.1014        | -19.6177         | 0.9958             | 19.5163         | -620.4097      | -256.8191    | -3.3236         | -3.4703       |
| 0.0193        | 0.37  | 1100 | 0.0476          | 0.2441         | -22.8685         | 0.9949             | 23.1126         | -652.9178      | -253.3648    | -3.3700         | -3.5127       |
| 0.0153        | 0.41  | 1200 | 0.0344          | 0.2337         | -21.0722         | 0.9958             | 21.3059         | -634.9553      | -253.4690    | -3.3281         | -3.4433       |
| 0.1011        | 0.44  | 1300 | 0.0320          | 0.3865         | -19.5099         | 0.9941             | 19.8964         | -619.3322      | -251.9406    | -3.2086         | -3.2943       |
| 0.0085        | 0.48  | 1400 | 0.0164          | -0.3604        | -24.6053         | 0.9958             | 24.2449         | -670.2856      | -259.4097    | -3.3688         | -3.5055       |
| 0.0057        | 0.51  | 1500 | 0.0115          | -0.8584        | -33.7853         | 0.9966             | 32.9269         | -762.0861      | -264.3898    | -3.2986         | -3.4455       |
| 0.0082        | 0.54  | 1600 | 0.0525          | -0.3661        | -22.4426         | 0.9975             | 22.0765         | -648.6592      | -259.4668    | -3.3372         | -3.4816       |
| 0.0128        | 0.58  | 1700 | 0.0514          | -0.4253        | -24.3063         | 0.9958             | 23.8810         | -667.2958      | -260.0584    | -3.3102         | -3.4488       |
| 0.0018        | 0.61  | 1800 | 0.0356          | -0.3563        | -24.1492         | 0.9966             | 23.7929         | -665.7247      | -259.3687    | -3.2894         | -3.4159       |
| 0.0105        | 0.65  | 1900 | 0.0381          | -0.9566        | -33.8957         | 0.9958             | 32.9391         | -763.1902      | -265.3718    | -3.3840         | -3.5348       |
| 0.006         | 0.68  | 2000 | 0.0072          | -0.1403        | -26.2483         | 0.9975             | 26.1080         | -686.7160      | -257.2083    | -3.3371         | -3.4805       |
| 0.0026        | 0.71  | 2100 | 0.0102          | -0.1870        | -29.0470         | 0.9966             | 28.8600         | -714.7033      | -257.6760    | -3.3557         | -3.4974       |
| 0.0038        | 0.75  | 2200 | 0.0078          | -0.4803        | -29.8773         | 0.9966             | 29.3970         | -723.0064      | -260.6087    | -3.3551         | -3.5046       |
| 0.0011        | 0.78  | 2300 | 0.0075          | -0.4771        | -28.4348         | 0.9966             | 27.9577         | -708.5814      | -260.5770    | -3.3459         | -3.4948       |
| 0.0033        | 0.82  | 2400 | 0.0047          | -0.1998        | -28.0030         | 0.9983             | 27.8032         | -704.2631      | -257.8039    | -3.3489         | -3.4950       |
| 0.0051        | 0.85  | 2500 | 0.0048          | -0.2771        | -29.2358         | 0.9992             | 28.9587         | -716.5906      | -258.5765    | -3.3025         | -3.4428       |
| 0.0074        | 0.88  | 2600 | 0.0044          | -0.2089        | -29.6486         | 0.9975             | 29.4396         | -720.7189      | -257.8950    | -3.3320         | -3.4805       |
| 0.0032        | 0.92  | 2700 | 0.0041          | -0.1675        | -30.1791         | 0.9975             | 30.0116         | -726.0242      | -257.4810    | -3.3308         | -3.4822       |
| 0.0023        | 0.95  | 2800 | 0.0038          | 0.0604         | -29.3907         | 0.9983             | 29.4511         | -718.1400      | -255.2013    | -3.3267         | -3.4751       |
| 0.003         | 0.99  | 2900 | 0.0040          | 0.1446         | -28.9793         | 0.9983             | 29.1239         | -714.0264      | -254.3596    | -3.3257         | -3.4723       |


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1