jikaixuan committed
Commit f53b622
Parent: b167fe5

Model save

README.md CHANGED
@@ -1,14 +1,10 @@
  ---
- license: apache-2.0
  library_name: peft
  tags:
- - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
  base_model: mistralai/Mistral-7B-v0.1
- datasets:
- - HuggingFaceH4/ultrafeedback_binarized
  model-index:
  - name: zephyr-7b
    results: []
@@ -19,19 +15,19 @@ should probably proofread and complete it, then remove this comment. -->

  # zephyr-7b

- This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3622
- - Rewards/chosen: -150.6005
- - Rewards/rejected: -146.2936
- - Rewards/accuracies: 0.2421
- - Rewards/margins: -4.3069
- - Logps/rejected: -14704.7598
- - Logps/chosen: -15128.9541
- - Logits/rejected: 13.5278
- - Logits/chosen: 13.4631
- - Use Label: 11931.9844
- - Pred Label: 8140.0161
+ - Loss: 0.6790
+ - Rewards/chosen: -0.5476
+ - Rewards/rejected: -0.8618
+ - Rewards/accuracies: 0.3571
+ - Rewards/margins: 0.3143
+ - Logps/rejected: -161.5806
+ - Logps/chosen: -123.6563
+ - Logits/rejected: 1.4905
+ - Logits/chosen: 1.3693
+ - Use Label: 16436.9844
+ - Pred Label: 2251.0159

  ## Model description

@@ -68,15 +64,15 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label | Pred Label |
  |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:----------:|:----------:|
- | 0.6637 | 0.1 | 100 | 0.6642 | -0.0947 | -0.1635 | 0.3254 | 0.0687 | -91.7446 | -78.3734 | -2.0927 | -2.1253 | 1838.9207 | 17.0794 |
- | 0.3902 | 0.21 | 200 | 0.3930 | -14.4219 | -14.0352 | 0.2560 | -0.3866 | -1478.9202 | -1511.0870 | 2.8471 | 2.7727 | 3444.6985 | 515.3016 |
- | 0.3845 | 0.31 | 300 | 0.3786 | -23.0869 | -24.5685 | 0.2520 | 1.4817 | -2532.2498 | -2377.5872 | 5.4283 | 5.3070 | 4579.4922 | 1484.5079 |
- | 0.3477 | 0.42 | 400 | 0.3622 | -111.3259 | -109.5294 | 0.25 | -1.7965 | -11028.3408 | -11201.4893 | 11.6816 | 11.5716 | 5682.4922 | 2485.5081 |
- | 0.3468 | 0.52 | 500 | 0.3613 | -144.7782 | -140.7408 | 0.2421 | -4.0373 | -14149.4824 | -14546.7158 | 13.8885 | 13.8347 | 6784.2383 | 3487.7620 |
- | 0.33 | 0.63 | 600 | 0.3605 | -143.0167 | -138.8336 | 0.2401 | -4.1831 | -13958.7627 | -14370.5693 | 12.5943 | 12.5399 | 7857.4287 | 4518.5713 |
- | 0.3665 | 0.73 | 700 | 0.3614 | -150.1877 | -145.8865 | 0.2421 | -4.3011 | -14664.0518 | -15087.6680 | 13.4024 | 13.3367 | 8936.4287 | 5543.5713 |
- | 0.3731 | 0.84 | 800 | 0.3623 | -150.4385 | -146.1303 | 0.2401 | -4.3082 | -14688.4258 | -15112.7539 | 13.5339 | 13.4696 | 10050.3330 | 6533.6665 |
- | 0.3696 | 0.94 | 900 | 0.3625 | -150.6127 | -146.3050 | 0.2421 | -4.3077 | -14705.8975 | -15130.1680 | 13.5362 | 13.4716 | 11165.9844 | 7522.0161 |
+ | 0.6685 | 0.1 | 100 | 0.6684 | -0.0306 | -0.0936 | 0.3353 | 0.0631 | -84.7626 | -71.9572 | -2.0796 | -2.1097 | 1856.0 | 0.0 |
+ | 0.676 | 0.21 | 200 | 0.6717 | -0.3729 | -0.4956 | 0.3214 | 0.1227 | -124.9563 | -106.1906 | -1.6889 | -1.7319 | 3898.4443 | 61.5556 |
+ | 0.6728 | 0.31 | 300 | 0.6784 | -0.4712 | -0.7059 | 0.3373 | 0.2347 | -145.9853 | -116.0199 | -0.6762 | -0.7414 | 5793.0317 | 270.9683 |
+ | 0.6715 | 0.42 | 400 | 0.6812 | -0.4462 | -0.7352 | 0.3552 | 0.2890 | -148.9146 | -113.5210 | 0.7648 | 0.6420 | 7595.3174 | 572.6826 |
+ | 0.6744 | 0.52 | 500 | 0.6722 | -0.5121 | -0.7576 | 0.3413 | 0.2455 | -151.1573 | -120.1133 | 0.7128 | 0.6149 | 9378.1592 | 893.8412 |
+ | 0.6784 | 0.63 | 600 | 0.6792 | -0.5107 | -0.8136 | 0.3512 | 0.3028 | -156.7531 | -119.9755 | 0.9939 | 0.8860 | 11169.8096 | 1206.1904 |
+ | 0.6783 | 0.73 | 700 | 0.6756 | -0.6634 | -0.9598 | 0.3671 | 0.2964 | -171.3761 | -135.2395 | 1.2995 | 1.1927 | 12921.4766 | 1558.5238 |
+ | 0.6776 | 0.84 | 800 | 0.6801 | -0.5500 | -0.8628 | 0.3532 | 0.3128 | -161.6791 | -123.9010 | 1.4789 | 1.3586 | 14683.5078 | 1900.4921 |
+ | 0.6751 | 0.94 | 900 | 0.6790 | -0.5476 | -0.8618 | 0.3571 | 0.3143 | -161.5806 | -123.6563 | 1.4905 | 1.3693 | 16436.9844 | 2251.0159 |


  ### Framework versions
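
A note on the reward columns above: in TRL's DPO setup, Rewards/chosen and Rewards/rejected are the implicit rewards β·(log pπ − log pref) averaged over the eval set, Rewards/margins is the mean chosen-minus-rejected difference, and Rewards/accuracies is the fraction of pairs with a positive margin. (The Use Label / Pred Label counters look like additions from a customized trainer and are not stock TRL metrics.) Below is a minimal sketch of that bookkeeping; the log-probability tensors and the β value are illustrative assumptions, not values from this run:

```python
import torch

# Hedged sketch of how TRL-style DPO reward metrics are derived.
# Per-example log-probs are made-up placeholders; only the formulas
# mirror DPO: reward = beta * (policy logp - reference logp).
beta = 0.1  # assumed DPO beta, not necessarily this run's setting

policy_chosen_logps = torch.tensor([-123.7, -110.2, -130.5])
ref_chosen_logps = torch.tensor([-118.0, -108.9, -125.1])
policy_rejected_logps = torch.tensor([-161.6, -150.3, -170.2])
ref_rejected_logps = torch.tensor([-152.9, -149.0, -161.8])

chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

margins = chosen_rewards - rejected_rewards      # -> "Rewards/margins"
accuracy = (margins > 0).float().mean()          # -> "Rewards/accuracies"

print(chosen_rewards.mean().item(), rejected_rewards.mean().item())
print(margins.mean().item(), accuracy.item())
```

As a sanity check against the card itself: the reported margin 0.3143 is, up to rounding, Rewards/chosen minus Rewards/rejected, i.e. −0.5476 − (−0.8618) ≈ 0.3142.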
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2e4a330a096e69a5e1aa2e441770bbbdea02197b49faf844e6ad2a927051782f
+ oid sha256:ad71bba8c0fb6b1ad969d4d566283487ddd281fbcd29b2a43facead523b5051b
  size 671150064
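
The `oid sha256:` line in this Git LFS pointer is the SHA-256 digest of the actual safetensors payload, so a downloaded copy can be verified against it. A minimal sketch (the local file path is an assumption):

```python
import hashlib

# Compare a downloaded LFS file against the "oid sha256:..." value
# from the pointer above, hashing in chunks to bound memory use.
EXPECTED = "ad71bba8c0fb6b1ad969d4d566283487ddd281fbcd29b2a43facead523b5051b"

h = hashlib.sha256()
with open("adapter_model.safetensors", "rb") as f:  # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == EXPECTED, "digest mismatch: corrupt or wrong file"
```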
all_results.json CHANGED
@@ -1,23 +1,8 @@
  {
      "epoch": 1.0,
-     "eval_logits/chosen": 13.463111877441406,
-     "eval_logits/rejected": 13.527791023254395,
-     "eval_logps/chosen": -15128.9541015625,
-     "eval_logps/rejected": -14704.759765625,
-     "eval_loss": 0.3621964752674103,
-     "eval_pred_label": 8140.01611328125,
-     "eval_rewards/accuracies": 0.2420634925365448,
-     "eval_rewards/chosen": -150.60052490234375,
-     "eval_rewards/margins": -4.306910991668701,
-     "eval_rewards/rejected": -146.29360961914062,
-     "eval_runtime": 245.5331,
-     "eval_samples": 2000,
-     "eval_samples_per_second": 8.146,
-     "eval_steps_per_second": 0.257,
-     "eval_use_label": 11931.984375,
-     "train_loss": 0.4203558097959189,
-     "train_runtime": 19955.1733,
+     "train_loss": 0.6760230718482851,
+     "train_runtime": 20063.9235,
      "train_samples": 61135,
-     "train_samples_per_second": 3.064,
+     "train_samples_per_second": 3.047,
      "train_steps_per_second": 0.048
  }
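
The throughput fields here are internally consistent: train_samples divided by train_runtime reproduces train_samples_per_second. A quick check, assuming the file is available locally:

```python
import json

# Recompute the reported throughput from the other fields.
with open("all_results.json") as f:  # assumed local path
    results = json.load(f)

sps = results["train_samples"] / results["train_runtime"]
print(round(sps, 3))  # 61135 / 20063.9235 -> 3.047, matching the file
assert abs(sps - results["train_samples_per_second"]) < 1e-3
```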
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
      "epoch": 1.0,
-     "train_loss": 0.4203558097959189,
-     "train_runtime": 19955.1733,
+     "train_loss": 0.6760230718482851,
+     "train_runtime": 20063.9235,
      "train_samples": 61135,
-     "train_samples_per_second": 3.064,
+     "train_samples_per_second": 3.047,
      "train_steps_per_second": 0.048
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
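
Since the card advertises a PEFT (LoRA) adapter on mistralai/Mistral-7B-v0.1, a minimal loading sketch may be useful; the adapter repo id below is a hypothetical placeholder for wherever this commit's files end up hosted:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "mistralai/Mistral-7B-v0.1"
ADAPTER = "your-username/zephyr-7b"  # hypothetical repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the DPO-trained LoRA weights (adapter_model.safetensors) on top.
model = PeftModel.from_pretrained(base_model, ADAPTER)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```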