Model save

Files changed (5)

README.md +21 -25
adapter_model.safetensors +1 -1
all_results.json +3 -18
train_results.json +3 -3
trainer_state.json +0 -0

README.md CHANGED

@@ -1,14 +1,10 @@
 ---
-license: apache-2.0
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
-datasets:
-- HuggingFaceH4/ultrafeedback_binarized
 model-index:
 - name: zephyr-7b
   results: []
@@ -19,19 +15,19 @@ should probably proofread and complete it, then remove this comment. -->
 # zephyr-7b
-This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-qlora](https://huggingface.co/alignment-handbook/zephyr-7b-sft-qlora) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3622
-- Rewards/chosen: -150.6005
-- Rewards/rejected: -146.2936
-- Rewards/accuracies: 0.2421
-- Rewards/margins: -4.3069
-- Logps/rejected: -14704.7598
-- Logps/chosen: -15128.9541
-- Logits/rejected: 13.5278
-- Logits/chosen: 13.4631
-- Use Label: 11931.9844
-- Pred Label: 8140.0161
 ## Model description
@@ -68,15 +64,15 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label  | Pred Label |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:----------:|:----------:|
-| 0.6637        | 0.1   | 100  | 0.6642          | -0.0947        | -0.1635          | 0.3254             | 0.0687          | -91.7446       | -78.3734     | -2.0927         | -2.1253       | 1838.9207  | 17.0794    |
-| 0.3902        | 0.21  | 200  | 0.3930          | -14.4219       | -14.0352         | 0.2560             | -0.3866         | -1478.9202     | -1511.0870   | 2.8471          | 2.7727        | 3444.6985  | 515.3016   |
-| 0.3845        | 0.31  | 300  | 0.3786          | -23.0869       | -24.5685         | 0.2520             | 1.4817          | -2532.2498     | -2377.5872   | 5.4283          | 5.3070        | 4579.4922  | 1484.5079  |
-| 0.3477        | 0.42  | 400  | 0.3622          | -111.3259      | -109.5294        | 0.25               | -1.7965         | -11028.3408    | -11201.4893  | 11.6816         | 11.5716       | 5682.4922  | 2485.5081  |
-| 0.3468        | 0.52  | 500  | 0.3613          | -144.7782      | -140.7408        | 0.2421             | -4.0373         | -14149.4824    | -14546.7158  | 13.8885         | 13.8347       | 6784.2383  | 3487.7620  |
-| 0.33          | 0.63  | 600  | 0.3605          | -143.0167      | -138.8336        | 0.2401             | -4.1831         | -13958.7627    | -14370.5693  | 12.5943         | 12.5399       | 7857.4287  | 4518.5713  |
-| 0.3665        | 0.73  | 700  | 0.3614          | -150.1877      | -145.8865        | 0.2421             | -4.3011         | -14664.0518    | -15087.6680  | 13.4024         | 13.3367       | 8936.4287  | 5543.5713  |
-| 0.3731        | 0.84  | 800  | 0.3623          | -150.4385      | -146.1303        | 0.2401             | -4.3082         | -14688.4258    | -15112.7539  | 13.5339         | 13.4696       | 10050.3330 | 6533.6665  |
-| 0.3696        | 0.94  | 900  | 0.3625          | -150.6127      | -146.3050        | 0.2421             | -4.3077         | -14705.8975    | -15130.1680  | 13.5362         | 13.4716       | 11165.9844 | 7522.0161  |
 ### Framework versions

 ---
 library_name: peft
 tags:
 - trl
 - dpo
 - generated_from_trainer
 base_model: mistralai/Mistral-7B-v0.1
 model-index:
 - name: zephyr-7b
   results: []
 # zephyr-7b
+This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6790
+- Rewards/chosen: -0.5476
+- Rewards/rejected: -0.8618
+- Rewards/accuracies: 0.3571
+- Rewards/margins: 0.3143
+- Logps/rejected: -161.5806
+- Logps/chosen: -123.6563
+- Logits/rejected: 1.4905
+- Logits/chosen: 1.3693
+- Use Label: 16436.9844
+- Pred Label: 2251.0159
 ## Model description
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label  | Pred Label |
 |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:----------:|:----------:|
+| 0.6685        | 0.1   | 100  | 0.6684          | -0.0306        | -0.0936          | 0.3353             | 0.0631          | -84.7626       | -71.9572     | -2.0796         | -2.1097       | 1856.0     | 0.0        |
+| 0.676         | 0.21  | 200  | 0.6717          | -0.3729        | -0.4956          | 0.3214             | 0.1227          | -124.9563      | -106.1906    | -1.6889         | -1.7319       | 3898.4443  | 61.5556    |
+| 0.6728        | 0.31  | 300  | 0.6784          | -0.4712        | -0.7059          | 0.3373             | 0.2347          | -145.9853      | -116.0199    | -0.6762         | -0.7414       | 5793.0317  | 270.9683   |
+| 0.6715        | 0.42  | 400  | 0.6812          | -0.4462        | -0.7352          | 0.3552             | 0.2890          | -148.9146      | -113.5210    | 0.7648          | 0.6420        | 7595.3174  | 572.6826   |
+| 0.6744        | 0.52  | 500  | 0.6722          | -0.5121        | -0.7576          | 0.3413             | 0.2455          | -151.1573      | -120.1133    | 0.7128          | 0.6149        | 9378.1592  | 893.8412   |
+| 0.6784        | 0.63  | 600  | 0.6792          | -0.5107        | -0.8136          | 0.3512             | 0.3028          | -156.7531      | -119.9755    | 0.9939          | 0.8860        | 11169.8096 | 1206.1904  |
+| 0.6783        | 0.73  | 700  | 0.6756          | -0.6634        | -0.9598          | 0.3671             | 0.2964          | -171.3761      | -135.2395    | 1.2995          | 1.1927        | 12921.4766 | 1558.5238  |
+| 0.6776        | 0.84  | 800  | 0.6801          | -0.5500        | -0.8628          | 0.3532             | 0.3128          | -161.6791      | -123.9010    | 1.4789          | 1.3586        | 14683.5078 | 1900.4921  |
+| 0.6751        | 0.94  | 900  | 0.6790          | -0.5476        | -0.8618          | 0.3571             | 0.3143          | -161.5806      | -123.6563    | 1.4905          | 1.3693        | 16436.9844 | 2251.0159  |
 ### Framework versions

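Reading the README tables: assuming TRL's standard DPO logging, where `rewards/margins` is the mean of the per-pair differences (chosen reward minus rejected reward), the Rewards/margins column should equal Rewards/chosen minus Rewards/rejected up to the 4-decimal rounding used in the card. The Python sketch below is illustrative only and not part of this commit. It parses the new training-log rows, truncated here to their first eight columns, and checks that relationship for every row and for both versions' final evaluation results. The helper names (`parse_table`, `margin_is_consistent`) are hypothetical.

```python
# Minimal sketch: parse rows of the DPO training-log table and check that
# Rewards/margins == Rewards/chosen - Rewards/rejected up to 4-decimal rounding.
# Assumes TRL's convention of logging margins as mean(chosen - rejected).
# Helper names are hypothetical; rows keep only the first eight columns.

# Column order as in the README table header (first eight columns only).
COLUMNS = [
    "train_loss", "epoch", "step", "val_loss",
    "rewards_chosen", "rewards_rejected", "rewards_accuracies", "rewards_margins",
]

# New training log from the README diff, truncated to the columns above.
NEW_LOG = """
| 0.6685 | 0.1  | 100 | 0.6684 | -0.0306 | -0.0936 | 0.3353 | 0.0631 |
| 0.676  | 0.21 | 200 | 0.6717 | -0.3729 | -0.4956 | 0.3214 | 0.1227 |
| 0.6728 | 0.31 | 300 | 0.6784 | -0.4712 | -0.7059 | 0.3373 | 0.2347 |
| 0.6715 | 0.42 | 400 | 0.6812 | -0.4462 | -0.7352 | 0.3552 | 0.2890 |
| 0.6744 | 0.52 | 500 | 0.6722 | -0.5121 | -0.7576 | 0.3413 | 0.2455 |
| 0.6784 | 0.63 | 600 | 0.6792 | -0.5107 | -0.8136 | 0.3512 | 0.3028 |
| 0.6783 | 0.73 | 700 | 0.6756 | -0.6634 | -0.9598 | 0.3671 | 0.2964 |
| 0.6776 | 0.84 | 800 | 0.6801 | -0.5500 | -0.8628 | 0.3532 | 0.3128 |
| 0.6751 | 0.94 | 900 | 0.6790 | -0.5476 | -0.8618 | 0.3571 | 0.3143 |
"""

# Final evaluation results listed in the old and new README.
EVAL_RESULTS = {
    "old": {"rewards_chosen": -150.6005, "rewards_rejected": -146.2936, "rewards_margins": -4.3069},
    "new": {"rewards_chosen": -0.5476, "rewards_rejected": -0.8618, "rewards_margins": 0.3143},
}

# Each reported value is rounded to 4 decimals, so a recomputed margin can be
# off by up to 1.5e-4; allow a little slack above that.
TOLERANCE = 2e-4


def parse_row(line: str) -> dict[str, float]:
    """Turn one markdown table row into a {column: value} dict."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return {name: float(value) for name, value in zip(COLUMNS, cells)}


def parse_table(text: str) -> list[dict[str, float]]:
    """Parse every line that looks like a table row."""
    return [parse_row(line) for line in text.splitlines() if line.strip().startswith("|")]


def margin_is_consistent(row: dict[str, float]) -> bool:
    """True if the logged margin matches chosen - rejected within TOLERANCE."""
    expected = row["rewards_chosen"] - row["rewards_rejected"]
    return abs(expected - row["rewards_margins"]) <= TOLERANCE


if __name__ == "__main__":
    for row in parse_table(NEW_LOG):
        status = "ok" if margin_is_consistent(row) else "MISMATCH"
        print(f"step {int(row['step'])}: margins={row['rewards_margins']:+.4f} {status}")
    for version, result in EVAL_RESULTS.items():
        print(version, "eval margin consistent:", margin_is_consistent(result))
```

The same check passes on the old training log as well; a mismatch beyond the rounding tolerance would point to a transcription error in the card rather than to the training run itself.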
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2e4a330a096e69a5e1aa2e441770bbbdea02197b49faf844e6ad2a927051782f
 size 671150064

 version https://git-lfs.github.com/spec/v1
+oid sha256:ad71bba8c0fb6b1ad969d4d566283487ddd281fbcd29b2a43facead523b5051b
 size 671150064
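Both versions of the Git LFS pointer keep `size 671150064` but carry different `oid sha256:` values. This is consistent with an adapter file of the same size holding different weights. An LFS pointer is a small text file of `key value` lines. The sketch below parses the new pointer and checks a locally downloaded file against its size and SHA-256 digest. The local path and the helper names (`parse_lfs_pointer`, `matches_pointer`) are hypothetical.

```python
# Minimal sketch: parse a Git LFS pointer file (spec v1) and verify a local
# file against its oid and size. The local path below is hypothetical.
import hashlib
from pathlib import Path

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ad71bba8c0fb6b1ad969d4d566283487ddd281fbcd29b2a43facead523b5051b
size 671150064
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict[str, str]) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    algo, _, expected_hex = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; skip hashing if it already differs.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so a ~671 MB file is never fully loaded in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


if __name__ == "__main__":
    pointer = parse_lfs_pointer(NEW_POINTER)
    print(pointer["oid"], pointer["size"])
    local = Path("adapter_model.safetensors")  # hypothetical local download
    if local.exists():
        print("matches new pointer:", matches_pointer(local, pointer))
```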

all_results.json CHANGED

@@ -1,23 +1,8 @@
 {
     "epoch": 1.0,
-    "eval_logits/chosen": 13.463111877441406,
-    "eval_logits/rejected": 13.527791023254395,
-    "eval_logps/chosen": -15128.9541015625,
-    "eval_logps/rejected": -14704.759765625,
-    "eval_loss": 0.3621964752674103,
-    "eval_pred_label": 8140.01611328125,
-    "eval_rewards/accuracies": 0.2420634925365448,
-    "eval_rewards/chosen": -150.60052490234375,
-    "eval_rewards/margins": -4.306910991668701,
-    "eval_rewards/rejected": -146.29360961914062,
-    "eval_runtime": 245.5331,
-    "eval_samples": 2000,
-    "eval_samples_per_second": 8.146,
-    "eval_steps_per_second": 0.257,
-    "eval_use_label": 11931.984375,
-    "train_loss": 0.4203558097959189,
-    "train_runtime": 19955.1733,
     "train_samples": 61135,
-    "train_samples_per_second": 3.064,
     "train_steps_per_second": 0.048
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.6760230718482851,
+    "train_runtime": 20063.9235,
     "train_samples": 61135,
+    "train_samples_per_second": 3.047,
     "train_steps_per_second": 0.048
 }
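The `+3 -18` count for all_results.json breaks down as 15 `eval_*` keys removed, plus three training values (`train_loss`, `train_runtime`, `train_samples_per_second`) replaced with new ones. The hedged Python sketch below recomputes that breakdown from the values shown in the diff. In practice each version would be read with `json.load`; the helper name `diff_metrics` is hypothetical.

```python
# Minimal sketch: compare the old and new all_results.json and report which
# metric keys were removed, added, or changed. Values are copied from the diff.

OLD_ALL_RESULTS = {
    "epoch": 1.0,
    "eval_logits/chosen": 13.463111877441406,
    "eval_logits/rejected": 13.527791023254395,
    "eval_logps/chosen": -15128.9541015625,
    "eval_logps/rejected": -14704.759765625,
    "eval_loss": 0.3621964752674103,
    "eval_pred_label": 8140.01611328125,
    "eval_rewards/accuracies": 0.2420634925365448,
    "eval_rewards/chosen": -150.60052490234375,
    "eval_rewards/margins": -4.306910991668701,
    "eval_rewards/rejected": -146.29360961914062,
    "eval_runtime": 245.5331,
    "eval_samples": 2000,
    "eval_samples_per_second": 8.146,
    "eval_steps_per_second": 0.257,
    "eval_use_label": 11931.984375,
    "train_loss": 0.4203558097959189,
    "train_runtime": 19955.1733,
    "train_samples": 61135,
    "train_samples_per_second": 3.064,
    "train_steps_per_second": 0.048,
}

NEW_ALL_RESULTS = {
    "epoch": 1.0,
    "train_loss": 0.6760230718482851,
    "train_runtime": 20063.9235,
    "train_samples": 61135,
    "train_samples_per_second": 3.047,
    "train_steps_per_second": 0.048,
}


def diff_metrics(old: dict, new: dict) -> tuple[list[str], list[str], list[str]]:
    """Return (removed, added, changed) key lists, each sorted alphabetically."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return removed, added, changed


if __name__ == "__main__":
    removed, added, changed = diff_metrics(OLD_ALL_RESULTS, NEW_ALL_RESULTS)
    print("removed:", removed)
    print("added:", added)
    for key in changed:
        print(f"{key}: {OLD_ALL_RESULTS[key]} -> {NEW_ALL_RESULTS[key]}")
```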

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
     "epoch": 1.0,
-    "train_loss": 0.4203558097959189,
-    "train_runtime": 19955.1733,
     "train_samples": 61135,
-    "train_samples_per_second": 3.064,
     "train_steps_per_second": 0.048
 }

 {
     "epoch": 1.0,
+    "train_loss": 0.6760230718482851,
+    "train_runtime": 20063.9235,
     "train_samples": 61135,
+    "train_samples_per_second": 3.047,
     "train_steps_per_second": 0.048
 }
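In both versions of train_results.json, `train_samples_per_second` agrees with `train_samples * epoch / train_runtime` after rounding to three decimals. The sketch below checks that relationship. It is a sanity check under an assumption, not a definition: the Hugging Face Trainer derives this value from a sample count and the runtime, and that sample count is not guaranteed to equal the dataset size in every configuration. The helper name `implied_samples_per_second` is hypothetical.

```python
# Minimal sketch: check train_samples_per_second against
# train_samples * epoch / train_runtime (assumes one full pass per epoch).

OLD_TRAIN_RESULTS = {
    "epoch": 1.0,
    "train_runtime": 19955.1733,
    "train_samples": 61135,
    "train_samples_per_second": 3.064,
}

NEW_TRAIN_RESULTS = {
    "epoch": 1.0,
    "train_runtime": 20063.9235,
    "train_samples": 61135,
    "train_samples_per_second": 3.047,
}


def implied_samples_per_second(results: dict) -> float:
    """Throughput implied by sample count, epochs, and wall-clock runtime (s)."""
    return results["train_samples"] * results["epoch"] / results["train_runtime"]


if __name__ == "__main__":
    for name, results in [("old", OLD_TRAIN_RESULTS), ("new", NEW_TRAIN_RESULTS)]:
        implied = implied_samples_per_second(results)
        hours = results["train_runtime"] / 3600
        print(
            f"{name}: implied {implied:.3f} samples/s, "
            f"logged {results['train_samples_per_second']}, runtime {hours:.2f} h"
        )
```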

trainer_state.json CHANGED

The diff for this file is too large to render.