Model save

Files changed (4)

README.md +18 -18
all_results.json +3 -3
train_results.json +3 -3
trainer_state.json +0 -0

README.md CHANGED

@@ -2,9 +2,9 @@
 base_model: princeton-nlp/Llama-3-Base-8B-SFT
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
 model-index:
 - name: llama3-wpo-lora
@@ -18,16 +18,16 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.5185
-- Rewards/chosen: -0.0112
-- Rewards/rejected: -0.8516
 - Rewards/accuracies: 0.7280
-- Rewards/margins: 0.8404
-- Logps/rejected: -285.1924
-- Logps/chosen: -292.6592
 - Logps/ref Response: -0.5364
-- Logits/rejected: -0.3464
-- Logits/chosen: -0.3802
 ## Model description
@@ -64,15 +64,15 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Response | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:---------------:|:-------------:|
-| 0.6207        | 0.1047 | 100  | 0.6043          | 0.2394         | -0.0685          | 0.6860             | 0.3079          | -277.3613      | -290.1534    | -0.5364            | -0.5544         | -0.5590       |
-| 0.5646        | 0.2094 | 200  | 0.5509          | -0.0903        | -0.6915          | 0.7220             | 0.6013          | -283.5920      | -293.4499    | -0.5364            | -0.4966         | -0.5149       |
-| 0.5411        | 0.3141 | 300  | 0.5362          | -0.1526        | -0.8607          | 0.7280             | 0.7082          | -285.2837      | -294.0728    | -0.5364            | -0.4552         | -0.4787       |
-| 0.5103        | 0.4187 | 400  | 0.5296          | -0.1296        | -0.9152          | 0.7160             | 0.7856          | -285.8287      | -293.8433    | -0.5364            | -0.4009         | -0.4304       |
-| 0.5408        | 0.5234 | 500  | 0.5238          | -0.0898        | -0.9022          | 0.7280             | 0.8124          | -285.6985      | -293.4456    | -0.5364            | -0.3717         | -0.4037       |
-| 0.5216        | 0.6281 | 600  | 0.5218          | 0.0221         | -0.7992          | 0.7380             | 0.8213          | -284.6689      | -292.3262    | -0.5364            | -0.3629         | -0.3958       |
-| 0.509         | 0.7328 | 700  | 0.5174          | -0.1835        | -1.0191          | 0.7300             | 0.8356          | -286.8675      | -294.3820    | -0.5364            | -0.3421         | -0.3765       |
-| 0.5306        | 0.8375 | 800  | 0.5181          | -0.0066        | -0.8414          | 0.7340             | 0.8348          | -285.0904      | -292.6132    | -0.5364            | -0.3462         | -0.3803       |
-| 0.4879        | 0.9422 | 900  | 0.5180          | -0.0159        | -0.8503          | 0.7260             | 0.8343          | -285.1792      | -292.7065    | -0.5364            | -0.3457         | -0.3797       |
 ### Framework versions

 base_model: princeton-nlp/Llama-3-Base-8B-SFT
 library_name: peft
 tags:
 - trl
 - dpo
+- alignment-handbook
 - generated_from_trainer
 model-index:
 - name: llama3-wpo-lora
 This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5175
+- Rewards/chosen: -0.0517
+- Rewards/rejected: -0.9092
 - Rewards/accuracies: 0.7280
+- Rewards/margins: 0.8575
+- Logps/rejected: -285.7691
+- Logps/chosen: -293.0645
 - Logps/ref Response: -0.5364
+- Logits/rejected: -0.3072
+- Logits/chosen: -0.3443
 ## Model description
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Response | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:---------------:|:-------------:|
+| 0.6142        | 0.1047 | 100  | 0.5973          | 0.2024         | -0.1309          | 0.7020             | 0.3333          | -277.9861      | -290.5232    | -0.5364            | -0.5487         | -0.5543       |
+| 0.5579        | 0.2094 | 200  | 0.5483          | -0.0751        | -0.7065          | 0.7120             | 0.6313          | -283.7411      | -293.2985    | -0.5364            | -0.4847         | -0.5042       |
+| 0.5402        | 0.3141 | 300  | 0.5354          | -0.1318        | -0.8578          | 0.7260             | 0.7260          | -285.2545      | -293.8653    | -0.5364            | -0.4387         | -0.4637       |
+| 0.5112        | 0.4187 | 400  | 0.5277          | -0.1698        | -0.9670          | 0.7220             | 0.7973          | -286.3469      | -294.2450    | -0.5364            | -0.3715         | -0.4030       |
+| 0.5319        | 0.5234 | 500  | 0.5212          | -0.1546        | -0.9783          | 0.7260             | 0.8237          | -286.4595      | -294.0932    | -0.5364            | -0.3377         | -0.3727       |
+| 0.5155        | 0.6281 | 600  | 0.5195          | -0.0851        | -0.9285          | 0.7360             | 0.8434          | -285.9612      | -293.3980    | -0.5364            | -0.3247         | -0.3608       |
+| 0.5113        | 0.7328 | 700  | 0.5173          | -0.1941        | -1.0489          | 0.7340             | 0.8547          | -287.1652      | -294.4885    | -0.5364            | -0.3036         | -0.3411       |
+| 0.5268        | 0.8375 | 800  | 0.5177          | -0.0457        | -0.9023          | 0.7220             | 0.8566          | -285.7000      | -293.0044    | -0.5364            | -0.3082         | -0.3453       |
+| 0.4923        | 0.9422 | 900  | 0.5175          | -0.0517        | -0.9092          | 0.7280             | 0.8575          | -285.7691      | -293.0645    | -0.5364            | -0.3072         | -0.3443       |
 ### Framework versions
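
The reported `Rewards/margins` values can be sanity-checked against the other columns. The sketch below assumes the margin is simply `Rewards/chosen` minus `Rewards/rejected`. That assumption is consistent with the figures in this diff but is not stated in the README itself. The `REPORTED` list, `margin` helper, and `TOLERANCE` constant are illustrative names, not part of the repository.

```python
# Final evaluation rewards copied from the README diff above:
# (label, Rewards/chosen, Rewards/rejected, reported Rewards/margins)
REPORTED = [
    ("before", -0.0112, -0.8516, 0.8404),
    ("after", -0.0517, -0.9092, 0.8575),
]


def margin(chosen: float, rejected: float) -> float:
    """Reward margin, assumed here to be chosen minus rejected."""
    return chosen - rejected


# Allow for the 4-decimal rounding used in the README.
TOLERANCE = 2e-4

# One (label, recomputed margin, matches-reported flag) entry per run.
checks = [
    (label, margin(c, r), abs(margin(c, r) - m) < TOLERANCE)
    for label, c, r, m in REPORTED
]

for label, value, ok in checks:
    verdict = "matches" if ok else "differs from"
    print(f"{label}: margin = {value:.4f} ({verdict} the README)")
```

Under that assumption, both the old and new summary rows are internally consistent, and the margin widens from 0.8404 to 0.8575 while `Rewards/accuracies` stays at 0.7280.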

all_results.json CHANGED

@@ -15,9 +15,9 @@
     "eval_samples_per_second": 5.732,
     "eval_steps_per_second": 0.358,
     "total_flos": 0.0,
-    "train_loss": 0.540449628405546,
-    "train_runtime": 19116.7324,
     "train_samples": 61135,
-    "train_samples_per_second": 3.198,
     "train_steps_per_second": 0.05
 }

     "eval_samples_per_second": 5.732,
     "eval_steps_per_second": 0.358,
     "total_flos": 0.0,
+    "train_loss": 0.5383632463934533,
+    "train_runtime": 19109.9128,
     "train_samples": 61135,
+    "train_samples_per_second": 3.199,
     "train_steps_per_second": 0.05
 }
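
The throughput change follows from the runtime change. As a rough check, the sketch below recomputes `train_samples_per_second` as `train_samples / train_runtime`. This formula is an assumption that happens to reproduce the logged values to three decimals; the trainer may derive the figure slightly differently. The `RUNS` dictionary and `samples_per_second` helper are illustrative names.

```python
# Values copied from the all_results.json diff above.
TRAIN_SAMPLES = 61135

# label -> (train_runtime in seconds, reported train_samples_per_second)
RUNS = {
    "before": (19116.7324, 3.198),
    "after": (19109.9128, 3.199),
}


def samples_per_second(samples: int, runtime_s: float) -> float:
    """Throughput, assumed here to be samples divided by wall-clock runtime."""
    return samples / runtime_s


for label, (runtime_s, reported) in RUNS.items():
    computed = samples_per_second(TRAIN_SAMPLES, runtime_s)
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")

# Wall-clock difference between the two runs, in seconds.
runtime_delta = RUNS["before"][0] - RUNS["after"][0]
print(f"runtime difference: {runtime_delta:.4f} s")
```

The two runs differ by under seven seconds of wall-clock time, which is enough to move the rounded throughput from 3.198 to 3.199 samples per second.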

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
     "epoch": 0.9997382884061764,
     "total_flos": 0.0,
-    "train_loss": 0.540449628405546,
-    "train_runtime": 19116.7324,
     "train_samples": 61135,
-    "train_samples_per_second": 3.198,
     "train_steps_per_second": 0.05
 }

 {
     "epoch": 0.9997382884061764,
     "total_flos": 0.0,
+    "train_loss": 0.5383632463934533,
+    "train_runtime": 19109.9128,
     "train_samples": 61135,
+    "train_samples_per_second": 3.199,
     "train_steps_per_second": 0.05
 }

trainer_state.json CHANGED

The diff for this file is too large to render.