lapp0 committed on
Commit a260409
1 Parent(s): 217be1f

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 3694.4192
- - eval_frwikippl: 30929.5703
- - eval_zhwikippl: 45501.2617
- - eval_tinystoriesppl: 1160.7031
- - eval_loss: 15.6337
- - eval_runtime: 66.5963
- - eval_samples_per_second: 75.079
- - eval_steps_per_second: 9.385
+ - eval_enwikippl: 3944.9531
+ - eval_frwikippl: 30197.7344
+ - eval_zhwikippl: 52496.3438
+ - eval_tinystoriesppl: 1385.5492
+ - eval_loss: 16.6107
+ - eval_runtime: 66.9937
+ - eval_samples_per_second: 74.634
+ - eval_steps_per_second: 9.329
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -53,37 +53,38 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
+ - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 8.2666 GB
+ Peak GPU Memory: 8.2677 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 19278.7617 | 60268.5703 | 17.3716 | 66.6062 | 75.068 | 9.384 | 9660.0908 | 53858.2383 |
- | 3000 | 0.0485 | 3702.4434 | 30929.5703 | 15.6332 | 66.4884 | 75.201 | 9.4 | 1163.3928 | 45525.5703 |
- | 6000 | 0.0970 | 3702.4434 | 30929.5703 | 15.6346 | 67.0021 | 74.625 | 9.328 | 1163.0084 | 45525.5703 |
- | 9000 | 0.1455 | 3694.4192 | 30929.5703 | 15.6337 | 66.5963 | 75.079 | 9.385 | 1160.7031 | 45501.2617 |
- | 12000 | 0.1939 | 3696.7100 | 30981.8828 | 15.6348 | 66.5218 | 75.163 | 9.395 | 1161.0868 | 45525.5703 |
- | 15000 | 0.2424 | 3697.8560 | 30929.5703 | 15.6342 | 66.6948 | 74.968 | 9.371 | 1161.8550 | 45525.5703 |
- | 18000 | 0.2909 | 3696.7100 | 30981.8828 | 15.6348 | 66.2258 | 75.499 | 9.437 | 1161.2789 | 45525.5703 |
- | 21000 | 0.3394 | 3697.8560 | 30946.9785 | 15.6344 | 66.5835 | 75.094 | 9.387 | 1161.4711 | 45525.5703 |
- | 24000 | 0.3879 | 3697.8560 | 30929.5703 | 15.6334 | 66.8279 | 74.819 | 9.352 | 1162.0472 | 45525.5703 |
- | 27000 | 0.4364 | 3697.8560 | 30981.8828 | 15.6346 | 66.5691 | 75.11 | 9.389 | 1161.6627 | 45525.5703 |
- | 30000 | 0.4848 | 3696.7100 | 30946.9785 | 15.6346 | 66.7012 | 74.961 | 9.37 | 1160.7031 | 45525.5703 |
- | 33000 | 0.5333 | 3696.1389 | 30981.8828 | 15.6346 | 66.5211 | 75.164 | 9.396 | 1160.1277 | 45525.5703 |
- | 36000 | 0.5818 | 3700.1489 | 30929.5703 | 15.6331 | 66.5006 | 75.187 | 9.398 | 1162.6237 | 45525.5703 |
- | 39000 | 0.6303 | 3694.4192 | 30964.4258 | 15.6344 | 66.3802 | 75.324 | 9.415 | 1160.5111 | 45501.2617 |
- | 42000 | 0.6788 | 3696.7100 | 30946.9785 | 15.6346 | 66.6702 | 74.996 | 9.375 | 1160.7031 | 45525.5703 |
- | 45000 | 0.7273 | 3696.7100 | 30981.8828 | 15.6347 | 66.7768 | 74.876 | 9.36 | 1161.0868 | 45525.5703 |
- | 48000 | 0.7758 | 3694.4192 | 30929.5703 | 15.6331 | 66.6573 | 75.011 | 9.376 | 1160.7031 | 45525.5703 |
- | 51000 | 0.8242 | 3692.7039 | 30981.8828 | 15.6344 | 66.8297 | 74.817 | 9.352 | 1159.7439 | 45501.2617 |
- | 54000 | 0.8727 | 3692.1333 | 30946.9785 | 15.6344 | 66.8788 | 74.762 | 9.345 | 1158.7859 | 45501.2617 |
- | 57000 | 0.9212 | 3696.7100 | 30946.9785 | 15.6346 | 66.8322 | 74.814 | 9.352 | 1160.7031 | 45501.2617 |
- | 60000 | 0.9697 | 3707.0330 | 30929.5703 | 15.6328 | 66.9377 | 74.696 | 9.337 | 1165.3177 | 45525.5703 |
- | 61875 | 1.0 | 3702.4434 | 30929.5703 | 15.6331 | 66.5826 | 75.095 | 9.387 | 1163.3928 | 45501.2617 |
+ | 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 67.1093 | 74.505 | 9.313 | 12321.8145 | 60955.8008 |
+ | 3000 | 0.0485 | 3940.0654 | 30197.7344 | 16.6107 | 67.006 | 74.62 | 9.328 | 1383.7178 | 52496.3438 |
+ | 6000 | 0.0970 | 3937.6238 | 30180.7188 | 16.6119 | 67.1095 | 74.505 | 9.313 | 1383.9467 | 52496.3438 |
+ | 9000 | 0.1455 | 3944.9531 | 30197.7344 | 16.6107 | 66.9937 | 74.634 | 9.329 | 1385.5492 | 52496.3438 |
+ | 12000 | 0.1939 | 3944.9531 | 30197.7344 | 16.6115 | 67.0666 | 74.553 | 9.319 | 1384.8617 | 52496.3438 |
+ | 15000 | 0.2424 | 3937.6238 | 30180.7188 | 16.6121 | 66.6143 | 75.059 | 9.382 | 1385.3201 | 52524.3359 |
+ | 18000 | 0.2909 | 3937.6238 | 30180.7188 | 16.6117 | 66.7575 | 74.898 | 9.362 | 1384.1750 | 52552.3945 |
+ | 21000 | 0.3394 | 3944.9531 | 30197.7344 | 16.6115 | 66.679 | 74.986 | 9.373 | 1384.4041 | 52496.3438 |
+ | 24000 | 0.3879 | 3944.9531 | 30197.7344 | 16.6121 | 66.8908 | 74.749 | 9.344 | 1384.4041 | 52468.3164 |
+ | 27000 | 0.4364 | 3942.5085 | 30197.7344 | 16.6117 | 66.4311 | 75.266 | 9.408 | 1383.0317 | 52496.3438 |
+ | 30000 | 0.4848 | 3940.0654 | 30180.7188 | 16.6107 | 66.4762 | 75.215 | 9.402 | 1383.2599 | 52496.3438 |
+ | 33000 | 0.5333 | 3937.6238 | 30197.7344 | 16.6107 | 66.4814 | 75.209 | 9.401 | 1382.8029 | 52496.3438 |
+ | 36000 | 0.5818 | 3942.5085 | 30180.7188 | 16.6111 | 67.3001 | 74.294 | 9.287 | 1385.3201 | 52496.3438 |
+ | 39000 | 0.6303 | 3937.6238 | 30180.7188 | 16.6115 | 67.0065 | 74.62 | 9.327 | 1383.4888 | 52496.3438 |
+ | 42000 | 0.6788 | 3942.5085 | 30197.7344 | 16.6109 | 66.7444 | 74.913 | 9.364 | 1384.1750 | 52496.3438 |
+ | 45000 | 0.7273 | 3941.2869 | 30197.7344 | 16.6115 | 67.1516 | 74.458 | 9.307 | 1382.8029 | 52496.3438 |
+ | 48000 | 0.7758 | 3944.9531 | 30180.7188 | 16.6107 | 66.7762 | 74.877 | 9.36 | 1386.6947 | 52524.3359 |
+ | 51000 | 0.8242 | 3942.5085 | 30197.7344 | 16.6111 | 67.2623 | 74.336 | 9.292 | 1384.8617 | 52496.3438 |
+ | 54000 | 0.8727 | 3944.9531 | 30180.7188 | 16.6107 | 66.724 | 74.936 | 9.367 | 1385.3201 | 52496.3438 |
+ | 57000 | 0.9212 | 3941.2869 | 30197.7344 | 16.6115 | 67.0602 | 74.56 | 9.32 | 1382.8029 | 52468.3164 |
+ | 60000 | 0.9697 | 3942.5085 | 30197.7344 | 16.6119 | 67.4137 | 74.169 | 9.271 | 1382.8029 | 52468.3164 |
+ | 61875 | 1.0 | 3937.6238 | 30180.7188 | 16.6119 | 67.1794 | 74.428 | 9.303 | 1383.7178 | 52496.3438 |
 
 ### Framework versions
 - Distily 0.2.0
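
The substantive configuration change in this commit is the new `lr_scheduler_warmup_ratio: 0.1` entry: the constant schedule now ramps the learning rate up over the first 10% of steps. Below is a minimal sketch of an equivalent setup using the `transformers` scheduler helpers, assuming the learning rate of 0.0004 from the run name further down and the 61875 total steps from the eval table; the variable names are illustrative, not Distily's actual internals.

```python
# Sketch of the card's optimizer/scheduler: Adam with betas=(0.9, 0.999) and
# epsilon=1e-08, constant LR after a warmup over 10% of training steps.
import torch
from transformers import get_constant_schedule_with_warmup

model = torch.nn.Linear(8, 8)          # stand-in for the student model
total_steps = 61875                    # final step in the eval table
warmup_steps = int(0.1 * total_steps)  # lr_scheduler_warmup_ratio: 0.1

optimizer = torch.optim.Adam(model.parameters(), lr=4e-4,
                             betas=(0.9, 0.999), eps=1e-8)
scheduler = get_constant_schedule_with_warmup(optimizer,
                                              num_warmup_steps=warmup_steps)

for step in range(total_steps):
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # LR rises linearly to 4e-4, then stays constant
    optimizer.zero_grad()
```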
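
The `eval_*ppl` metrics are perplexities of the student model on the respective corpora (enwiki, frwiki, zhwiki, TinyStories). As a rough sketch of the conventional computation, perplexity is the exponential of the mean token-level cross-entropy; the model id below is a placeholder, since this diff does not show the student's repo id.

```python
# Sketch: perplexity = exp(mean cross-entropy) of a causal LM on some text.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/student-model"  # placeholder, not the card's actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

inputs = tokenizer("Once upon a time, there was a tiny robot.",
                   return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # next-token cross-entropy over the sequence.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {math.exp(loss.item()):.4f}")
```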
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.0004, warmup_ratio=0.1/events.out.tfevents.1723786875.b7d545513dcf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:782c562564518d47e058bdaa129d6c1c4e17cde558c9f851466f3d160248aacf
+ size 312
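
The three added lines above are a Git LFS pointer (version, oid, size), not the TensorBoard log itself; the actual `events.out.tfevents` file is fetched with `git lfs pull`. A hedged sketch of inspecting the fetched log with TensorBoard's event reader follows; the scalar tag names Distily logs are not shown in this commit, so it simply enumerates whatever is present.

```python
# Sketch: enumerate scalar series in the committed TensorBoard event file.
# Run `git lfs pull` first so the LFS pointer is replaced by the real log.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

run_dir = ("logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, "
           "hs_weight=10.0, learning_rate=0.0004, warmup_ratio=0.1")
acc = EventAccumulator(run_dir)
acc.Reload()  # parse the event file(s) found under run_dir

for tag in acc.Tags()["scalars"]:
    last = acc.Scalars(tag)[-1]
    print(f"{tag}: last value {last.value} at step {last.step}")
```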