End of training
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime:
-- eval_samples_per_second: 75.
-- eval_steps_per_second: 9.
+- eval_enwikippl: 12623.7412
+- eval_frwikippl: 52327.9961
+- eval_zhwikippl: 96451.4609
+- eval_tinystoriesppl: 6281.7222
+- eval_loss: 18.2212
+- eval_runtime: 32.9567
+- eval_samples_per_second: 75.857
+- eval_steps_per_second: 9.497
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -47,7 +47,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate:
+- learning_rate: 4e-06
 - train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
@@ -62,18 +62,18 @@ Peak GPU Memory: 16.2498 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
-| 0 | 0 |
-| 2000 | 0.1293 |
-| 4000 | 0.2586 |
-| 6000 | 0.3879 |
-| 8000 | 0.5172 |
-| 10000 | 0.6465 |
-| 12000 | 0.7757 |
-| 14000 | 0.9050 |
-| 15469 | 1.0 |
+| 0 | 0 | 12930.5508 | 52883.7109 | 18.2508 | 32.8136 | 76.188 | 9.539 | 6471.4722 | 96760.75 |
+| 2000 | 0.1293 | 12623.7412 | 52327.9961 | 18.2212 | 32.8936 | 76.003 | 9.516 | 6281.7222 | 96451.4609 |
+| 4000 | 0.2586 | 12623.7412 | 52327.9961 | 18.2212 | 32.8034 | 76.212 | 9.542 | 6281.7222 | 96451.4609 |
+| 6000 | 0.3879 | 12623.7412 | 52327.9961 | 18.2212 | 32.8868 | 76.018 | 9.517 | 6281.7222 | 96451.4609 |
+| 8000 | 0.5172 | 12623.7412 | 52327.9961 | 18.2212 | 32.9567 | 75.857 | 9.497 | 6281.7222 | 96451.4609 |
+| 10000 | 0.6465 | 12623.7412 | 52327.9961 | 18.2212 | 32.9256 | 75.929 | 9.506 | 6281.7222 | 96451.4609 |
+| 12000 | 0.7757 | 12623.7412 | 52327.9961 | 18.2212 | 32.9173 | 75.948 | 9.509 | 6281.7222 | 96451.4609 |
+| 14000 | 0.9050 | 12623.7412 | 52327.9961 | 18.2212 | 33.1691 | 75.371 | 9.437 | 6281.7222 | 96451.4609 |
+| 15469 | 1.0 | 12623.7412 | 52327.9961 | 18.2212 | 33.0829 | 75.568 | 9.461 | 6281.7222 | 96451.4609 |
 
 ### Framework versions
 - Distily 0.2.0
 - Transformers 4.44.0
 - Pytorch 2.3.0
-- Datasets 2.
+- Datasets 2.20.0
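For context on the `*ppl` numbers in this card: `eval_enwikippl`, `eval_frwikippl`, `eval_zhwikippl`, and `eval_tinystoriesppl` are perplexities of the student on English, French, and Chinese Wikipedia and TinyStories text. A minimal sketch of computing such a perplexity with `transformers` follows; this is not Distily's evaluation code, and the model path and sample text are placeholder assumptions:

```python
# Hedged sketch: perplexity of a causal LM on one text sample.
# "path/to/student-model" is a placeholder, not this repo's real id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/student-model")
model = AutoModelForCausalLM.from_pretrained("path/to/student-model").eval()

def perplexity(text: str, max_length: int = 1024) -> float:
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    with torch.no_grad():
        # With labels=input_ids the model shifts targets internally and
        # returns the mean token-level cross-entropy as .loss.
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()  # perplexity = exp(mean NLL)

print(perplexity("Once upon a time, there was a tiny robot."))
```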
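The `distillation_objective` hyperparameter spells out the full loss: KL divergence on the logits (weight 1) plus raw MSE on hidden states and on attention maps (weight 10.0 each), with no layer mapper or projector. Below is a minimal PyTorch sketch of that weighted sum. It mirrors the printed config rather than Distily's actual implementation, and it assumes both models share a layer count and were run with `output_hidden_states=True` and `output_attentions=True`:

```python
# Hedged sketch of the objective above: KL on logits (weight 1) plus
# raw MSE on hidden states and attentions (weight 10.0 each).
# Not Distily's code; layer_mapper=None is read as one-to-one pairing.
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      hs_weight: float = 10.0, attn_weight: float = 10.0):
    # KL(teacher || student) over the vocabulary distribution per token.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_p = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_p, reduction="batchmean")

    # Raw MSE over each paired hidden state and attention map.
    hs_loss = sum(F.mse_loss(s, t) for s, t in
                  zip(student_out.hidden_states, teacher_out.hidden_states))
    attn_loss = sum(F.mse_loss(s, t) for s, t in
                    zip(student_out.attentions, teacher_out.attentions))

    return logits_loss + hs_weight * hs_loss + attn_weight * attn_loss
```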
logs/attn_loss_fn=raw_mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=4e-06/events.out.tfevents.1723761898.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2e29c5c31674ab017aaa74d55b1626f675e08ae2fb3ba79dedcefb5c425a6e77
+size 307
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6a64218e3d2c30d196c07cc994eaf051554a03e0f07ebd8cd8baad5a362b72d2
 size 137033984
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8d665b5b0ccf6b5a8d166fee5cb77c0bd9dc46300ff235f1d51f2fbe7828aef5
 size 1017948104
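The `model.safetensors`, `training_args.bin`, and log entries above are Git LFS pointer files: three lines giving the spec version, the SHA-256 `oid` of the actual blob, and its `size` in bytes. A small sketch for checking a downloaded blob against its pointer; the file paths are illustrative:

```python
# Hedged sketch: verify a downloaded file against a Git LFS pointer
# (format per the git-lfs spec: version / oid sha256:<hex> / size <bytes>).
import hashlib
from pathlib import Path

def matches_pointer(pointer_path: str, blob_path: str) -> bool:
    fields = dict(line.split(" ", 1)
                  for line in Path(pointer_path).read_text().splitlines() if line)
    expected_oid = fields["oid"].split(":", 1)[1]  # hex digest after "sha256:"
    expected_size = int(fields["size"])

    digest, size = hashlib.sha256(), 0
    with open(blob_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
            size += len(chunk)
    return size == expected_size and digest.hexdigest() == expected_oid
```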