lapp0 committed
Commit a84fd15
1 Parent(s): 164d45f

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

  It achieves the following results on the evaluation set:
- - eval_enwikippl: 76973.75
- - eval_frwikippl: 137158.1406
- - eval_zhwikippl: 113740.5234
- - eval_tinystoriesppl: 59510.8320
- - eval_loss: 31.1030
- - eval_runtime: 11.5029
- - eval_samples_per_second: 86.934
- - eval_steps_per_second: 10.867
+ - eval_enwikippl: 87049.7578
+ - eval_frwikippl: 148519.8594
+ - eval_zhwikippl: 112743.5078
+ - eval_tinystoriesppl: 68038.7344
+ - eval_loss: 32.1160
+ - eval_runtime: 11.5146
+ - eval_samples_per_second: 86.847
+ - eval_steps_per_second: 10.856

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -47,7 +47,7 @@ More information needed
  The following hyperparameters were used during training:
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=0, loss_fn=None, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10, loss_fn=kl, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
- - learning_rate: 0.001
+ - learning_rate: 0.0001
  - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
@@ -62,32 +62,32 @@ Peak GPU Memory: 6.6287 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 88697.0156 | 150478.2188 | 32.2330 | 11.5105 | 86.877 | 10.86 | 69390.6016 | 113346.8047 |
- | 500 | 0.0404 | 75112.3906 | 135544.8125 | 31.1030 | 11.5037 | 86.929 | 10.866 | 58178.0234 | 112593.2969 |
- | 1000 | 0.0808 | 77728.7578 | 137816.4844 | 31.1030 | 11.5074 | 86.901 | 10.863 | 60273.2852 | 113861.9609 |
- | 1500 | 0.1212 | 76997.6094 | 137390.125 | 31.1030 | 11.5111 | 86.873 | 10.859 | 59688.2266 | 113801.2812 |
- | 2000 | 0.1616 | 76807.0 | 136926.4219 | 31.1030 | 11.5266 | 86.756 | 10.845 | 59383.0469 | 113498.0234 |
- | 2500 | 0.2020 | 77728.7578 | 137971.9219 | 31.1030 | 11.4993 | 86.962 | 10.87 | 60303.1836 | 113861.9609 |
- | 3000 | 0.2424 | 76902.2891 | 137312.7031 | 31.1030 | 11.5345 | 86.696 | 10.837 | 59501.0156 | 113801.2812 |
- | 3500 | 0.2828 | 76688.0625 | 136541.2188 | 31.1030 | 11.5192 | 86.812 | 10.851 | 59206.6172 | 113376.9609 |
- | 4000 | 0.3232 | 77428.3047 | 137428.7812 | 31.1030 | 11.4884 | 87.044 | 10.881 | 60004.8164 | 113679.8984 |
- | 4500 | 0.3636 | 76997.6094 | 137312.7031 | 31.1030 | 11.5143 | 86.849 | 10.856 | 59836.4102 | 113679.8984 |
- | 5000 | 0.4040 | 77045.3594 | 137506.2656 | 31.1030 | 11.4953 | 86.992 | 10.874 | 59905.7266 | 113801.2812 |
- | 5500 | 0.4444 | 76759.4062 | 136926.4219 | 31.1030 | 11.4915 | 87.021 | 10.878 | 59255.5938 | 113376.9609 |
- | 6000 | 0.4848 | 76997.6094 | 137080.8594 | 31.1030 | 11.4826 | 87.088 | 10.886 | 59737.5430 | 113679.8984 |
- | 6500 | 0.5253 | 77476.3203 | 137583.7812 | 31.1030 | 11.5001 | 86.956 | 10.869 | 60004.8164 | 113801.2812 |
- | 7000 | 0.5657 | 76973.75 | 137158.1406 | 31.1030 | 11.5012 | 86.947 | 10.868 | 59510.8320 | 113740.5234 |
- | 7500 | 0.6061 | 76973.75 | 137390.125 | 31.1030 | 11.4869 | 87.055 | 10.882 | 59510.8320 | 113740.5234 |
- | 8000 | 0.6465 | 76997.6094 | 137506.2656 | 31.1030 | 11.5042 | 86.925 | 10.866 | 59806.7422 | 113801.2812 |
- | 8500 | 0.6869 | 77021.4844 | 137506.2656 | 31.1030 | 11.4716 | 87.172 | 10.896 | 59905.7266 | 113801.2812 |
- | 9000 | 0.7273 | 76997.6094 | 137506.2656 | 31.1030 | 11.5365 | 86.681 | 10.835 | 59836.4102 | 113740.5234 |
- | 9500 | 0.7677 | 76973.75 | 137158.1406 | 31.1030 | 11.5029 | 86.934 | 10.867 | 59510.8320 | 113740.5234 |
- | 10000 | 0.8081 | 76997.6094 | 137312.7031 | 31.1030 | 11.5428 | 86.634 | 10.829 | 59717.8320 | 113740.5234 |
- | 10500 | 0.8485 | 76997.6094 | 137312.7031 | 31.1030 | 11.5164 | 86.833 | 10.854 | 59737.5430 | 113740.5234 |
- | 11000 | 0.8889 | 76997.6094 | 137428.7812 | 31.1030 | 11.5053 | 86.917 | 10.865 | 59806.7422 | 113740.5234 |
- | 11500 | 0.9293 | 76997.6094 | 137428.7812 | 31.1030 | 11.5069 | 86.905 | 10.863 | 59806.7422 | 113740.5234 |
- | 12000 | 0.9697 | 76997.6094 | 137428.7812 | 31.1030 | 11.5172 | 86.826 | 10.853 | 59806.7422 | 113740.5234 |
- | 12375 | 1.0 | 76997.6094 | 137428.7812 | 31.1030 | 11.5045 | 86.923 | 10.865 | 59806.7422 | 113740.5234 |
+ | 0 | 0 | 88697.0156 | 150478.2188 | 32.2330 | 11.5103 | 86.878 | 10.86 | 69390.6016 | 113346.8047 |
+ | 500 | 0.0404 | 87049.7578 | 148519.8594 | 32.1160 | 11.5316 | 86.718 | 10.84 | 67960.0703 | 112623.2578 |
+ | 1000 | 0.0808 | 87049.7578 | 148519.8594 | 32.1180 | 11.498 | 86.971 | 10.871 | 68016.2188 | 112743.5078 |
+ | 1500 | 0.1212 | 87049.7578 | 148519.8594 | 32.1180 | 11.5171 | 86.828 | 10.853 | 67993.7812 | 112743.5078 |
+ | 2000 | 0.1616 | 87049.7578 | 148519.8594 | 32.1160 | 11.5112 | 86.872 | 10.859 | 68038.7344 | 112743.5078 |
+ | 2500 | 0.2020 | 87049.7578 | 148519.8594 | 32.1160 | 11.5174 | 86.825 | 10.853 | 68038.7344 | 112743.5078 |
+ | 3000 | 0.2424 | 87049.7578 | 148519.8594 | 32.1160 | 11.5446 | 86.621 | 10.828 | 68016.2188 | 112743.5078 |
+ | 3500 | 0.2828 | 87049.7578 | 148519.8594 | 32.1160 | 11.5015 | 86.945 | 10.868 | 68038.7344 | 112743.5078 |
+ | 4000 | 0.3232 | 87049.7578 | 148519.8594 | 32.1160 | 11.5349 | 86.693 | 10.837 | 68038.7344 | 112743.5078 |
+ | 4500 | 0.3636 | 87049.7578 | 148519.8594 | 32.1160 | 11.5299 | 86.731 | 10.841 | 68038.7344 | 112743.5078 |
+ | 5000 | 0.4040 | 87049.7578 | 148519.8594 | 32.1160 | 11.5259 | 86.761 | 10.845 | 68038.7344 | 112743.5078 |
+ | 5500 | 0.4444 | 87049.7578 | 148519.8594 | 32.1160 | 11.5002 | 86.955 | 10.869 | 68038.7344 | 112743.5078 |
+ | 6000 | 0.4848 | 87049.7578 | 148603.5938 | 32.1160 | 11.5135 | 86.855 | 10.857 | 68061.25 | 112743.5078 |
+ | 6500 | 0.5253 | 87049.7578 | 148603.5938 | 32.1160 | 11.5069 | 86.904 | 10.863 | 68061.25 | 112743.5078 |
+ | 7000 | 0.5657 | 87049.7578 | 148603.5938 | 32.1160 | 11.509 | 86.889 | 10.861 | 68061.25 | 112743.5078 |
+ | 7500 | 0.6061 | 87049.7578 | 148603.5938 | 32.1160 | 11.508 | 86.896 | 10.862 | 68061.25 | 112743.5078 |
+ | 8000 | 0.6465 | 87049.7578 | 148603.5938 | 32.1160 | 11.5151 | 86.843 | 10.855 | 68038.7344 | 112743.5078 |
+ | 8500 | 0.6869 | 87049.7578 | 148519.8594 | 32.1160 | 11.4916 | 87.02 | 10.878 | 68038.7344 | 112743.5078 |
+ | 9000 | 0.7273 | 87049.7578 | 148519.8594 | 32.1160 | 11.5189 | 86.814 | 10.852 | 68038.7344 | 112743.5078 |
+ | 9500 | 0.7677 | 87049.7578 | 148519.8594 | 32.1160 | 11.5146 | 86.847 | 10.856 | 68038.7344 | 112743.5078 |
+ | 10000 | 0.8081 | 87049.7578 | 148519.8594 | 32.1160 | 11.5098 | 86.883 | 10.86 | 68038.7344 | 112743.5078 |
+ | 10500 | 0.8485 | 87049.7578 | 148519.8594 | 32.1160 | 11.5054 | 86.916 | 10.865 | 68038.7344 | 112743.5078 |
+ | 11000 | 0.8889 | 87049.7578 | 148519.8594 | 32.1160 | 11.5094 | 86.885 | 10.861 | 68038.7344 | 112743.5078 |
+ | 11500 | 0.9293 | 87049.7578 | 148519.8594 | 32.1160 | 11.5376 | 86.673 | 10.834 | 68038.7344 | 112743.5078 |
+ | 12000 | 0.9697 | 87049.7578 | 148519.8594 | 32.1160 | 11.494 | 87.002 | 10.875 | 68038.7344 | 112743.5078 |
+ | 12375 | 1.0 | 87049.7578 | 148519.8594 | 32.1160 | 11.4926 | 87.013 | 10.877 | 68038.7344 | 112743.5078 |

  ### Framework versions
  - Distily 0.2.0
logs/hs_loss_fn=kl, hs_weight=10, learning_rate=0.0001/events.out.tfevents.1723878014.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3d9cefb4ed27bd5cdbb89802b1dd471965279c6adf682ebcf580c81dfa87f617
+ size 307
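
Note on the `distillation_objective` hyperparameter above: only the hidden-state component is active (label=hs, weight=10, loss_fn=kl); the logits and attention components carry weight 0. Below is a minimal PyTorch sketch of a weighted KL loss over hidden states, intended only to illustrate what that configuration describes. It is not Distily's actual implementation, and the tensor names and shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def hs_kl_loss(student_hs: torch.Tensor, teacher_hs: torch.Tensor,
               weight: float = 10.0) -> torch.Tensor:
    # Treat each hidden-state vector as a categorical distribution via
    # softmax, then penalize KL(teacher || student). The default weight
    # mirrors the hs_loss_component weight of 10 in the config above.
    # Illustrative sketch only; Distily's internals may differ.
    log_p_student = F.log_softmax(student_hs, dim=-1)
    p_teacher = F.softmax(teacher_hs, dim=-1)
    return weight * F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# Hypothetical shapes: (batch, seq_len, hidden_dim).
student_hs = torch.randn(8, 128, 256, requires_grad=True)
teacher_hs = torch.randn(8, 128, 256)
loss = hs_kl_loss(student_hs, teacher_hs)
loss.backward()
```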