lapp0 committed
Commit 362e4c0 · verified · 1 Parent(s): 4eed482

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 198.0270
- - eval_frwikippl: 17127.3379
- - eval_zhwikippl: 63614.1797
- - eval_tinystoriesppl: 21.7514
- - eval_loss: 12.7123
- - eval_runtime: 65.1143
- - eval_samples_per_second: 76.788
- - eval_steps_per_second: 9.599
+ - eval_enwikippl: 207.7861
+ - eval_frwikippl: 15066.8408
+ - eval_zhwikippl: 64727.0352
+ - eval_tinystoriesppl: 24.1522
+ - eval_loss: 13.2019
+ - eval_runtime: 65.4151
+ - eval_samples_per_second: 76.435
+ - eval_steps_per_second: 9.554
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -53,37 +53,38 @@ The following hyperparameters were used during training:
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
+ - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 8.2666 GB
+ Peak GPU Memory: 8.2677 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 14504.7236 | 73076.2578 | 17.7824 | 65.8473 | 75.933 | 9.492 | 6091.4858 | 69506.0391 |
- | 3000 | 0.0485 | 198.0270 | 17127.3379 | 12.7127 | 65.1257 | 76.775 | 9.597 | 21.7514 | 63648.1641 |
- | 6000 | 0.0970 | 197.0935 | 17088.7852 | 12.7124 | 65.0473 | 76.867 | 9.608 | 21.6429 | 63750.0977 |
- | 9000 | 0.1455 | 198.0270 | 17127.3379 | 12.7123 | 65.1143 | 76.788 | 9.599 | 21.7514 | 63614.1797 |
- | 12000 | 0.1939 | 197.4985 | 17098.4199 | 12.7128 | 65.0689 | 76.842 | 9.605 | 21.7173 | 63886.3086 |
- | 15000 | 0.2424 | 197.3838 | 17122.5215 | 12.7128 | 65.1155 | 76.787 | 9.598 | 21.6850 | 63954.5195 |
- | 18000 | 0.2909 | 197.9733 | 17108.0586 | 12.7130 | 65.0798 | 76.829 | 9.604 | 21.7865 | 63818.1680 |
- | 21000 | 0.3394 | 197.1698 | 17103.2305 | 12.7134 | 65.0799 | 76.829 | 9.604 | 21.6322 | 64022.8672 |
- | 24000 | 0.3879 | 197.8507 | 17117.7051 | 12.7131 | 65.1719 | 76.72 | 9.59 | 21.7757 | 63716.1211 |
- | 27000 | 0.4364 | 197.6975 | 17127.3379 | 12.7131 | 65.1124 | 76.79 | 9.599 | 21.7191 | 63954.5195 |
- | 30000 | 0.4848 | 197.3150 | 17079.1562 | 12.7131 | 65.1085 | 76.795 | 9.599 | 21.6904 | 63920.4375 |
- | 33000 | 0.5333 | 197.6209 | 17103.2305 | 12.7129 | 65.2897 | 76.582 | 9.573 | 21.7191 | 63750.0977 |
- | 36000 | 0.5818 | 198.3110 | 17127.3379 | 12.7122 | 65.5537 | 76.273 | 9.534 | 21.7883 | 63614.1797 |
- | 39000 | 0.6303 | 198.2802 | 17127.3379 | 12.7128 | 65.2001 | 76.687 | 9.586 | 21.7721 | 63648.1641 |
- | 42000 | 0.6788 | 197.9580 | 17127.3379 | 12.7130 | 65.4586 | 76.384 | 9.548 | 21.7433 | 63512.4648 |
- | 45000 | 0.7273 | 198.2802 | 17108.0586 | 12.7129 | 65.2819 | 76.591 | 9.574 | 21.8009 | 63614.1797 |
- | 48000 | 0.7758 | 197.5979 | 17098.4199 | 12.7125 | 65.1997 | 76.688 | 9.586 | 21.6940 | 63648.1641 |
- | 51000 | 0.8242 | 198.2802 | 17127.3379 | 12.7120 | 65.5503 | 76.277 | 9.535 | 21.7892 | 63512.4648 |
- | 54000 | 0.8727 | 198.2189 | 17127.3379 | 12.7129 | 65.3863 | 76.469 | 9.559 | 21.7811 | 63716.1211 |
- | 57000 | 0.9212 | 199.1886 | 17136.9941 | 12.7133 | 65.1649 | 76.728 | 9.591 | 21.8759 | 63343.2148 |
- | 60000 | 0.9697 | 197.3226 | 17122.5215 | 12.7127 | 65.2079 | 76.678 | 9.585 | 21.6886 | 63648.1641 |
- | 61875 | 1.0 | 198.9419 | 17127.3379 | 12.7117 | 65.2766 | 76.597 | 9.575 | 21.8469 | 63648.1641 |
+ | 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.6143 | 76.203 | 9.525 | 12321.8145 | 60955.8008 |
+ | 3000 | 0.0485 | 207.9149 | 15083.8350 | 13.2031 | 65.3099 | 76.558 | 9.57 | 24.1822 | 63920.4375 |
+ | 6000 | 0.0970 | 207.6253 | 15109.3467 | 13.2019 | 65.2976 | 76.572 | 9.572 | 24.1253 | 65386.5469 |
+ | 9000 | 0.1455 | 207.7861 | 15066.8408 | 13.2019 | 65.4151 | 76.435 | 9.554 | 24.1522 | 64727.0352 |
+ | 12000 | 0.1939 | 207.4002 | 15100.8330 | 13.2016 | 65.3491 | 76.512 | 9.564 | 24.0894 | 65229.7188 |
+ | 15000 | 0.2424 | 207.8023 | 15100.8330 | 13.2017 | 65.4255 | 76.423 | 9.553 | 24.1442 | 65247.1406 |
+ | 18000 | 0.2909 | 208.3987 | 15075.3359 | 13.2031 | 65.4213 | 76.428 | 9.553 | 24.2462 | 64057.0078 |
+ | 21000 | 0.3394 | 208.0761 | 15100.8330 | 13.2026 | 65.2706 | 76.604 | 9.576 | 24.2142 | 64537.3164 |
+ | 24000 | 0.3879 | 207.9955 | 15100.8330 | 13.2027 | 65.2287 | 76.653 | 9.582 | 24.1822 | 64159.6602 |
+ | 27000 | 0.4364 | 208.3180 | 15058.3516 | 13.2033 | 65.1653 | 76.728 | 9.591 | 24.2272 | 63869.3125 |
+ | 30000 | 0.4848 | 207.1754 | 15100.8330 | 13.2016 | 65.1169 | 76.785 | 9.598 | 24.0546 | 65229.7188 |
+ | 33000 | 0.5333 | 208.0761 | 15083.8350 | 13.2026 | 65.2105 | 76.675 | 9.584 | 24.2412 | 64588.9727 |
+ | 36000 | 0.5818 | 207.1754 | 15066.8408 | 13.2023 | 65.3453 | 76.517 | 9.565 | 24.0715 | 65229.7188 |
+ | 39000 | 0.6303 | 207.1754 | 15100.8330 | 13.2017 | 65.2569 | 76.62 | 9.578 | 24.0695 | 65229.7188 |
+ | 42000 | 0.6788 | 207.3681 | 15058.3516 | 13.2021 | 65.2167 | 76.668 | 9.583 | 24.0954 | 64796.1484 |
+ | 45000 | 0.7273 | 207.9955 | 15100.8330 | 13.2026 | 65.2551 | 76.622 | 9.578 | 24.1982 | 64159.6602 |
+ | 48000 | 0.7758 | 207.7861 | 15092.3242 | 13.2017 | 65.3187 | 76.548 | 9.568 | 24.1412 | 64727.0352 |
+ | 51000 | 0.8242 | 208.2050 | 15058.3516 | 13.2029 | 65.2525 | 76.625 | 9.578 | 24.2262 | 64193.8711 |
+ | 54000 | 0.8727 | 207.7861 | 15100.8330 | 13.2027 | 65.2798 | 76.593 | 9.574 | 24.1362 | 64331.0312 |
+ | 57000 | 0.9212 | 207.7218 | 15100.8330 | 13.2017 | 65.2646 | 76.611 | 9.576 | 24.1163 | 65125.4180 |
+ | 60000 | 0.9697 | 208.5925 | 15092.3242 | 13.2034 | 65.2233 | 76.66 | 9.582 | 24.2653 | 63869.3125 |
+ | 61875 | 1.0 | 208.0116 | 15100.8330 | 13.2018 | 65.2936 | 76.577 | 9.572 | 24.2012 | 64917.2539 |
 
 ### Framework versions
 - Distily 0.2.0
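For context, the hyperparameters in the diff above map naturally onto Hugging Face `TrainingArguments`. The sketch below is an assumption about how such a run could be configured, not Distily's actual entry point; `output_dir` is a hypothetical placeholder, and the learning rate is taken from the run name in the log file added below.

```python
from transformers import TrainingArguments

# Illustrative sketch of the hyperparameters listed in the model card;
# not Distily's actual training setup. output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="distily-tinystories-student",  # hypothetical path
    seed=42,
    learning_rate=1e-3,                        # learning_rate=0.001 from the run name
    adam_beta1=0.9,                            # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                         # epsilon=1e-08
    lr_scheduler_type="constant_with_warmup",  # constant schedule that honors warmup_ratio
    warmup_ratio=0.1,                          # lr_scheduler_warmup_ratio: 0.1
    num_train_epochs=1.0,
)
```

Note that in `transformers` the plain `constant` scheduler applies no warmup; a constant schedule with a warmup phase is `constant_with_warmup`, which is why the sketch uses that value alongside `warmup_ratio=0.1`.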
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.001, warmup_ratio=0.1/events.out.tfevents.1723793623.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3399554b3290cdc2f6e9aefc5517c45c6373a47fc48eb89da7e3857bb74515ed
+ size 312
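The run name in the log path above encodes the distillation objective: MSE over attention maps and raw MSE over hidden states, each weighted by 10.0, at learning rate 0.001 with warmup ratio 0.1. Below is a minimal sketch of a combined loss of that shape; it assumes student and teacher expose aligned `attentions` and `hidden_states` tuples (matching layer counts and shapes) and is illustrative, not Distily's implementation.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, attn_weight=10.0, hs_weight=10.0):
    """Illustrative combined objective: MSE on attentions plus raw MSE on
    hidden states, as named in the run config. Both outputs are assumed to
    come from a forward pass with output_attentions=True and
    output_hidden_states=True, with matching layer counts and shapes."""
    attn_loss = sum(
        F.mse_loss(s, t.detach())
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    hs_loss = sum(
        F.mse_loss(s, t.detach())
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)
    return attn_weight * attn_loss + hs_weight * hs_loss
```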