lapp0 committed
Commit 3c6e8b4
1 Parent(s): 5dce74b

End of training
README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 149.5442
- - eval_frwikippl: 28142.1230
- - eval_zhwikippl: 243104.3594
- - eval_tinystoriesppl: 11.2706
- - eval_loss: 7.4452
- - eval_runtime: 66.0052
- - eval_samples_per_second: 75.752
- - eval_steps_per_second: 9.469
+ - eval_enwikippl: 141.7497
+ - eval_frwikippl: 27160.7188
+ - eval_zhwikippl: 182390.6094
+ - eval_tinystoriesppl: 11.3304
+ - eval_loss: 7.2221
+ - eval_runtime: 65.8262
+ - eval_samples_per_second: 75.958
+ - eval_steps_per_second: 9.495
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -53,37 +53,38 @@ The following hyperparameters were used during training:
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: constant
+ - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1.0
 
  ### Resource Usage
- Peak GPU Memory: 8.2666 GB
+ Peak GPU Memory: 8.2677 GB
 
  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 58069.8203 | 77442.5625 | 18.5372 | 65.9335 | 75.834 | 9.479 | 46072.8867 | 100550.5078 |
- | 3000 | 0.0485 | 145.3919 | 28068.8965 | 7.4450 | 66.1007 | 75.642 | 9.455 | 10.8792 | 239563.0469 |
- | 6000 | 0.0970 | 143.8738 | 27934.7852 | 7.4443 | 65.9587 | 75.805 | 9.476 | 10.7017 | 239818.8281 |
- | 9000 | 0.1455 | 149.5442 | 28142.1230 | 7.4452 | 66.0052 | 75.752 | 9.469 | 11.2706 | 243104.3594 |
- | 12000 | 0.1939 | 141.4481 | 28096.5879 | 7.4447 | 66.228 | 75.497 | 9.437 | 10.4616 | 242650.5938 |
- | 15000 | 0.2424 | 141.9365 | 27532.4258 | 7.4447 | 66.1198 | 75.62 | 9.453 | 10.5402 | 235884.5 |
- | 18000 | 0.2909 | 150.4271 | 28453.0332 | 7.4452 | 65.9158 | 75.854 | 9.482 | 11.3604 | 248415.0781 |
- | 21000 | 0.3394 | 148.4649 | 27337.2715 | 7.4450 | 65.7078 | 76.094 | 9.512 | 11.2888 | 229674.3125 |
- | 24000 | 0.3879 | 149.7760 | 28039.2520 | 7.4446 | 65.7891 | 76.0 | 9.5 | 11.2827 | 240716.3594 |
- | 27000 | 0.4364 | 141.8706 | 28049.1211 | 7.4454 | 65.831 | 75.952 | 9.494 | 10.5280 | 235255.9062 |
- | 30000 | 0.4848 | 144.7906 | 28084.7207 | 7.4449 | 65.9422 | 75.824 | 9.478 | 10.7119 | 240716.3594 |
- | 33000 | 0.5333 | 149.6832 | 28237.4258 | 7.4454 | 65.807 | 75.98 | 9.497 | 11.2524 | 244013.9531 |
- | 36000 | 0.5818 | 148.6030 | 27445.3125 | 7.4453 | 65.6651 | 76.144 | 9.518 | 11.2729 | 236893.5625 |
- | 39000 | 0.6303 | 142.9683 | 27676.2949 | 7.4447 | 65.6729 | 76.135 | 9.517 | 10.6589 | 235381.5781 |
- | 42000 | 0.6788 | 146.5510 | 27895.4648 | 7.4449 | 65.6881 | 76.117 | 9.515 | 10.9904 | 239690.7812 |
- | 45000 | 0.7273 | 149.2144 | 28023.4531 | 7.4454 | 65.9058 | 75.866 | 9.483 | 11.2701 | 240716.3594 |
- | 48000 | 0.7758 | 144.2086 | 28243.4043 | 7.4449 | 65.873 | 75.904 | 9.488 | 10.7022 | 244339.7188 |
- | 51000 | 0.8242 | 141.9915 | 27781.7559 | 7.4450 | 65.9256 | 75.843 | 9.48 | 10.5589 | 239563.0469 |
- | 54000 | 0.8727 | 145.6399 | 28219.5234 | 7.4451 | 65.6892 | 76.116 | 9.515 | 10.8693 | 238542.6094 |
- | 57000 | 0.9212 | 144.2365 | 27040.4609 | 7.4445 | 65.6838 | 76.122 | 9.515 | 10.8312 | 227175.6875 |
- | 60000 | 0.9697 | 144.3482 | 26979.5938 | 7.4447 | 65.623 | 76.193 | 9.524 | 10.9257 | 232138.6562 |
- | 61875 | 1.0 | 146.9147 | 28084.7207 | 7.4450 | 65.6082 | 76.21 | 9.526 | 10.9569 | 237146.5 |
+ | 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.8981 | 75.875 | 9.484 | 12321.8145 | 60955.8008 |
+ | 3000 | 0.0485 | 142.1731 | 27314.1816 | 7.2225 | 65.7757 | 76.016 | 9.502 | 11.3529 | 185037.1875 |
+ | 6000 | 0.0970 | 137.4355 | 27373.8730 | 7.2213 | 65.7399 | 76.057 | 9.507 | 10.6638 | 187922.7812 |
+ | 9000 | 0.1455 | 141.7497 | 27160.7188 | 7.2221 | 65.8262 | 75.958 | 9.495 | 11.3304 | 182390.6094 |
+ | 12000 | 0.1939 | 137.9795 | 27281.5098 | 7.2216 | 65.8535 | 75.926 | 9.491 | 10.8366 | 186325.2656 |
+ | 15000 | 0.2424 | 137.0103 | 27513.0293 | 7.2218 | 65.7764 | 76.015 | 9.502 | 10.6413 | 189332.0312 |
+ | 18000 | 0.2909 | 142.2282 | 27191.3516 | 7.2224 | 65.9523 | 75.812 | 9.477 | 11.3351 | 182877.7812 |
+ | 21000 | 0.3394 | 135.1864 | 27656.7969 | 7.2224 | 65.9704 | 75.792 | 9.474 | 10.4478 | 187422.1875 |
+ | 24000 | 0.3879 | 142.1621 | 27468.5117 | 7.2223 | 65.8641 | 75.914 | 9.489 | 11.3332 | 180117.75 |
+ | 27000 | 0.4364 | 134.9144 | 27293.0391 | 7.2231 | 65.8626 | 75.916 | 9.489 | 10.4872 | 182050.1875 |
+ | 30000 | 0.4848 | 140.8412 | 27042.3691 | 7.2219 | 65.826 | 75.958 | 9.495 | 11.2056 | 181468.2812 |
+ | 33000 | 0.5333 | 136.8459 | 27680.2012 | 7.2217 | 65.8058 | 75.981 | 9.498 | 10.6224 | 187022.5938 |
+ | 36000 | 0.5818 | 136.5546 | 26858.2676 | 7.2218 | 65.7491 | 76.047 | 9.506 | 10.7000 | 182244.5625 |
+ | 39000 | 0.6303 | 135.1864 | 27323.7949 | 7.2218 | 65.914 | 75.856 | 9.482 | 10.4642 | 185185.4844 |
+ | 42000 | 0.6788 | 141.3933 | 26540.4746 | 7.2219 | 65.7482 | 76.048 | 9.506 | 11.3379 | 179925.625 |
+ | 45000 | 0.7273 | 142.8466 | 28055.0605 | 7.2226 | 65.741 | 76.056 | 9.507 | 11.2944 | 186922.7344 |
+ | 48000 | 0.7758 | 136.5335 | 27478.1797 | 7.2224 | 65.8834 | 75.892 | 9.486 | 10.5581 | 186225.9531 |
+ | 51000 | 0.8242 | 142.2612 | 27429.8477 | 7.2221 | 65.8847 | 75.89 | 9.486 | 11.3215 | 182877.7812 |
+ | 54000 | 0.8727 | 137.5739 | 27848.3633 | 7.2220 | 66.0422 | 75.709 | 9.464 | 10.6466 | 187622.2969 |
+ | 57000 | 0.9212 | 141.6180 | 27561.5352 | 7.2221 | 65.9877 | 75.772 | 9.471 | 11.1880 | 191772.1875 |
+ | 60000 | 0.9697 | 141.9915 | 27429.8477 | 7.2230 | 65.9205 | 75.849 | 9.481 | 11.3163 | 182585.3594 |
+ | 61875 | 1.0 | 138.3541 | 27281.5098 | 7.2215 | 65.6926 | 76.112 | 9.514 | 10.8914 | 184839.8281 |
 
  ### Framework versions
  - Distily 0.2.0
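
The hyperparameters listed in the card map onto the standard Hugging Face `TrainingArguments` API. The sketch below is a minimal illustration of that configuration, not Distily's actual entry point; the learning rate of 0.004 and the warmup ratio are taken from the log directory name recorded in this commit, and the output directory is hypothetical.

```python
from transformers import TrainingArguments

# Minimal sketch of the configuration listed in the model card.
# Assumes the standard transformers Trainer API; Distily's own
# invocation may differ.
training_args = TrainingArguments(
    output_dir="./distily_student",  # hypothetical path
    learning_rate=4e-3,              # learning_rate=0.004, from the log directory name
    lr_scheduler_type="constant",    # the card lists "constant" plus a warmup ratio
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio added in this commit
    num_train_epochs=1.0,
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
)
```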
 
15
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
16
 
17
  It achieves the following results on the evaluation set:
18
+ - eval_enwikippl: 141.7497
19
+ - eval_frwikippl: 27160.7188
20
+ - eval_zhwikippl: 182390.6094
21
+ - eval_tinystoriesppl: 11.3304
22
+ - eval_loss: 7.2221
23
+ - eval_runtime: 65.8262
24
+ - eval_samples_per_second: 75.958
25
+ - eval_steps_per_second: 9.495
26
 
27
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
28
  should probably proofread and complete it, then remove this comment.
 
53
  - seed: 42
54
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
55
  - lr_scheduler_type: constant
56
+ - lr_scheduler_warmup_ratio: 0.1
57
  - num_epochs: 1.0
58
 
59
  ### Resource Usage
60
+ Peak GPU Memory: 8.2677 GB
61
 
62
  ### Eval-Phase Metrics
63
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
64
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
65
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
66
+ | 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.8981 | 75.875 | 9.484 | 12321.8145 | 60955.8008 |
67
+ | 3000 | 0.0485 | 142.1731 | 27314.1816 | 7.2225 | 65.7757 | 76.016 | 9.502 | 11.3529 | 185037.1875 |
68
+ | 6000 | 0.0970 | 137.4355 | 27373.8730 | 7.2213 | 65.7399 | 76.057 | 9.507 | 10.6638 | 187922.7812 |
69
+ | 9000 | 0.1455 | 141.7497 | 27160.7188 | 7.2221 | 65.8262 | 75.958 | 9.495 | 11.3304 | 182390.6094 |
70
+ | 12000 | 0.1939 | 137.9795 | 27281.5098 | 7.2216 | 65.8535 | 75.926 | 9.491 | 10.8366 | 186325.2656 |
71
+ | 15000 | 0.2424 | 137.0103 | 27513.0293 | 7.2218 | 65.7764 | 76.015 | 9.502 | 10.6413 | 189332.0312 |
72
+ | 18000 | 0.2909 | 142.2282 | 27191.3516 | 7.2224 | 65.9523 | 75.812 | 9.477 | 11.3351 | 182877.7812 |
73
+ | 21000 | 0.3394 | 135.1864 | 27656.7969 | 7.2224 | 65.9704 | 75.792 | 9.474 | 10.4478 | 187422.1875 |
74
+ | 24000 | 0.3879 | 142.1621 | 27468.5117 | 7.2223 | 65.8641 | 75.914 | 9.489 | 11.3332 | 180117.75 |
75
+ | 27000 | 0.4364 | 134.9144 | 27293.0391 | 7.2231 | 65.8626 | 75.916 | 9.489 | 10.4872 | 182050.1875 |
76
+ | 30000 | 0.4848 | 140.8412 | 27042.3691 | 7.2219 | 65.826 | 75.958 | 9.495 | 11.2056 | 181468.2812 |
77
+ | 33000 | 0.5333 | 136.8459 | 27680.2012 | 7.2217 | 65.8058 | 75.981 | 9.498 | 10.6224 | 187022.5938 |
78
+ | 36000 | 0.5818 | 136.5546 | 26858.2676 | 7.2218 | 65.7491 | 76.047 | 9.506 | 10.7000 | 182244.5625 |
79
+ | 39000 | 0.6303 | 135.1864 | 27323.7949 | 7.2218 | 65.914 | 75.856 | 9.482 | 10.4642 | 185185.4844 |
80
+ | 42000 | 0.6788 | 141.3933 | 26540.4746 | 7.2219 | 65.7482 | 76.048 | 9.506 | 11.3379 | 179925.625 |
81
+ | 45000 | 0.7273 | 142.8466 | 28055.0605 | 7.2226 | 65.741 | 76.056 | 9.507 | 11.2944 | 186922.7344 |
82
+ | 48000 | 0.7758 | 136.5335 | 27478.1797 | 7.2224 | 65.8834 | 75.892 | 9.486 | 10.5581 | 186225.9531 |
83
+ | 51000 | 0.8242 | 142.2612 | 27429.8477 | 7.2221 | 65.8847 | 75.89 | 9.486 | 11.3215 | 182877.7812 |
84
+ | 54000 | 0.8727 | 137.5739 | 27848.3633 | 7.2220 | 66.0422 | 75.709 | 9.464 | 10.6466 | 187622.2969 |
85
+ | 57000 | 0.9212 | 141.6180 | 27561.5352 | 7.2221 | 65.9877 | 75.772 | 9.471 | 11.1880 | 191772.1875 |
86
+ | 60000 | 0.9697 | 141.9915 | 27429.8477 | 7.2230 | 65.9205 | 75.849 | 9.481 | 11.3163 | 182585.3594 |
87
+ | 61875 | 1.0 | 138.3541 | 27281.5098 | 7.2215 | 65.6926 | 76.112 | 9.514 | 10.8914 | 184839.8281 |
88
 
89
  ### Framework versions
90
  - Distily 0.2.0
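The `eval_*ppl` columns report perplexity on held-out text (English, French, and Chinese Wikipedia, plus TinyStories). A minimal sketch of how such a number can be computed with `transformers` follows; Distily's exact evaluation protocol (corpus slicing, context length, striding) is an assumption here, and the stock `gpt2` checkpoint stands in for the student model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: perplexity as exp(mean cross-entropy) over a text sample.
model_id = "gpt2"  # stand-in; substitute the distilled student checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "Sample passage from the evaluation corpus..."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # next-token cross-entropy over the sequence.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.4f}")
```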
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.004, warmup_ratio=0.1/events.out.tfevents.1723786534.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97d32b320522924c8dcc6c869382a81362fd3ab34ffabdebcc19154f1492a2a4
+ size 312
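
The log directory name encodes the distillation objective for this run: MSE on attention maps and raw MSE on hidden states, each weighted 10.0. The sketch below shows the general shape of such an objective; it is an illustration under stated assumptions, not Distily's actual implementation, and it presumes student and teacher expose compatible attention and hidden-state shapes.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, attn_weight=10.0, hs_weight=10.0):
    """Illustrative objective matching the log directory name
    (attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0).
    Assumes both models were called with output_attentions=True and
    output_hidden_states=True and produce matching tensor shapes;
    Distily's real loss may map layers differently."""
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    hs_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)
    return attn_weight * attn_loss + hs_weight * hs_loss
```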