lapp0 committed
Commit 217be1f
1 Parent(s): 3c6e8b4

Training in progress, step 61875

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

  It achieves the following results on the evaluation set:
- - eval_enwikippl: 141.7497
- - eval_frwikippl: 27160.7188
- - eval_zhwikippl: 182390.6094
- - eval_tinystoriesppl: 11.3304
- - eval_loss: 7.2221
- - eval_runtime: 65.8262
- - eval_samples_per_second: 75.958
- - eval_steps_per_second: 9.495
+ - eval_enwikippl: 3694.4192
+ - eval_frwikippl: 30929.5703
+ - eval_zhwikippl: 45501.2617
+ - eval_tinystoriesppl: 1160.7031
+ - eval_loss: 15.6337
+ - eval_runtime: 66.5963
+ - eval_samples_per_second: 75.079
+ - eval_steps_per_second: 9.385

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -47,47 +47,46 @@ More information needed
  The following hyperparameters were used during training:
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))
  - train_embeddings: True
- - learning_rate: 0.004
+ - learning_rate: 0.0004
  - train_batch_size: 8
  - eval_batch_size: 8
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: constant
- - lr_scheduler_warmup_ratio: 0.1
  - num_epochs: 1.0

  ### Resource Usage
- Peak GPU Memory: 8.2677 GB
+ Peak GPU Memory: 8.2666 GB

  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.8981 | 75.875 | 9.484 | 12321.8145 | 60955.8008 |
- | 3000 | 0.0485 | 142.1731 | 27314.1816 | 7.2225 | 65.7757 | 76.016 | 9.502 | 11.3529 | 185037.1875 |
- | 6000 | 0.0970 | 137.4355 | 27373.8730 | 7.2213 | 65.7399 | 76.057 | 9.507 | 10.6638 | 187922.7812 |
- | 9000 | 0.1455 | 141.7497 | 27160.7188 | 7.2221 | 65.8262 | 75.958 | 9.495 | 11.3304 | 182390.6094 |
- | 12000 | 0.1939 | 137.9795 | 27281.5098 | 7.2216 | 65.8535 | 75.926 | 9.491 | 10.8366 | 186325.2656 |
- | 15000 | 0.2424 | 137.0103 | 27513.0293 | 7.2218 | 65.7764 | 76.015 | 9.502 | 10.6413 | 189332.0312 |
- | 18000 | 0.2909 | 142.2282 | 27191.3516 | 7.2224 | 65.9523 | 75.812 | 9.477 | 11.3351 | 182877.7812 |
- | 21000 | 0.3394 | 135.1864 | 27656.7969 | 7.2224 | 65.9704 | 75.792 | 9.474 | 10.4478 | 187422.1875 |
- | 24000 | 0.3879 | 142.1621 | 27468.5117 | 7.2223 | 65.8641 | 75.914 | 9.489 | 11.3332 | 180117.75 |
- | 27000 | 0.4364 | 134.9144 | 27293.0391 | 7.2231 | 65.8626 | 75.916 | 9.489 | 10.4872 | 182050.1875 |
- | 30000 | 0.4848 | 140.8412 | 27042.3691 | 7.2219 | 65.826 | 75.958 | 9.495 | 11.2056 | 181468.2812 |
- | 33000 | 0.5333 | 136.8459 | 27680.2012 | 7.2217 | 65.8058 | 75.981 | 9.498 | 10.6224 | 187022.5938 |
- | 36000 | 0.5818 | 136.5546 | 26858.2676 | 7.2218 | 65.7491 | 76.047 | 9.506 | 10.7000 | 182244.5625 |
- | 39000 | 0.6303 | 135.1864 | 27323.7949 | 7.2218 | 65.914 | 75.856 | 9.482 | 10.4642 | 185185.4844 |
- | 42000 | 0.6788 | 141.3933 | 26540.4746 | 7.2219 | 65.7482 | 76.048 | 9.506 | 11.3379 | 179925.625 |
- | 45000 | 0.7273 | 142.8466 | 28055.0605 | 7.2226 | 65.741 | 76.056 | 9.507 | 11.2944 | 186922.7344 |
- | 48000 | 0.7758 | 136.5335 | 27478.1797 | 7.2224 | 65.8834 | 75.892 | 9.486 | 10.5581 | 186225.9531 |
- | 51000 | 0.8242 | 142.2612 | 27429.8477 | 7.2221 | 65.8847 | 75.89 | 9.486 | 11.3215 | 182877.7812 |
- | 54000 | 0.8727 | 137.5739 | 27848.3633 | 7.2220 | 66.0422 | 75.709 | 9.464 | 10.6466 | 187622.2969 |
- | 57000 | 0.9212 | 141.6180 | 27561.5352 | 7.2221 | 65.9877 | 75.772 | 9.471 | 11.1880 | 191772.1875 |
- | 60000 | 0.9697 | 141.9915 | 27429.8477 | 7.2230 | 65.9205 | 75.849 | 9.481 | 11.3163 | 182585.3594 |
- | 61875 | 1.0 | 138.3541 | 27281.5098 | 7.2215 | 65.6926 | 76.112 | 9.514 | 10.8914 | 184839.8281 |
+ | 0 | 0 | 19278.7617 | 60268.5703 | 17.3716 | 66.6062 | 75.068 | 9.384 | 9660.0908 | 53858.2383 |
+ | 3000 | 0.0485 | 3702.4434 | 30929.5703 | 15.6332 | 66.4884 | 75.201 | 9.4 | 1163.3928 | 45525.5703 |
+ | 6000 | 0.0970 | 3702.4434 | 30929.5703 | 15.6346 | 67.0021 | 74.625 | 9.328 | 1163.0084 | 45525.5703 |
+ | 9000 | 0.1455 | 3694.4192 | 30929.5703 | 15.6337 | 66.5963 | 75.079 | 9.385 | 1160.7031 | 45501.2617 |
+ | 12000 | 0.1939 | 3696.7100 | 30981.8828 | 15.6348 | 66.5218 | 75.163 | 9.395 | 1161.0868 | 45525.5703 |
+ | 15000 | 0.2424 | 3697.8560 | 30929.5703 | 15.6342 | 66.6948 | 74.968 | 9.371 | 1161.8550 | 45525.5703 |
+ | 18000 | 0.2909 | 3696.7100 | 30981.8828 | 15.6348 | 66.2258 | 75.499 | 9.437 | 1161.2789 | 45525.5703 |
+ | 21000 | 0.3394 | 3697.8560 | 30946.9785 | 15.6344 | 66.5835 | 75.094 | 9.387 | 1161.4711 | 45525.5703 |
+ | 24000 | 0.3879 | 3697.8560 | 30929.5703 | 15.6334 | 66.8279 | 74.819 | 9.352 | 1162.0472 | 45525.5703 |
+ | 27000 | 0.4364 | 3697.8560 | 30981.8828 | 15.6346 | 66.5691 | 75.11 | 9.389 | 1161.6627 | 45525.5703 |
+ | 30000 | 0.4848 | 3696.7100 | 30946.9785 | 15.6346 | 66.7012 | 74.961 | 9.37 | 1160.7031 | 45525.5703 |
+ | 33000 | 0.5333 | 3696.1389 | 30981.8828 | 15.6346 | 66.5211 | 75.164 | 9.396 | 1160.1277 | 45525.5703 |
+ | 36000 | 0.5818 | 3700.1489 | 30929.5703 | 15.6331 | 66.5006 | 75.187 | 9.398 | 1162.6237 | 45525.5703 |
+ | 39000 | 0.6303 | 3694.4192 | 30964.4258 | 15.6344 | 66.3802 | 75.324 | 9.415 | 1160.5111 | 45501.2617 |
+ | 42000 | 0.6788 | 3696.7100 | 30946.9785 | 15.6346 | 66.6702 | 74.996 | 9.375 | 1160.7031 | 45525.5703 |
+ | 45000 | 0.7273 | 3696.7100 | 30981.8828 | 15.6347 | 66.7768 | 74.876 | 9.36 | 1161.0868 | 45525.5703 |
+ | 48000 | 0.7758 | 3694.4192 | 30929.5703 | 15.6331 | 66.6573 | 75.011 | 9.376 | 1160.7031 | 45525.5703 |
+ | 51000 | 0.8242 | 3692.7039 | 30981.8828 | 15.6344 | 66.8297 | 74.817 | 9.352 | 1159.7439 | 45501.2617 |
+ | 54000 | 0.8727 | 3692.1333 | 30946.9785 | 15.6344 | 66.8788 | 74.762 | 9.345 | 1158.7859 | 45501.2617 |
+ | 57000 | 0.9212 | 3696.7100 | 30946.9785 | 15.6346 | 66.8322 | 74.814 | 9.352 | 1160.7031 | 45501.2617 |
+ | 60000 | 0.9697 | 3707.0330 | 30929.5703 | 15.6328 | 66.9377 | 74.696 | 9.337 | 1165.3177 | 45525.5703 |
+ | 61875 | 1.0 | 3702.4434 | 30929.5703 | 15.6331 | 66.5826 | 75.095 | 9.387 | 1163.3928 | 45501.2617 |

  ### Framework versions
  - Distily 0.2.0
  - Transformers 4.44.0
  - Pytorch 2.3.0
- - Datasets 2.20.0
+ - Datasets 2.21.0
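
The `distillation_objective` entry in the diff above combines three loss components: KL divergence on the logits (weight 1) and MSE-style losses on the hidden states and attentions (weight 10.0 each), with no layer mapping or projection. The snippet below is a minimal PyTorch sketch of such a composite loss, not the Distily implementation; the function name, argument names, and reduction choices are assumptions, and the exact distinction between Distily's `raw_mse` and `mse` variants is not reproduced. It assumes standard `transformers` model outputs produced with `output_hidden_states=True` and `output_attentions=True`.

```python
# Illustrative sketch of a composite distillation loss (not the Distily code).
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      logits_weight=1.0, hs_weight=10.0, attn_weight=10.0):
    # KL divergence between student and teacher token distributions over the vocabulary.
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_p = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_p, reduction="batchmean")

    # Mean-squared error summed over every (student, teacher) layer pair.
    # With layer_mapper=None and projector=None, layers are matched one-to-one
    # and must already share shapes.
    hs_loss = sum(F.mse_loss(s, t) for s, t in
                  zip(student_out.hidden_states, teacher_out.hidden_states))
    attn_loss = sum(F.mse_loss(s, t) for s, t in
                    zip(student_out.attentions, teacher_out.attentions))

    return logits_weight * logits_loss + hs_weight * hs_loss + attn_weight * attn_loss
```

Summing the per-layer terms directly mirrors the `layer_mapper=None, projector=None` settings: the sketch assumes student and teacher expose the same number of layers with identical tensor shapes, which holds when the student is a same-architecture copy of the teacher.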
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.0004, warmup_ratio=0.1/events.out.tfevents.1723776979.b7d545513dcf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a077b832d5b0d72ecb23d8e129566828a7e91f320123130240d7a661b1d13caf
+ size 16923358
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.0004, warmup_ratio=0/events.out.tfevents.1723776757.b7d545513dcf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:097e6d52c1b4dd5d6eb0fc4c0858d22c2bdc5f3d78102c9def7dd7f526676cd7
- size 312
+ oid sha256:e9aa0d1518347d12c132115d5028df58de4cf8f2c71503b0f5dcb1f638dfebd9
+ size 588
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e02c3ac22ff0d23f01f139c12fab9df94791fcaf3e24d1dc8340b3319a0d1408
+ oid sha256:63548a6cc02bc7282675ac8003c55707aad063768454af1051d91b299d517c87
  size 137033984
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:50afd6a3d26dc6ab323236a90fbecceb23cb8eb9fda0e3a0c7d87b83e42d077d
+ oid sha256:ad4d38544df7551ce00e936ef8f18583e06cde197b6b3130d03cbdef91d94057
  size 1017948104
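
The remaining changes in this commit are Git LFS pointer files: each pointer records the spec version, a `sha256` object id, and the object size in bytes, while the actual binary lives in LFS storage. As a sanity check after downloading an artifact, the sketch below compares a local file against the `oid` and `size` from its pointer; the file path is an assumption, and this is a generic check rather than anything Distily or the Hub performs for you.

```python
# Sketch: verify a downloaded artifact against the oid/size in its Git LFS pointer.
import hashlib

def verify_lfs_object(path, expected_oid, expected_size):
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks to avoid loading large files into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == expected_oid and size == expected_size

# Example with the updated model.safetensors pointer from this commit
# (assumes the file has been downloaded to the current directory).
ok = verify_lfs_object(
    "model.safetensors",
    "63548a6cc02bc7282675ac8003c55707aad063768454af1051d91b299d517c87",
    137033984,
)
```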