lapp0 committed
Commit 926ac2d
Parent: 993efea

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

  It achieves the following results on the evaluation set:
- - eval_enwikippl: 2352.0
- - eval_frwikippl: 10240.0
- - eval_zhwikippl: 109056.0
- - eval_tinystoriesppl: 1920.0
- - eval_loss: 2.6449
- - eval_runtime: 17.0132
- - eval_samples_per_second: 58.778
- - eval_steps_per_second: 7.347
+ - eval_enwikippl: 1784.0
+ - eval_frwikippl: 9792.0
+ - eval_zhwikippl: 72192.0
+ - eval_tinystoriesppl: 1448.0
+ - eval_loss: 2.5122
+ - eval_runtime: 17.041
+ - eval_samples_per_second: 58.682
+ - eval_steps_per_second: 7.335

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -64,20 +64,20 @@ Peak GPU Memory: 8.0892 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
- | 0 | 0 | 1984274890752.0 | 213305255788544.0 | 21.1260 | 17.0018 | 58.817 | 7.352 | 3774873600.0 | 74217034874880.0 |
- | 1000 | 0.0808 | 516.0 | 3952.0 | 1.7644 | 17.0372 | 58.695 | 7.337 | 412.0 | 3760.0 |
- | 2000 | 0.1616 | 516.0 | 3872.0 | 1.7584 | 17.0128 | 58.779 | 7.347 | 462.0 | 860.0 |
- | 3000 | 0.2424 | 864.0 | 4672.0 | 2.0719 | 17.0071 | 58.799 | 7.35 | 788.0 | 2448.0 |
- | 4000 | 0.3232 | 1888.0 | 9344.0 | 2.5277 | 17.1241 | 58.397 | 7.3 | 1696.0 | 26880.0 |
- | 5000 | 0.4040 | 2008.0 | 7712.0 | 2.5758 | 17.0318 | 58.714 | 7.339 | 2256.0 | 48128.0 |
- | 6000 | 0.4848 | 2352.0 | 9984.0 | 2.6397 | 17.1643 | 58.26 | 7.283 | 1856.0 | 54528.0 |
- | 7000 | 0.5657 | 2416.0 | 12096.0 | 2.6472 | 17.0957 | 58.494 | 7.312 | 1880.0 | 109568.0 |
- | 8000 | 0.6465 | 2448.0 | 9856.0 | 2.6570 | 17.0094 | 58.791 | 7.349 | 1960.0 | 115712.0 |
- | 9000 | 0.7273 | 2352.0 | 10240.0 | 2.6449 | 17.0132 | 58.778 | 7.347 | 1920.0 | 109056.0 |
- | 10000 | 0.8081 | 2320.0 | 9344.0 | 2.6556 | 17.0386 | 58.69 | 7.336 | 2096.0 | 87040.0 |
- | 11000 | 0.8889 | 2304.0 | 12224.0 | 2.6333 | 17.0346 | 58.704 | 7.338 | 1888.0 | 130048.0 |
- | 12000 | 0.9697 | 2208.0 | 10368.0 | 2.6107 | 17.0435 | 58.674 | 7.334 | 1808.0 | 98816.0 |
- | 12375 | 1.0 | 2256.0 | 10304.0 | 2.6066 | 17.0663 | 58.595 | 7.324 | 1696.0 | 80896.0 |
+ | 0 | 0 | 2336462209024.0 | 122045790683136.0 | 22.4230 | 17.051 | 58.648 | 7.331 | 4429185024.0 | 25975962206208.0 |
+ | 1000 | 0.0808 | 588.0 | 3680.0 | 1.8545 | 17.0585 | 58.622 | 7.328 | 612.0 | 2880.0 |
+ | 2000 | 0.1616 | 988.0 | 5600.0 | 2.1657 | 17.0179 | 58.762 | 7.345 | 816.0 | 3200.0 |
+ | 3000 | 0.2424 | 1744.0 | 8640.0 | 2.5064 | 17.083 | 58.538 | 7.317 | 1544.0 | 46080.0 |
+ | 4000 | 0.3232 | 1864.0 | 8896.0 | 2.5506 | 17.0876 | 58.522 | 7.315 | 1544.0 | 63744.0 |
+ | 5000 | 0.4040 | 1728.0 | 8832.0 | 2.4970 | 17.0783 | 58.554 | 7.319 | 1520.0 | 51200.0 |
+ | 6000 | 0.4848 | 1936.0 | 9216.0 | 2.5779 | 17.0361 | 58.699 | 7.337 | 1688.0 | 63744.0 |
+ | 7000 | 0.5657 | 2224.0 | 9792.0 | 2.6441 | 17.0202 | 58.754 | 7.344 | 1832.0 | 82944.0 |
+ | 8000 | 0.6465 | 1936.0 | 8832.0 | 2.5707 | 17.057 | 58.627 | 7.328 | 1808.0 | 115200.0 |
+ | 9000 | 0.7273 | 1784.0 | 9792.0 | 2.5122 | 17.041 | 58.682 | 7.335 | 1448.0 | 72192.0 |
+ | 10000 | 0.8081 | 2064.0 | 9664.0 | 2.5934 | 17.147 | 58.319 | 7.29 | 1552.0 | 91648.0 |
+ | 11000 | 0.8889 | 2064.0 | 10240.0 | 2.6004 | 17.0431 | 58.675 | 7.334 | 1720.0 | 80896.0 |
+ | 12000 | 0.9697 | 2064.0 | 11008.0 | 2.6036 | 17.1142 | 58.431 | 7.304 | 1800.0 | 68608.0 |
+ | 12375 | 1.0 | 1992.0 | 9280.0 | 2.5963 | 17.0677 | 58.59 | 7.324 | 1752.0 | 70144.0 |

  ### Framework versions
  - Distily 0.2.0
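The `*ppl` columns in the card above are perplexities of the student on held-out enwiki, frwiki, zhwiki, and TinyStories text. As a rough illustration of how such a number is computed (this is not Distily's actual evaluation loop, and the model id below is a hypothetical placeholder), a causal-LM perplexity is the exponential of the mean per-token cross-entropy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual id of this student model.
model_id = "lapp0/distily-student-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def perplexity(text: str) -> float:
    # With labels=input_ids, the model returns the mean per-token
    # cross-entropy; its exponential is the perplexity.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("Once upon a time, there was a little robot."))
```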
logs/hs_layer_mapper=last, hs_loss_fn=mse, hs_weight=1.0/events.out.tfevents.1724120424.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17fa436ebc1eb4365904057f5c458c8b6f4cf88a344275c2f2d39b2e7adbee57
+ size 307
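The run name in the log path records this run's distillation objective: `hs_layer_mapper=last`, `hs_loss_fn=mse`, `hs_weight=1.0`. A minimal sketch of what such an objective plausibly computes follows: the mean squared error between the last hidden states of student and teacher, scaled by the weight. The function name and signature are assumptions read off the run name, not Distily's implementation.

```python
import torch.nn.functional as F

def hidden_state_loss(student_hidden_states, teacher_hidden_states, hs_weight=1.0):
    # hs_layer_mapper=last: align only the final hidden-state tensors.
    # hs_loss_fn=mse: penalize their mean squared difference.
    # hs_weight: scale of this term in the total distillation loss.
    return hs_weight * F.mse_loss(student_hidden_states[-1],
                                  teacher_hidden_states[-1].detach())

# Usage sketch inside a training step:
#   s_out = student(input_ids, output_hidden_states=True)
#   t_out = teacher(input_ids, output_hidden_states=True)
#   loss = hidden_state_loss(s_out.hidden_states, t_out.hidden_states)
```

Note that the committed `events.out.tfevents` file is only a 307-byte Git LFS pointer; run `git lfs pull` in a clone of the repository to fetch the actual TensorBoard event file.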