lapp0 committed
Commit 926ac2d
Parent: 993efea

End of training

README.md CHANGED
@@ -16,14 +16,14 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

  It achieves the following results on the evaluation set:
- - eval_enwikippl: 2352.0
- - eval_frwikippl: 10240.0
- - eval_zhwikippl: 109056.0
- - eval_tinystoriesppl: 1920.0
- - eval_loss: 2.6449
- - eval_runtime: 17.0132
- - eval_samples_per_second: 58.778
- - eval_steps_per_second: 7.347
+ - eval_enwikippl: 1784.0
+ - eval_frwikippl: 9792.0
+ - eval_zhwikippl: 72192.0
+ - eval_tinystoriesppl: 1448.0
+ - eval_loss: 2.5122
+ - eval_runtime: 17.041
+ - eval_samples_per_second: 58.682
+ - eval_steps_per_second: 7.335

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -64,20 +64,20 @@ Peak GPU Memory: 8.0892 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
- | 0 | 0 | 1984274890752.0 | 213305255788544.0 | 21.1260 | 17.0018 | 58.817 | 7.352 | 3774873600.0 | 74217034874880.0 |
- | 1000 | 0.0808 | 516.0 | 3952.0 | 1.7644 | 17.0372 | 58.695 | 7.337 | 412.0 | 3760.0 |
- | 2000 | 0.1616 | 516.0 | 3872.0 | 1.7584 | 17.0128 | 58.779 | 7.347 | 462.0 | 860.0 |
- | 3000 | 0.2424 | 864.0 | 4672.0 | 2.0719 | 17.0071 | 58.799 | 7.35 | 788.0 | 2448.0 |
- | 4000 | 0.3232 | 1888.0 | 9344.0 | 2.5277 | 17.1241 | 58.397 | 7.3 | 1696.0 | 26880.0 |
- | 5000 | 0.4040 | 2008.0 | 7712.0 | 2.5758 | 17.0318 | 58.714 | 7.339 | 2256.0 | 48128.0 |
- | 6000 | 0.4848 | 2352.0 | 9984.0 | 2.6397 | 17.1643 | 58.26 | 7.283 | 1856.0 | 54528.0 |
- | 7000 | 0.5657 | 2416.0 | 12096.0 | 2.6472 | 17.0957 | 58.494 | 7.312 | 1880.0 | 109568.0 |
- | 8000 | 0.6465 | 2448.0 | 9856.0 | 2.6570 | 17.0094 | 58.791 | 7.349 | 1960.0 | 115712.0 |
- | 9000 | 0.7273 | 2352.0 | 10240.0 | 2.6449 | 17.0132 | 58.778 | 7.347 | 1920.0 | 109056.0 |
- | 10000 | 0.8081 | 2320.0 | 9344.0 | 2.6556 | 17.0386 | 58.69 | 7.336 | 2096.0 | 87040.0 |
- | 11000 | 0.8889 | 2304.0 | 12224.0 | 2.6333 | 17.0346 | 58.704 | 7.338 | 1888.0 | 130048.0 |
- | 12000 | 0.9697 | 2208.0 | 10368.0 | 2.6107 | 17.0435 | 58.674 | 7.334 | 1808.0 | 98816.0 |
- | 12375 | 1.0 | 2256.0 | 10304.0 | 2.6066 | 17.0663 | 58.595 | 7.324 | 1696.0 | 80896.0 |
+ | 0 | 0 | 2336462209024.0 | 122045790683136.0 | 22.4230 | 17.051 | 58.648 | 7.331 | 4429185024.0 | 25975962206208.0 |
+ | 1000 | 0.0808 | 588.0 | 3680.0 | 1.8545 | 17.0585 | 58.622 | 7.328 | 612.0 | 2880.0 |
+ | 2000 | 0.1616 | 988.0 | 5600.0 | 2.1657 | 17.0179 | 58.762 | 7.345 | 816.0 | 3200.0 |
+ | 3000 | 0.2424 | 1744.0 | 8640.0 | 2.5064 | 17.083 | 58.538 | 7.317 | 1544.0 | 46080.0 |
+ | 4000 | 0.3232 | 1864.0 | 8896.0 | 2.5506 | 17.0876 | 58.522 | 7.315 | 1544.0 | 63744.0 |
+ | 5000 | 0.4040 | 1728.0 | 8832.0 | 2.4970 | 17.0783 | 58.554 | 7.319 | 1520.0 | 51200.0 |
+ | 6000 | 0.4848 | 1936.0 | 9216.0 | 2.5779 | 17.0361 | 58.699 | 7.337 | 1688.0 | 63744.0 |
+ | 7000 | 0.5657 | 2224.0 | 9792.0 | 2.6441 | 17.0202 | 58.754 | 7.344 | 1832.0 | 82944.0 |
+ | 8000 | 0.6465 | 1936.0 | 8832.0 | 2.5707 | 17.057 | 58.627 | 7.328 | 1808.0 | 115200.0 |
+ | 9000 | 0.7273 | 1784.0 | 9792.0 | 2.5122 | 17.041 | 58.682 | 7.335 | 1448.0 | 72192.0 |
+ | 10000 | 0.8081 | 2064.0 | 9664.0 | 2.5934 | 17.147 | 58.319 | 7.29 | 1552.0 | 91648.0 |
+ | 11000 | 0.8889 | 2064.0 | 10240.0 | 2.6004 | 17.0431 | 58.675 | 7.334 | 1720.0 | 80896.0 |
+ | 12000 | 0.9697 | 2064.0 | 11008.0 | 2.6036 | 17.1142 | 58.431 | 7.304 | 1800.0 | 68608.0 |
+ | 12375 | 1.0 | 1992.0 | 9280.0 | 2.5963 | 17.0677 | 58.59 | 7.324 | 1752.0 | 70144.0 |

  ### Framework versions
  - Distily 0.2.0
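The `*ppl` columns in the card above are perplexities of the student on held-out enwiki, frwiki, zhwiki, and TinyStories text. As a rough illustration of how such a number is computed (this is not Distily's actual evaluation loop, and the model id below is a hypothetical placeholder), a causal-LM perplexity is the exponential of the mean per-token cross-entropy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual id of this student model.
model_id = "lapp0/distily-student-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def perplexity(text: str) -> float:
    # With labels=input_ids, the model returns the mean per-token
    # cross-entropy; its exponential is the perplexity.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("Once upon a time, there was a little robot."))
```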
logs/hs_layer_mapper=last, hs_loss_fn=mse, hs_weight=1.0/events.out.tfevents.1724120424.02dbb11e2dcc ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17fa436ebc1eb4365904057f5c458c8b6f4cf88a344275c2f2d39b2e7adbee57
+ size 307
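The run name in the log path records this run's distillation objective: `hs_layer_mapper=last`, `hs_loss_fn=mse`, `hs_weight=1.0`. A minimal sketch of what such an objective plausibly computes follows: the mean squared error between the last hidden states of student and teacher, scaled by the weight. The function name and signature are assumptions read off the run name, not Distily's implementation.

```python
import torch.nn.functional as F

def hidden_state_loss(student_hidden_states, teacher_hidden_states, hs_weight=1.0):
    # hs_layer_mapper=last: align only the final hidden-state tensors.
    # hs_loss_fn=mse: penalize their mean squared difference.
    # hs_weight: scale of this term in the total distillation loss.
    return hs_weight * F.mse_loss(student_hidden_states[-1],
                                  teacher_hidden_states[-1].detach())

# Usage sketch inside a training step:
#   s_out = student(input_ids, output_hidden_states=True)
#   t_out = teacher(input_ids, output_hidden_states=True)
#   loss = hidden_state_loss(s_out.hidden_states, t_out.hidden_states)
```

Note that the committed `events.out.tfevents` file is only a 307-byte Git LFS pointer; run `git lfs pull` in a clone of the repository to fetch the actual TensorBoard event file.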