End of training
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl: 6.
-- eval_loss: 0.
-- eval_runtime: 13.
-- eval_samples_per_second: 76.
-- eval_steps_per_second: 9.
+- eval_enwikippl: 107.6398
+- eval_frwikippl: 10204.3643
+- eval_zhwikippl: 49954.8242
+- eval_tinystoriesppl: 6.6903
+- eval_loss: 0.7036
+- eval_runtime: 13.0602
+- eval_samples_per_second: 76.568
+- eval_steps_per_second: 9.571
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -62,27 +62,27 @@ Peak GPU Memory: 6.6064 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
-| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.
-| 5000 | 0.0505 |
-| 10000 | 0.1010 |
-| 15000 | 0.1515 | 113.
-| 20000 | 0.2020 |
-| 25000 | 0.2525 | 107.
-| 30000 | 0.3030 | 107.
-| 35000 | 0.3535 | 107.
-| 40000 | 0.4040 | 107.
-| 45000 | 0.4545 | 107.
-| 50000 | 0.5051 | 107.
-| 55000 | 0.5556 |
-| 60000 | 0.6061 |
-| 65000 | 0.6566 |
-| 70000 | 0.7071 |
-| 75000 | 0.7576 | 107.
-| 80000 | 0.8081 |
-| 85000 | 0.8586 |
-| 90000 | 0.9091 |
-| 95000 | 0.9596 |
-| 99000 | 1.0 |
+| 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0304 | 76.744 | 9.593 | 33932.0586 | 94692.1562 |
+| 5000 | 0.0505 | 110.8554 | 10584.2598 | 0.7523 | 13.0416 | 76.677 | 9.585 | 6.7911 | 42034.9414 |
+| 10000 | 0.1010 | 104.0690 | 10210.1172 | 0.7242 | 13.0341 | 76.722 | 9.59 | 6.4174 | 44683.2305 |
+| 15000 | 0.1515 | 113.6466 | 10400.9941 | 0.7156 | 13.0171 | 76.822 | 9.603 | 7.2840 | 46906.4258 |
+| 20000 | 0.2020 | 111.4970 | 9877.6748 | 0.7117 | 13.0184 | 76.814 | 9.602 | 7.1889 | 47931.1602 |
+| 25000 | 0.2525 | 107.3317 | 10121.3330 | 0.7051 | 13.088 | 76.406 | 9.551 | 6.6947 | 49516.9375 |
+| 30000 | 0.3030 | 107.4814 | 10147.0312 | 0.7042 | 13.0664 | 76.532 | 9.567 | 6.6925 | 49728.7578 |
+| 35000 | 0.3535 | 107.5147 | 10109.9404 | 0.7041 | 13.0324 | 76.732 | 9.591 | 6.6794 | 49279.6914 |
+| 40000 | 0.4040 | 107.5064 | 10121.3330 | 0.7041 | 13.1335 | 76.141 | 9.518 | 6.6994 | 49835.0078 |
+| 45000 | 0.4545 | 107.3816 | 10129.8984 | 0.7039 | 13.1075 | 76.292 | 9.537 | 6.6972 | 49464.1211 |
+| 50000 | 0.5051 | 107.5231 | 10129.8984 | 0.7040 | 13.0137 | 76.842 | 9.605 | 6.7041 | 49808.4492 |
+| 55000 | 0.5556 | 107.7482 | 10135.5996 | 0.7040 | 13.0084 | 76.874 | 9.609 | 6.7052 | 49464.1211 |
+| 60000 | 0.6061 | 107.6064 | 10204.3643 | 0.7040 | 13.0291 | 76.751 | 9.594 | 6.6991 | 49914.8711 |
+| 65000 | 0.6566 | 107.6981 | 10204.3643 | 0.7037 | 13.0479 | 76.641 | 9.58 | 6.6958 | 49543.3398 |
+| 70000 | 0.7071 | 107.8484 | 10204.3643 | 0.7036 | 13.0612 | 76.563 | 9.57 | 6.6953 | 49848.3164 |
+| 75000 | 0.7576 | 107.5897 | 10204.3643 | 0.7036 | 13.1821 | 75.86 | 9.483 | 6.6895 | 49888.2188 |
+| 80000 | 0.8081 | 107.6398 | 10204.3643 | 0.7037 | 13.1572 | 76.004 | 9.5 | 6.6900 | 49835.0078 |
+| 85000 | 0.8586 | 107.7148 | 10204.3643 | 0.7037 | 12.9936 | 76.961 | 9.62 | 6.6928 | 49928.1523 |
+| 90000 | 0.9091 | 107.6398 | 10204.3643 | 0.7035 | 13.0225 | 76.79 | 9.599 | 6.6919 | 49954.8242 |
+| 95000 | 0.9596 | 107.6398 | 10204.3643 | 0.7036 | 13.0696 | 76.514 | 9.564 | 6.6914 | 49954.8242 |
+| 99000 | 1.0 | 107.6398 | 10204.3643 | 0.7036 | 13.0602 | 76.568 | 9.571 | 6.6903 | 49954.8242 |
 
 ### Framework versions
 - Distily 0.2.0
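Review note: the `runtime`, `samples_per_second`, and `steps_per_second` columns in the README table can be cross-checked with simple arithmetic. The sketch below assumes these columns follow the usual Trainer convention (samples or steps divided by wall-clock seconds), which the README does not state explicitly. It uses only the final-row values (step 99000), and the variable names are illustrative.

```python
# Final-row values copied from the README table (step 99000).
final = {
    "runtime": 13.0602,             # seconds per evaluation pass
    "samples_per_second": 76.568,
    "steps_per_second": 9.571,
}

# Assumption: throughput = count / runtime, so count = throughput * runtime.
eval_samples = round(final["runtime"] * final["samples_per_second"])
eval_steps = round(final["runtime"] * final["steps_per_second"])

# Samples processed per evaluation step, i.e. the implied eval batch size.
implied_batch_size = round(final["samples_per_second"] / final["steps_per_second"])

print(eval_samples, eval_steps, implied_batch_size)
```

Under that assumption the numbers are consistent with an evaluation set of about 1000 samples processed in about 125 steps, which would explain why the runtime stays near 13 seconds at every checkpoint.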
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=1e-05, max_grad_norm=100/events.out.tfevents.1724042225.5f530b1cf724
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7d3e78b7c39fd7d70f8e2a5c6a75a86d9fcd88e839677582d19650529b5b1cfa
+size 312
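Review note: the three added lines are a Git LFS pointer, not the TensorBoard event data itself. Each line is a `key value` pair, and `size` is the byte length of the real object stored in LFS. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` and the returned field names are hypothetical, and the sketch handles only the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size in bytes of the real object
    }


# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7d3e78b7c39fd7d70f8e2a5c6a75a86d9fcd88e839677582d19650529b5b1cfa
size 312
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

Here the pointer describes a 312-byte object identified by its SHA-256 digest. The event file itself has to be fetched through LFS to inspect the logged scalars.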