Training in progress, step 61875
- README.md +32 -32
- logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=mse, hs_weight=10.0, learning_rate=0.004, warmup_ratio=0/events.out.tfevents.1723786733.93d6cbb3ad53 +3 -0
- logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.004, warmup_ratio=0.1/events.out.tfevents.1723786534.93d6cbb3ad53 +2 -2
- model.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_tinystoriesppl:
-- eval_loss:
-- eval_runtime: 65.
-- eval_samples_per_second:
-- eval_steps_per_second: 9.
+- eval_enwikippl: 141.7497
+- eval_frwikippl: 27160.7188
+- eval_zhwikippl: 182390.6094
+- eval_tinystoriesppl: 11.3304
+- eval_loss: 7.2221
+- eval_runtime: 65.8262
+- eval_samples_per_second: 75.958
+- eval_steps_per_second: 9.495
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -47,7 +47,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))
 - train_embeddings: True
-- learning_rate: 0.
+- learning_rate: 0.004
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
@@ -63,31 +63,31 @@ Peak GPU Memory: 8.2677 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
-| 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.
-| 3000 | 0.0485 |
-| 6000 | 0.0970 |
-| 9000 | 0.1455 |
-| 12000 | 0.1939 |
-| 15000 | 0.2424 |
-| 18000 | 0.2909 |
-| 21000 | 0.3394 |
-| 24000 | 0.3879 |
-| 27000 | 0.4364 |
-| 30000 | 0.4848 |
-| 33000 | 0.5333 |
-| 36000 | 0.5818 |
-| 39000 | 0.6303 |
-| 42000 | 0.6788 |
-| 45000 | 0.7273 |
-| 48000 | 0.7758 |
-| 51000 | 0.8242 |
-| 54000 | 0.8727 |
-| 57000 | 0.9212 |
-| 60000 | 0.9697 |
-| 61875 | 1.0 |
+| 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.8981 | 75.875 | 9.484 | 12321.8145 | 60955.8008 |
+| 3000 | 0.0485 | 142.1731 | 27314.1816 | 7.2225 | 65.7757 | 76.016 | 9.502 | 11.3529 | 185037.1875 |
+| 6000 | 0.0970 | 137.4355 | 27373.8730 | 7.2213 | 65.7399 | 76.057 | 9.507 | 10.6638 | 187922.7812 |
+| 9000 | 0.1455 | 141.7497 | 27160.7188 | 7.2221 | 65.8262 | 75.958 | 9.495 | 11.3304 | 182390.6094 |
+| 12000 | 0.1939 | 137.9795 | 27281.5098 | 7.2216 | 65.8535 | 75.926 | 9.491 | 10.8366 | 186325.2656 |
+| 15000 | 0.2424 | 137.0103 | 27513.0293 | 7.2218 | 65.7764 | 76.015 | 9.502 | 10.6413 | 189332.0312 |
+| 18000 | 0.2909 | 142.2282 | 27191.3516 | 7.2224 | 65.9523 | 75.812 | 9.477 | 11.3351 | 182877.7812 |
+| 21000 | 0.3394 | 135.1864 | 27656.7969 | 7.2224 | 65.9704 | 75.792 | 9.474 | 10.4478 | 187422.1875 |
+| 24000 | 0.3879 | 142.1621 | 27468.5117 | 7.2223 | 65.8641 | 75.914 | 9.489 | 11.3332 | 180117.75 |
+| 27000 | 0.4364 | 134.9144 | 27293.0391 | 7.2231 | 65.8626 | 75.916 | 9.489 | 10.4872 | 182050.1875 |
+| 30000 | 0.4848 | 140.8412 | 27042.3691 | 7.2219 | 65.826 | 75.958 | 9.495 | 11.2056 | 181468.2812 |
+| 33000 | 0.5333 | 136.8459 | 27680.2012 | 7.2217 | 65.8058 | 75.981 | 9.498 | 10.6224 | 187022.5938 |
+| 36000 | 0.5818 | 136.5546 | 26858.2676 | 7.2218 | 65.7491 | 76.047 | 9.506 | 10.7000 | 182244.5625 |
+| 39000 | 0.6303 | 135.1864 | 27323.7949 | 7.2218 | 65.914 | 75.856 | 9.482 | 10.4642 | 185185.4844 |
+| 42000 | 0.6788 | 141.3933 | 26540.4746 | 7.2219 | 65.7482 | 76.048 | 9.506 | 11.3379 | 179925.625 |
+| 45000 | 0.7273 | 142.8466 | 28055.0605 | 7.2226 | 65.741 | 76.056 | 9.507 | 11.2944 | 186922.7344 |
+| 48000 | 0.7758 | 136.5335 | 27478.1797 | 7.2224 | 65.8834 | 75.892 | 9.486 | 10.5581 | 186225.9531 |
+| 51000 | 0.8242 | 142.2612 | 27429.8477 | 7.2221 | 65.8847 | 75.89 | 9.486 | 11.3215 | 182877.7812 |
+| 54000 | 0.8727 | 137.5739 | 27848.3633 | 7.2220 | 66.0422 | 75.709 | 9.464 | 10.6466 | 187622.2969 |
+| 57000 | 0.9212 | 141.6180 | 27561.5352 | 7.2221 | 65.9877 | 75.772 | 9.471 | 11.1880 | 191772.1875 |
+| 60000 | 0.9697 | 141.9915 | 27429.8477 | 7.2230 | 65.9205 | 75.849 | 9.481 | 11.3163 | 182585.3594 |
+| 61875 | 1.0 | 138.3541 | 27281.5098 | 7.2215 | 65.6926 | 76.112 | 9.514 | 10.8914 | 184839.8281 |
 
 ### Framework versions
 - Distily 0.2.0
 - Transformers 4.44.0
 - Pytorch 2.3.0
-- Datasets 2.
+- Datasets 2.20.0
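The distillation_objective line above encodes the full training loss: KL divergence on the logits (weight 1) plus MSE terms on the hidden states and attention maps (weight 10.0 each), with no layer mapper or projector, so layers pair one-to-one. The following is a minimal PyTorch sketch of that weighted combination, not Distily's actual implementation; the reduction choices are assumptions, and the distinction between Distily's mse and raw_mse variants is not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      logits_weight=1.0, hs_weight=10.0, attn_weight=10.0):
    # KL divergence between student and teacher next-token distributions
    # (the logits_loss_component with loss_fn=kl, weight=1).
    logits_loss = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # MSE over each hidden-state layer, paired one-to-one since
    # layer_mapper=None (hs_loss_component, weight=10.0).
    hs_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    )
    # MSE over each attention map (attn_loss_component, weight=10.0).
    attn_loss = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    )
    return logits_weight * logits_loss + hs_weight * hs_loss + attn_weight * attn_loss
```

For student_out and teacher_out to carry the tensors used above, both forward passes would need output_hidden_states=True and output_attentions=True.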
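In the table, the epoch column is step / 61875 (e.g., 3000 / 61875 ≈ 0.0485), and each *ppl column is a perplexity on a different evaluation corpus (English, French, and Chinese Wikipedia, plus TinyStories). Perplexity of this kind is conventionally the exponential of the mean token-level cross-entropy; below is a minimal sketch with transformers, using a hypothetical checkpoint path and sample text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "student_checkpoint" is a placeholder path, not the actual repo id.
model = AutoModelForCausalLM.from_pretrained("student_checkpoint")
tokenizer = AutoTokenizer.from_pretrained("student_checkpoint")

text = "Once upon a time, there was a little robot."  # sample eval text
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean next-token
    # cross-entropy; perplexity is its exponential.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(torch.exp(loss).item())
```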
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=mse, hs_weight=10.0, learning_rate=0.004, warmup_ratio=0/events.out.tfevents.1723786733.93d6cbb3ad53
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:254cb33f3bd8083cf8618875d03b952fc1f28659dcbbf0199eed50014a6783f3
+size 16923348
logs/attn_loss_fn=mse, attn_weight=10.0, hs_loss_fn=raw_mse, hs_weight=10.0, learning_rate=0.004, warmup_ratio=0.1/events.out.tfevents.1723786534.93d6cbb3ad53
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3b1611898542cb6cdaa388d6affcf9523ee2558dd901a270beecceee4a8da26c
+size 588
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b2b31538cce0ce4b165827bc5f17f54668fbd3fb115028a3f94e3732aaf22719
 size 137033984
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a8e83afba9c435ea9ad8bd8c09e27f039619a31ac399d42958a8e043e1426bd5
 size 1017948104
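The binary entries above (the event logs, model.safetensors, and training_args.bin) are Git LFS pointer files rather than the payloads themselves: each records a version URL, a sha256 oid, and a byte size. After fetching the real blobs (e.g., with git lfs pull), a download can be checked against its pointer; here is a minimal standard-library sketch, with a hypothetical local path.

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large blobs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected values copied from the model.safetensors pointer above.
expected_oid = "b2b31538cce0ce4b165827bc5f17f54668fbd3fb115028a3f94e3732aaf22719"
expected_size = 137033984

blob = Path("model.safetensors")  # hypothetical local copy of the payload
assert blob.stat().st_size == expected_size
assert sha256_of(str(blob)) == expected_oid
```

The size check is a cheap first filter; the hash comparison is what actually proves the payload matches the pointer.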