lapp0 committed
Commit 237ac9e
Parent: 5e015bc

End of training

README.md CHANGED
@@ -16,11 +16,11 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 539.2350
- - eval_frwikippl: 3470.0164
- - eval_zhwikippl: 15822.4590
- - eval_loss: 3.7526
- - eval_runtime: 17.2807
+ - eval_enwikippl: 225.9773
+ - eval_frwikippl: 1391.1320
+ - eval_zhwikippl: 821.2236
+ - eval_loss: 19.6630
+ - eval_runtime: 17.2806
  - eval_samples_per_second: 57.868
  - eval_steps_per_second: 7.234
 
@@ -45,7 +45,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=reverse_kl, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
+ - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=ce, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
  - learning_rate: 4e-05
  - train_batch_size: 8
@@ -62,20 +62,20 @@ Peak GPU Memory: 8.0903 GB
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 30.2086 | 57.2728 | | | | | 18.1784 |
- | 0 | 0 | 55429.6875 | 57698.8047 | 678.6000 | 17.2753 | 57.886 | 7.236 | 56988.9141 |
- | 1000 | 0.0808 | 2093.6777 | 12120.9648 | 7.9200 | 17.2744 | 57.889 | 7.236 | 152438.5625 |
- | 2000 | 0.1616 | 1304.2460 | 8493.5225 | 7.5052 | 17.3176 | 57.745 | 7.218 | 58375.3203 |
- | 3000 | 0.2424 | 924.3269 | 6221.1147 | 7.2972 | 17.3721 | 57.564 | 7.195 | 42663.6406 |
- | 4000 | 0.3232 | 765.8523 | 4975.8003 | 7.1927 | 17.3013 | 57.799 | 7.225 | 31888.1699 |
- | 5000 | 0.4040 | 677.1083 | 4361.1675 | 7.1110 | 17.3208 | 57.734 | 7.217 | 29598.4395 |
- | 6000 | 0.4848 | 634.8929 | 3934.5681 | 3.8812 | 17.3181 | 57.743 | 7.218 | 20375.8535 |
- | 7000 | 0.5657 | 610.9518 | 3706.7395 | 3.8349 | 17.3127 | 57.761 | 7.22 | 22378.3477 |
- | 8000 | 0.6465 | 574.6434 | 3612.8188 | 3.7883 | 17.3011 | 57.8 | 7.225 | 20749.2754 |
- | 9000 | 0.7273 | 539.2350 | 3470.0164 | 3.7526 | 17.2807 | 57.868 | 7.234 | 15822.4590 |
- | 10000 | 0.8081 | 522.2968 | 3161.6411 | 3.7214 | 17.3805 | 57.536 | 7.192 | 8437.9199 |
- | 11000 | 0.8889 | 509.9135 | 3373.0425 | 3.6955 | 17.2891 | 57.84 | 7.23 | 6858.4756 |
- | 12000 | 0.9697 | 478.6796 | 3359.7512 | 3.6670 | 17.3301 | 57.703 | 7.213 | 7444.5142 |
- | 12375 | 1.0 | 482.2987 | 3227.4072 | 3.6571 | 17.2805 | 57.869 | 7.234 | 8891.3779 |
+ | 0 | 0 | 55429.6875 | 57698.8047 | 24.5150 | 17.2943 | 57.823 | 7.228 | 56988.9141 |
+ | 1000 | 0.0808 | 713.7677 | 4453.7666 | 20.3910 | 17.3531 | 57.627 | 7.203 | 17866.8926 |
+ | 2000 | 0.1616 | 521.2028 | 3308.0386 | 20.2010 | 17.3798 | 57.538 | 7.192 | 2471.2515 |
+ | 3000 | 0.2424 | 433.2541 | 2722.2993 | 20.1000 | 17.3672 | 57.58 | 7.197 | 1283.4985 |
+ | 4000 | 0.3232 | 387.5081 | 2569.3728 | 20.0170 | 17.3651 | 57.587 | 7.198 | 1167.0867 |
+ | 5000 | 0.4040 | 332.2302 | 2197.1006 | 19.9310 | 17.283 | 57.86 | 7.233 | 1141.8051 |
+ | 6000 | 0.4848 | 292.5944 | 1835.8154 | 19.8590 | 17.2939 | 57.824 | 7.228 | 905.3102 |
+ | 7000 | 0.5657 | 266.3748 | 1648.5508 | 19.7820 | 17.3184 | 57.742 | 7.218 | 844.8045 |
+ | 8000 | 0.6465 | 244.8321 | 1513.9550 | 19.7310 | 17.3028 | 57.794 | 7.224 | 1150.9904 |
+ | 9000 | 0.7273 | 225.9773 | 1391.1320 | 19.6630 | 17.2806 | 57.868 | 7.234 | 821.2236 |
+ | 10000 | 0.8081 | 209.6788 | 1266.0754 | 19.6040 | 17.3446 | 57.655 | 7.207 | 718.9499 |
+ | 11000 | 0.8889 | 196.7588 | 1248.5234 | 19.5620 | 17.3611 | 57.6 | 7.2 | 611.5998 |
+ | 12000 | 0.9697 | 179.4194 | 1137.2484 | 19.5120 | 17.3767 | 57.548 | 7.194 | 572.3267 |
+ | 12375 | 1.0 | 175.7241 | 1080.9574 | 19.4920 | 17.3076 | 57.778 | 7.222 | 584.9987 |
 
  ### Framework versions
  - Distily 0.2.0
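
The substantive change in this commit is the hidden-state loss function: `loss_fn=reverse_kl` becomes `loss_fn=ce`, while the logits KL component (weight 1) and the disabled attention component (weight 0) are unchanged. Below is a minimal PyTorch sketch of such an objective, not Distily's actual implementation; in particular, treating hidden states as softmax distributions over the hidden dimension is an assumption made for illustration.

```python
# Sketch only: approximates the DistillationObjective above, assuming hidden
# states are normalized into distributions via softmax over the hidden dim.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, student_hs, teacher_hs,
                      logits_weight=1.0, hs_weight=2.0):
    # logits component (loss_fn=kl): forward KL(teacher || student) over the vocab.
    logits_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # hs component (loss_fn=ce): cross-entropy of the student's hidden-state
    # "distribution" against the teacher's. The previous run's reverse_kl
    # would instead minimize KL(student || teacher).
    hs_loss = -(F.softmax(teacher_hs, dim=-1)
                * F.log_softmax(student_hs, dim=-1)).sum(dim=-1).mean()
    return logits_weight * logits_loss + hs_weight * hs_loss
```

Note that the loss columns are not comparable across the two runs: cross-entropy includes the teacher's entropy as a floor, whereas reverse KL is zero at an exact match, which plausibly accounts for the ~19.5 versus ~3.7 scale despite the new run's lower perplexities.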
logs/hs_loss_fn=ce, hs_weight=2.0/events.out.tfevents.1723687604.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5e6e35bb00299cc0ee2cae00e6debf7fd51bfc1b9d10f6de4c5f3280b113d5e
+ size 249
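
The added log file is stored via Git LFS: the three lines above are the pointer (spec version, object hash, and object size in bytes), not the TensorBoard event data itself. A sketch of fetching the real file with `huggingface_hub` follows; the `repo_id` is a placeholder, since this commit page does not name the repository.

```python
# Resolve the Git LFS pointer above to the actual 249-byte TensorBoard event
# file. hf_hub_download follows LFS pointers and returns a local cached path.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="lapp0/<student-model-repo>",  # placeholder: repo not named on this page
    filename=("logs/hs_loss_fn=ce, hs_weight=2.0/"
              "events.out.tfevents.1723687604.5f530b1cf724"),
    revision="237ac9e",  # the commit shown above
)
print(local_path)
```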