---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - bitnet
  - 1.58b
  - generated_from_trainer
model-index:
  - name: distily_bitnet_gpt2
    results: []
---

# distily_bitnet_gpt2

This student model was distilled from the teacher model gpt2 using an unspecified dataset.

The Distily library was used for this distillation.
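
Once trained, the student checkpoint loads like any GPT-2 model through `transformers`. A minimal usage sketch, with a hypothetical model id standing in for the actual Hub repo or local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distily_bitnet_gpt2"  # hypothetical id; substitute the real repo or path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Knowledge distillation compresses", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```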

It achieves the following results on the evaluation set:

- eval_enwikippl: 179.0
- eval_frwikippl: 624.0
- eval_zhwikippl: 166.0
- eval_tinystoriesppl: 159.0
- eval_loss: 1.8254
- eval_runtime: 30.606
- eval_samples_per_second: 81.683
- eval_steps_per_second: 10.227
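
The `*ppl` values are perplexities on the named corpora (English, French, and Chinese Wikipedia, plus TinyStories). As a rough sketch of how such a figure is obtained, the snippet below exponentiates the causal-LM cross-entropy on a text; the exact windowing and batching Distily uses are not documented in this card:

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy) under the causal LM.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    # Passing labels=input_ids makes the model return the shifted CE loss.
    loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```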

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=1.0, loss_fn=cos, layer_mapper=layer-2, projector=None)) (a loss of this shape is sketched after this list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
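
The `distillation_objective` combines a KL term on the logits (weight 1) with a cosine term on attention maps (weight 1.0, `layer-2` mapper). Below is a minimal PyTorch sketch of an objective of that shape, not Distily's actual `LossComponent` implementation; it assumes both models were run with `output_attentions=True`, and reading `layer-2` as "pair every second teacher layer with a student layer" is an assumption:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, attn_weight=1.0):
    # KL divergence between student and teacher next-token distributions (`kl`).
    s_logp = F.log_softmax(student_out.logits, dim=-1)
    t_prob = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Cosine loss over attention maps (`cos`); pairing every second teacher
    # layer with a student layer is an assumed reading of the `layer-2` mapper.
    pairs = list(zip(student_out.attentions, teacher_out.attentions[::2]))
    attn_loss = torch.stack([
        (1 - F.cosine_similarity(s.flatten(1), t.flatten(1), dim=-1)).mean()
        for s, t in pairs
    ]).mean()

    return logits_weight * logits_loss + attn_weight * attn_loss
```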

### Resource Usage

Peak GPU Memory: 7.7840 GB
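
If reproducing the run, the peak figure can be read back from PyTorch's CUDA allocator stats; presumably the number above was collected this way or via an equivalent query:

```python
import torch

# Peak bytes allocated on the current CUDA device since the last reset.
peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```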

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
| 0 | 0 | 1082331758592.0 | 57174604644352.0 | 19.3008 | 30.4471 | 82.11 | 10.28 | 5268045824.0 | 25013889531904.0 |
| 1000 | 0.0162 | 8320.0 | 67072.0 | 4.7159 | 30.4361 | 82.139 | 10.284 | 4640.0 | 399360.0 |
| 2000 | 0.0323 | 1352.0 | 6688.0 | 3.3630 | 30.4661 | 82.058 | 10.274 | 856.0 | 42240.0 |
| 3000 | 0.0485 | 644.0 | 4192.0 | 2.7949 | 30.5076 | 81.947 | 10.26 | 410.0 | 2160.0 |
| 4000 | 0.0646 | 452.0 | 2672.0 | 2.5266 | 30.4909 | 81.992 | 10.265 | 308.0 | 592.0 |
| 5000 | 0.0808 | 348.0 | 1504.0 | 2.2655 | 30.4813 | 82.017 | 10.269 | 262.0 | 288.0 |
| 6000 | 0.0970 | 270.0 | 1072.0 | 2.0833 | 30.4751 | 82.034 | 10.271 | 223.0 | 225.0 |
| 7000 | 0.1131 | 226.0 | 944.0 | 1.9806 | 30.4982 | 81.972 | 10.263 | 189.0 | 182.0 |
| 8000 | 0.1293 | 198.0 | 752.0 | 1.8914 | 30.4573 | 82.082 | 10.277 | 172.0 | 159.0 |
| 9000 | 0.1455 | 179.0 | 624.0 | 1.8254 | 30.606 | 81.683 | 10.227 | 159.0 | 166.0 |
| 10000 | 0.1616 | 172.0 | 568.0 | 1.7317 | 30.4969 | 81.976 | 10.263 | 138.0 | 141.0 |
| 11000 | 0.1778 | 139.0 | 486.0 | 1.6030 | 30.4606 | 82.073 | 10.276 | 111.5 | 147.0 |
| 12000 | 0.1939 | 124.5 | 460.0 | 1.5229 | 30.4525 | 82.095 | 10.278 | 98.0 | 120.0 |
| 13000 | 0.2101 | 112.0 | 432.0 | 1.4591 | 30.4583 | 82.079 | 10.276 | 87.0 | 116.0 |
| 14000 | 0.2263 | 102.5 | 400.0 | 1.4049 | 30.4283 | 82.16 | 10.286 | 83.0 | 110.5 |
| 15000 | 0.2424 | 102.5 | 408.0 | 1.3847 | 30.4854 | 82.007 | 10.267 | 82.5 | 162.0 |
| 16000 | 0.2586 | 94.5 | 382.0 | 1.3539 | 30.5028 | 81.96 | 10.261 | 73.5 | 120.5 |
| 17000 | 0.2747 | 86.5 | 330.0 | 1.3206 | 30.5072 | 81.948 | 10.26 | 72.0 | 125.5 |
| 18000 | 0.2909 | 86.5 | 314.0 | 1.2763 | 30.4803 | 82.02 | 10.269 | 67.5 | 121.5 |
| 19000 | 0.3071 | 89.0 | 342.0 | 1.2988 | 30.498 | 81.973 | 10.263 | 74.0 | 130.0 |
| 20000 | 0.3232 | 78.0 | 310.0 | 1.2192 | 30.473 | 82.04 | 10.271 | 60.75 | 106.5 |
| 21000 | 0.3394 | 75.5 | 284.0 | 1.1612 | 30.4776 | 82.028 | 10.27 | 56.75 | 121.0 |
| 22000 | 0.3556 | 73.5 | 233.0 | 1.1233 | 30.4472 | 82.109 | 10.28 | 57.5 | 124.0 |
| 23000 | 0.3717 | 68.0 | 231.0 | 1.0997 | 30.4635 | 82.065 | 10.275 | 55.75 | 93.5 |
| 24000 | 0.3879 | 65.5 | 213.0 | 1.0867 | 30.4576 | 82.081 | 10.277 | 52.0 | 102.0 |
| 25000 | 0.4040 | 63.0 | 207.0 | 1.0647 | 30.4272 | 82.163 | 10.287 | 50.25 | 109.0 |
| 26000 | 0.4202 | 65.0 | 197.0 | 1.0463 | 30.5134 | 81.931 | 10.258 | 52.5 | 98.5 |
| 27000 | 0.4364 | 65.5 | 211.0 | 1.0503 | 30.4982 | 81.972 | 10.263 | 48.75 | 97.0 |
| 28000 | 0.4525 | 62.25 | 199.0 | 1.0304 | 30.4658 | 82.059 | 10.274 | 47.5 | 104.5 |
| 29000 | 0.4687 | 66.0 | 214.0 | 1.0459 | 30.4838 | 82.011 | 10.268 | 49.75 | 94.0 |
| 30000 | 0.4848 | 63.0 | 209.0 | 1.0378 | 30.4635 | 82.065 | 10.275 | 49.75 | 90.5 |
| 31000 | 0.5010 | 63.5 | 218.0 | 1.0314 | 30.5768 | 81.761 | 10.237 | 48.5 | 96.5 |
| 32000 | 0.5172 | 62.25 | 204.0 | 1.0225 | 30.4582 | 82.08 | 10.276 | 45.5 | 68.5 |
| 33000 | 0.5333 | 64.5 | 202.0 | 1.0209 | 30.4807 | 82.019 | 10.269 | 48.0 | 93.5 |
| 34000 | 0.5495 | 64.0 | 196.0 | 1.0165 | 30.4726 | 82.041 | 10.272 | 47.0 | 102.0 |
| 35000 | 0.5657 | 64.5 | 210.0 | 1.0109 | 30.4775 | 82.028 | 10.27 | 45.5 | 97.5 |
| 36000 | 0.5818 | 64.0 | 188.0 | 1.0084 | 30.4363 | 82.139 | 10.284 | 45.5 | 96.0 |
| 37000 | 0.5980 | 60.75 | 184.0 | 0.9871 | 30.479 | 82.024 | 10.269 | 44.25 | 77.5 |
| 38000 | 0.6141 | 59.75 | 190.0 | 0.9688 | 30.5442 | 81.849 | 10.247 | 43.5 | 74.0 |
| 39000 | 0.6303 | 59.0 | 172.0 | 0.9639 | 30.551 | 81.83 | 10.245 | 43.0 | 61.25 |
| 40000 | 0.6465 | 58.25 | 172.0 | 0.9537 | 30.489 | 81.997 | 10.266 | 42.0 | 69.0 |
| 41000 | 0.6626 | 60.5 | 176.0 | 0.9559 | 30.4953 | 81.98 | 10.264 | 41.0 | 60.5 |
| 42000 | 0.6788 | 57.75 | 184.0 | 0.9516 | 30.488 | 81.999 | 10.266 | 40.5 | 56.5 |
| 43000 | 0.6949 | 57.25 | 191.0 | 0.9369 | 30.4691 | 82.05 | 10.273 | 41.0 | 53.25 |
| 44000 | 0.7111 | 58.0 | 178.0 | 0.9370 | 30.4852 | 82.007 | 10.267 | 39.5 | 54.25 |
| 45000 | 0.7273 | 54.75 | 157.0 | 0.8904 | 30.4948 | 81.981 | 10.264 | 36.75 | 64.5 |
| 46000 | 0.7434 | 52.0 | 148.0 | 0.8656 | 30.5027 | 81.96 | 10.261 | 35.0 | 49.25 |
| 47000 | 0.7596 | 53.0 | 144.0 | 0.8540 | 30.4803 | 82.02 | 10.269 | 34.25 | 44.0 |
| 48000 | 0.7758 | 51.25 | 139.0 | 0.8430 | 30.452 | 82.097 | 10.278 | 33.0 | 44.25 |
| 49000 | 0.7919 | 52.0 | 146.0 | 0.8397 | 30.4747 | 82.035 | 10.271 | 33.5 | 41.75 |
| 50000 | 0.8081 | 51.75 | 140.0 | 0.8340 | 30.4359 | 82.14 | 10.284 | 33.5 | 40.0 |
| 51000 | 0.8242 | 50.25 | 136.0 | 0.8262 | 30.4828 | 82.013 | 10.268 | 32.5 | 40.25 |
| 52000 | 0.8404 | 50.5 | 138.0 | 0.8209 | 30.4471 | 82.11 | 10.28 | 32.25 | 37.5 |
| 53000 | 0.8566 | 50.25 | 140.0 | 0.8173 | 30.4685 | 82.052 | 10.273 | 32.0 | 36.0 |
| 54000 | 0.8727 | 49.75 | 136.0 | 0.8153 | 30.4908 | 81.992 | 10.265 | 31.875 | 36.5 |
| 55000 | 0.8889 | 50.0 | 138.0 | 0.8132 | 30.4798 | 82.022 | 10.269 | 31.75 | 36.75 |
| 56000 | 0.9051 | 50.0 | 136.0 | 0.8123 | 30.4915 | 81.99 | 10.265 | 31.75 | 36.5 |
| 57000 | 0.9212 | 50.0 | 135.0 | 0.8117 | 30.5178 | 81.919 | 10.256 | 31.75 | 36.25 |
| 58000 | 0.9374 | 49.75 | 134.0 | 0.8113 | 30.4452 | 82.115 | 10.281 | 31.625 | 36.25 |
| 59000 | 0.9535 | 50.0 | 134.0 | 0.8111 | 30.5075 | 81.947 | 10.26 | 31.75 | 36.25 |
| 60000 | 0.9697 | 50.0 | 134.0 | 0.8112 | 30.4961 | 81.978 | 10.264 | 31.75 | 36.25 |
| 61000 | 0.9859 | 50.0 | 134.0 | 0.8112 | 30.5439 | 81.849 | 10.248 | 31.75 | 36.25 |
| 61875 | 1.0 | 50.0 | 134.0 | 0.8112 | 30.6307 | 81.618 | 10.219 | 31.75 | 36.25 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0