---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - bitnet
  - 1.58b
  - generated_from_trainer
model-index:
  - name: distily_bitnet_gpt2
    results: []
---

# distily_bitnet_gpt2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
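
The `bitnet` and `1.58b` tags indicate the student uses BitNet-style ternary (1.58-bit) weights. As a rough illustration of that weight format, here is a minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper; the function name is mine, and Distily's actual BitNet layers may differ in granularity and detail:

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor absmean
    scale, as in the BitNet b1.58 paper. Illustrative sketch only; the
    implementation used for this model may differ."""
    scale = w.abs().mean().clamp(min=eps)     # absmean scaling factor
    w_q = (w / scale).round().clamp_(-1, 1)   # ternary (1.58-bit) weights
    return w_q, scale                         # effective weight: w_q * scale
```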

The Distily library was used for this distillation.
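
For inference, the student should load like any other causal LM checkpoint on the Hub. A minimal sketch, assuming the repository id is `lapp0/distily_bitnet_gpt2` (inferred from this card's location):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bitnet_gpt2"  # assumed repo id; adjust if needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```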

It achieves the following results on the evaluation set:

- eval_enwikippl: 184.0
- eval_frwikippl: 744.0
- eval_zhwikippl: 180.0
- eval_tinystoriesppl: 148.0
- eval_loss: 1.1860
- eval_runtime: 29.84
- eval_samples_per_second: 83.78
- eval_steps_per_second: 10.489
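
The `*ppl` metrics above are perplexities on held-out text (English, French, and Chinese Wikipedia plus TinyStories, per the metric names). A minimal sketch of how perplexity relates to the reported loss, assuming a standard Hugging Face causal LM; Distily's own evaluation harness may batch and slice the corpora differently:

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean per-token cross-entropy)."""
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss  # labels shifted internally
    return float(torch.exp(loss))
```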

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))` (a sketch of this objective follows this list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.5
- num_epochs: 1.0
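
As referenced in the first item above, the objective reduces to a single KL term between teacher and student logits (the hidden-state and attention components have weight 0). Below is a hedged sketch of that loss together with the cosine schedule at `warmup_ratio=0.5`, using standard PyTorch/Transformers APIs; `logits_kl_loss` is my name, not Distily's, and the exact reduction and temperature handling inside Distily may differ:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

student = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for the BitNet student

def logits_kl_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary at each position,
    mirroring logits_loss_component (label=logits, weight=1, loss_fn=kl)."""
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")

total_steps = 61875  # final optimizer step in the eval table below
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.5 * total_steps),  # lr_scheduler_warmup_ratio: 0.5
    num_training_steps=total_steps,
)
```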

## Resource Usage

Peak GPU Memory: 7.5008 GB
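
A figure like this is typically read from PyTorch's CUDA allocator statistics. A sketch only; whether the card reports allocated vs. reserved bytes, and decimal GB vs. GiB, is my assumption:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training steps here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9  # decimal GB (assumption)
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```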

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.25 | 61.25 | | | | | 11.6875 | 19.125 |
| 0 | 0 | 841813590016.0 | 42880953483264.0 | 19.1388 | 29.7619 | 84.0 | 10.517 | 2533359616.0 | 18691697672192.0 |
| 1000 | 0.0162 | 8256.0 | 104448.0 | 3.7570 | 29.792 | 83.915 | 10.506 | 4608.0 | 250880.0 |
| 2000 | 0.0323 | 1488.0 | 8576.0 | 2.5011 | 29.7856 | 83.933 | 10.508 | 828.0 | 40448.0 |
| 3000 | 0.0485 | 668.0 | 4384.0 | 2.0176 | 29.8589 | 83.727 | 10.483 | 442.0 | 1648.0 |
| 4000 | 0.0646 | 444.0 | 2304.0 | 1.7541 | 29.853 | 83.744 | 10.485 | 308.0 | 672.0 |
| 5000 | 0.0808 | 328.0 | 1288.0 | 1.5507 | 29.8272 | 83.816 | 10.494 | 258.0 | 242.0 |
| 6000 | 0.0970 | 266.0 | 1168.0 | 1.3948 | 29.9048 | 83.599 | 10.467 | 217.0 | 253.0 |
| 7000 | 0.1131 | 229.0 | 1048.0 | 1.3140 | 29.8053 | 83.878 | 10.501 | 181.0 | 189.0 |
| 8000 | 0.1293 | 202.0 | 760.0 | 1.2384 | 29.8461 | 83.763 | 10.487 | 166.0 | 187.0 |
| 9000 | 0.1455 | 184.0 | 744.0 | 1.1860 | 29.84 | 83.78 | 10.489 | 148.0 | 180.0 |
| 10000 | 0.1616 | 161.0 | 564.0 | 1.0820 | 29.8521 | 83.746 | 10.485 | 132.0 | 170.0 |
| 11000 | 0.1778 | 139.0 | 478.0 | 0.9691 | 29.7904 | 83.92 | 10.507 | 112.5 | 139.0 |
| 12000 | 0.1939 | 122.5 | 446.0 | 0.8903 | 29.8277 | 83.815 | 10.494 | 91.0 | 153.0 |
| 13000 | 0.2101 | 130.0 | 450.0 | 0.8290 | 29.8764 | 83.678 | 10.476 | 113.5 | 148.0 |
| 14000 | 0.2263 | 111.5 | 410.0 | 0.7867 | 29.8386 | 83.784 | 10.49 | 85.5 | 116.5 |
| 15000 | 0.2424 | 103.0 | 394.0 | 0.7550 | 29.7824 | 83.942 | 10.51 | 81.5 | 126.5 |
| 16000 | 0.2586 | 95.5 | 368.0 | 0.7130 | 29.8395 | 83.781 | 10.489 | 74.0 | 137.0 |
| 17000 | 0.2747 | 91.0 | 370.0 | 0.6869 | 29.8002 | 83.892 | 10.503 | 72.0 | 110.0 |
| 18000 | 0.2909 | 89.5 | 356.0 | 0.6569 | 29.8522 | 83.746 | 10.485 | 65.0 | 124.5 |
| 19000 | 0.3071 | 87.0 | 354.0 | 0.6839 | 29.8823 | 83.661 | 10.474 | 68.5 | 137.0 |
| 20000 | 0.3232 | 79.5 | 290.0 | 0.6065 | 29.8977 | 83.618 | 10.469 | 65.0 | 113.5 |
| 21000 | 0.3394 | 75.0 | 251.0 | 0.5674 | 29.8207 | 83.834 | 10.496 | 59.75 | 112.5 |
| 22000 | 0.3556 | 70.0 | 250.0 | 0.5363 | 29.8336 | 83.798 | 10.492 | 56.25 | 81.0 |
| 23000 | 0.3717 | 69.0 | 220.0 | 0.5125 | 29.8003 | 83.892 | 10.503 | 53.75 | 86.5 |
| 24000 | 0.3879 | 65.5 | 226.0 | 0.5047 | 29.8312 | 83.805 | 10.492 | 52.25 | 91.0 |
| 25000 | 0.4040 | 65.5 | 211.0 | 0.4917 | 29.8281 | 83.813 | 10.493 | 55.0 | 141.0 |
| 26000 | 0.4202 | 63.25 | 204.0 | 0.4817 | 29.8227 | 83.829 | 10.495 | 50.75 | 86.5 |
| 27000 | 0.4364 | 64.5 | 213.0 | 0.4738 | 29.9242 | 83.544 | 10.46 | 51.25 | 94.5 |
| 28000 | 0.4525 | 62.75 | 192.0 | 0.4619 | 29.9106 | 83.583 | 10.465 | 48.75 | 113.5 |
| 29000 | 0.4687 | 64.5 | 204.0 | 0.4840 | 29.8026 | 83.885 | 10.502 | 52.5 | 81.5 |
| 30000 | 0.4848 | 65.0 | 217.0 | 0.4796 | 29.8897 | 83.641 | 10.472 | 49.25 | 140.0 |
| 31000 | 0.5010 | 63.5 | 206.0 | 0.4689 | 29.8072 | 83.872 | 10.501 | 48.25 | 141.0 |
| 32000 | 0.5172 | 63.25 | 217.0 | 0.4726 | 29.8682 | 83.701 | 10.479 | 46.25 | 112.5 |
| 33000 | 0.5333 | 66.5 | 231.0 | 0.4654 | 29.7912 | 83.917 | 10.506 | 51.25 | 87.5 |
| 34000 | 0.5495 | 62.75 | 200.0 | 0.4547 | 29.8255 | 83.821 | 10.494 | 49.75 | 89.5 |
| 35000 | 0.5657 | 63.75 | 196.0 | 0.4552 | 29.8185 | 83.841 | 10.497 | 49.25 | 83.5 |
| 36000 | 0.5818 | 63.75 | 215.0 | 0.4588 | 29.8868 | 83.649 | 10.473 | 46.0 | 113.5 |
| 37000 | 0.5980 | 61.5 | 193.0 | 0.4382 | 29.825 | 83.822 | 10.495 | 46.25 | 130.0 |
| 38000 | 0.6141 | 61.5 | 193.0 | 0.4237 | 29.8213 | 83.833 | 10.496 | 45.75 | 75.5 |
| 39000 | 0.6303 | 61.5 | 187.0 | 0.4218 | 29.8194 | 83.838 | 10.497 | 44.0 | 82.5 |
| 40000 | 0.6465 | 59.75 | 178.0 | 0.4127 | 29.8348 | 83.795 | 10.491 | 42.75 | 100.5 |
| 41000 | 0.6626 | 58.0 | 184.0 | 0.4133 | 29.778 | 83.955 | 10.511 | 42.25 | 119.0 |
| 42000 | 0.6788 | 56.75 | 184.0 | 0.4072 | 29.8696 | 83.697 | 10.479 | 40.75 | 109.0 |
| 43000 | 0.6949 | 57.75 | 184.0 | 0.3986 | 29.8393 | 83.782 | 10.49 | 41.75 | 87.0 |
| 44000 | 0.7111 | 58.0 | 180.0 | 0.4014 | 29.8433 | 83.771 | 10.488 | 40.5 | 101.0 |
| 45000 | 0.7273 | 55.75 | 158.0 | 0.3611 | 29.8497 | 83.753 | 10.486 | 38.25 | 67.0 |
| 46000 | 0.7434 | 55.0 | 148.0 | 0.3377 | 29.8619 | 83.719 | 10.482 | 36.0 | 63.75 |
| 47000 | 0.7596 | 52.25 | 143.0 | 0.3271 | 29.8199 | 83.837 | 10.496 | 35.0 | 50.75 |
| 48000 | 0.7758 | 52.0 | 141.0 | 0.3185 | 29.8125 | 83.857 | 10.499 | 34.0 | 49.25 |
| 49000 | 0.7919 | 52.5 | 142.0 | 0.3146 | 29.9037 | 83.602 | 10.467 | 33.5 | 43.5 |
| 50000 | 0.8081 | 51.25 | 134.0 | 0.3096 | 29.8931 | 83.631 | 10.471 | 33.25 | 46.25 |
| 51000 | 0.8242 | 51.25 | 133.0 | 0.3025 | 30.0212 | 83.274 | 10.426 | 32.5 | 40.0 |
| 52000 | 0.8404 | 51.5 | 132.0 | 0.2984 | 29.8459 | 83.764 | 10.487 | 32.5 | 39.5 |
| 53000 | 0.8566 | 50.5 | 131.0 | 0.2951 | 29.8292 | 83.81 | 10.493 | 32.5 | 36.0 |
| 54000 | 0.8727 | 50.5 | 132.0 | 0.2934 | 29.9146 | 83.571 | 10.463 | 32.25 | 37.75 |
| 55000 | 0.8889 | 50.0 | 131.0 | 0.2918 | 29.8217 | 83.831 | 10.496 | 32.5 | 35.75 |
| 56000 | 0.9051 | 50.0 | 130.0 | 0.2911 | 29.8366 | 83.79 | 10.49 | 32.25 | 35.5 |
| 57000 | 0.9212 | 50.0 | 130.0 | 0.2903 | 29.8261 | 83.819 | 10.494 | 32.25 | 35.5 |
| 58000 | 0.9374 | 50.0 | 130.0 | 0.2901 | 29.8639 | 83.713 | 10.481 | 32.25 | 35.5 |
| 59000 | 0.9535 | 50.0 | 130.0 | 0.2900 | 29.8256 | 83.821 | 10.494 | 32.25 | 35.25 |
| 60000 | 0.9697 | 50.0 | 130.0 | 0.2899 | 29.8767 | 83.677 | 10.476 | 32.25 | 35.25 |
| 61000 | 0.9859 | 50.0 | 130.0 | 0.2900 | 29.8188 | 83.84 | 10.497 | 32.25 | 35.25 |
| 61875 | 1.0 | 50.0 | 130.0 | 0.2899 | 29.869 | 83.699 | 10.479 | 32.25 | 35.25 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0