---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []
---

distily_bench_obj_cross

This student model was distilled from the teacher model roneneldan/TinyStories-33M using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 150.6954
  • eval_frwikippl: 20983.1934
  • eval_zhwikippl: 163274.0312
  • eval_tinystoriesppl: 13.6584
  • eval_loss: 2.1824
  • eval_runtime: 65.7475
  • eval_samples_per_second: 76.049
  • eval_steps_per_second: 9.506
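The eval_*ppl metrics above are perplexities on the respective corpora (English Wikipedia, French Wikipedia, Chinese Wikipedia, TinyStories). Perplexity is the exponential of the mean per-token cross-entropy in nats; a minimal sketch of that relationship (the exact tokenization and averaging Distily uses are not specified here):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)

# A mean cross-entropy of ~2.6142 nats corresponds to a perplexity near 13.66,
# the same order of magnitude as eval_tinystoriesppl above.
print(round(perplexity(2.6142), 2))
```

Note also that eval_runtime × eval_samples_per_second ≈ 65.75 × 76.05 ≈ 5000, so the evaluation set contains roughly 5,000 samples.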

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
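The distillation_objective above combines three weighted components: KL divergence on the logits (weight 1), MSE on the hidden states (weight 10), and MSE on the attention maps (weight 10), with no layer mapper or projector. A minimal PyTorch sketch of such a weighted objective — an illustration under assumed output names, not Distily's actual implementation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      logits_weight=1.0, hs_weight=10.0, attn_weight=10.0):
    """Weighted sum of a logits KL term and per-layer MSE terms on hidden
    states and attentions, mirroring the objective listed above.
    `student_out`/`teacher_out` are assumed dicts with "logits",
    "hidden_states", and "attentions" entries."""
    # KL divergence between teacher and student next-token distributions
    kl = F.kl_div(
        F.log_softmax(student_out["logits"], dim=-1),
        F.log_softmax(teacher_out["logits"], dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    # MSE summed over every layer (layer_mapper=None, projector=None)
    hs_mse = sum(F.mse_loss(s, t) for s, t in
                 zip(student_out["hidden_states"], teacher_out["hidden_states"]))
    attn_mse = sum(F.mse_loss(s, t) for s, t in
                   zip(student_out["attentions"], teacher_out["attentions"]))
    return logits_weight * kl + hs_weight * hs_mse + attn_weight * attn_mse
```

With identical student and teacher outputs every component vanishes, so the loss is zero; in training, the heavy 10× weights push the student to match the teacher's intermediate activations, not just its output distribution.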

Resource Usage

Peak GPU Memory: 8.2666 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 15507.5488 | 57030.9961 | 5.9658 | 65.2151 | 76.669 | 9.584 | 7965.0278 | 102309.4219 |
| 3000 | 0.0485 | 150.3223 | 21042.3887 | 2.1825 | 65.2954 | 76.575 | 9.572 | 13.5953 | 164147.5625 |
| 6000 | 0.0970 | 150.5029 | 21042.3887 | 2.1824 | 65.675 | 76.132 | 9.517 | 13.6268 | 164498.2812 |
| 9000 | 0.1455 | 150.6954 | 20983.1934 | 2.1824 | 65.7475 | 76.049 | 9.506 | 13.6584 | 163274.0312 |
| 12000 | 0.1939 | 150.3514 | 20983.1934 | 2.1823 | 65.8055 | 75.981 | 9.498 | 13.6251 | 162882.4219 |
| 15000 | 0.2424 | 150.6196 | 21042.3887 | 2.1824 | 65.3959 | 76.457 | 9.557 | 13.6482 | 164147.5625 |
| 18000 | 0.2909 | 150.6487 | 21042.3887 | 2.1824 | 65.5594 | 76.267 | 9.533 | 13.6533 | 163797.5938 |
| 21000 | 0.3394 | 150.3980 | 21042.3887 | 2.1824 | 65.4773 | 76.362 | 9.545 | 13.6234 | 163972.4844 |
| 24000 | 0.3879 | 150.5495 | 21131.4961 | 2.1825 | 65.4348 | 76.412 | 9.551 | 13.6184 | 164849.75 |
| 27000 | 0.4364 | 150.7538 | 21042.3887 | 2.1822 | 65.4127 | 76.438 | 9.555 | 13.6635 | 162925.9219 |
| 30000 | 0.4848 | 150.6954 | 21042.3887 | 2.1825 | 65.4113 | 76.439 | 9.555 | 13.6510 | 164586.0 |
| 33000 | 0.5333 | 151.0109 | 20983.1934 | 2.1823 | 65.6274 | 76.188 | 9.523 | 13.6832 | 163186.8594 |
| 36000 | 0.5818 | 150.3514 | 21042.3887 | 2.1824 | 65.4107 | 76.44 | 9.555 | 13.6184 | 164586.0 |
| 39000 | 0.6303 | 150.6020 | 20983.1934 | 2.1823 | 65.415 | 76.435 | 9.554 | 13.6550 | 163884.9375 |
| 42000 | 0.6788 | 150.5495 | 21042.3887 | 2.1823 | 65.3696 | 76.488 | 9.561 | 13.6454 | 163186.8594 |
| 45000 | 0.7273 | 150.3223 | 20995.0234 | 2.1824 | 65.7092 | 76.093 | 9.512 | 13.6257 | 163274.0312 |
| 48000 | 0.7758 | 150.8706 | 21042.3887 | 2.1824 | 65.5511 | 76.276 | 9.535 | 13.6652 | 163186.8594 |
| 51000 | 0.8242 | 150.8940 | 21006.8594 | 2.1823 | 65.6118 | 76.206 | 9.526 | 13.6719 | 163186.8594 |
| 54000 | 0.8727 | 150.4738 | 20918.2773 | 2.1824 | 65.6557 | 76.155 | 9.519 | 13.6539 | 162925.9219 |
| 57000 | 0.9212 | 150.4446 | 21042.3887 | 2.1824 | 65.3885 | 76.466 | 9.558 | 13.6257 | 163622.8906 |
| 60000 | 0.9697 | 150.4097 | 20918.2773 | 2.1824 | 65.4087 | 76.442 | 9.555 | 13.6533 | 162795.4688 |
| 61875 | 1.0 | 150.6896 | 21042.3887 | 2.1825 | 65.8705 | 75.907 | 9.488 | 13.6533 | 163972.4844 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0