---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []
---

# distily_bench_obj_cross

This student model was distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) using the dataset (unspecified).

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

- eval_enwikippl: 158.7294
- eval_frwikippl: 15434.1611
- eval_zhwikippl: 106089.8359
- eval_tinystoriesppl: 15.5930
- eval_loss: 2.3671
- eval_runtime: 65.5679 s
- eval_samples_per_second: 76.257
- eval_steps_per_second: 9.532
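The `*ppl` metrics above are perplexities: the exponential of the mean token-level cross-entropy on each evaluation corpus (lower is better, 1.0 is perfect). A minimal sketch of the computation; the `nlls` values below are hypothetical and for illustration only, not taken from this run:

```python
import math

def perplexity(nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood
    (natural log). A uniform guess over V tokens scores exactly V."""
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token NLLs (nats), for illustration only:
nlls = [2.1, 2.5, 2.3, 2.6]
print(round(perplexity(nlls), 4))
```

The large gap between `eval_tinystoriesppl` (~15.6) and `eval_zhwikippl` (~106k) reflects how far each corpus is from the TinyStories training distribution.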

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
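The `distillation_objective` above sums three terms: KL divergence on the logits (weight 1), MSE on hidden states (weight 10), and MSE on attention maps (weight 10). A minimal sketch of that weighted sum in plain Python, assuming the per-layer tensors have already been extracted and flattened; this is an illustration of the objective's structure, not Distily's actual implementation:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(teacher_logits, student_logits):
    """Forward KL(teacher || student) over one token's distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(t_logits, s_logits, t_hs, s_hs, t_attn, s_attn,
                      hs_weight=10.0, attn_weight=10.0):
    """Weighted sum mirroring the objective above:
    1 * KL(logits) + 10.0 * MSE(hidden states) + 10.0 * MSE(attentions)."""
    return (kl_div(t_logits, s_logits)
            + hs_weight * mse(t_hs, s_hs)
            + attn_weight * mse(t_attn, s_attn))
```

With `layer_mapper=None` and `projector=None`, hidden states and attentions are compared at matching shapes layer-by-layer, which is possible here because the student shares the teacher's architecture.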

## Resource Usage

Peak GPU Memory: 8.2677 GB

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 21397.4785 | 57946.0117 | 6.1625 | 65.3142 | 76.553 | 9.569 | 12321.8145 | 60955.8008 |
| 3000 | 0.0485 | 158.5205 | 15451.5547 | 2.3672 | 65.4452 | 76.4 | 9.55 | 15.5596 | 105976.6797 |
| 6000 | 0.0970 | 159.4627 | 15519.1768 | 2.3670 | 65.5893 | 76.232 | 9.529 | 15.6570 | 107916.8281 |
| 9000 | 0.1455 | 158.7294 | 15434.1611 | 2.3671 | 65.5679 | 76.257 | 9.532 | 15.5930 | 106089.8359 |
| 12000 | 0.1939 | 159.8214 | 15466.7988 | 2.3670 | 65.4602 | 76.382 | 9.548 | 15.6933 | 107457.1484 |
| 15000 | 0.2424 | 159.5122 | 15501.6934 | 2.3671 | 65.4126 | 76.438 | 9.555 | 15.6648 | 107916.8281 |
| 18000 | 0.2909 | 158.8771 | 15434.1611 | 2.3670 | 65.4041 | 76.448 | 9.556 | 15.5956 | 106033.1953 |
| 21000 | 0.3394 | 159.3146 | 15434.1611 | 2.3671 | 65.4872 | 76.351 | 9.544 | 15.6460 | 106089.8359 |
| 24000 | 0.3879 | 159.4504 | 15434.1611 | 2.3670 | 65.5249 | 76.307 | 9.538 | 15.6589 | 106089.8359 |
| 27000 | 0.4364 | 158.9386 | 15386.3984 | 2.3669 | 65.4767 | 76.363 | 9.545 | 15.6163 | 105581.5391 |
| 30000 | 0.4848 | 159.1728 | 15451.5547 | 2.3671 | 65.3648 | 76.494 | 9.562 | 15.6369 | 107342.5391 |
| 33000 | 0.5333 | 159.8709 | 15466.7988 | 2.3670 | 65.4363 | 76.41 | 9.551 | 15.6965 | 106942.4062 |
| 36000 | 0.5818 | 159.2097 | 15460.2656 | 2.3670 | 65.4686 | 76.373 | 9.547 | 15.6318 | 107629.3516 |
| 39000 | 0.6303 | 158.6066 | 15503.8809 | 2.3670 | 65.5342 | 76.296 | 9.537 | 15.5724 | 107744.2734 |
| 42000 | 0.6788 | 158.5205 | 15468.9824 | 2.3671 | 65.5105 | 76.324 | 9.54 | 15.5576 | 107399.8828 |
| 45000 | 0.7273 | 158.7909 | 15399.4043 | 2.3670 | 65.5316 | 76.299 | 9.537 | 15.6163 | 106089.8359 |
| 48000 | 0.7758 | 158.7909 | 15434.1611 | 2.3671 | 65.4706 | 76.37 | 9.546 | 15.6027 | 106373.1953 |
| 51000 | 0.8242 | 158.8033 | 15425.4648 | 2.3669 | 65.5734 | 76.25 | 9.531 | 15.6169 | 106089.8359 |
| 54000 | 0.8727 | 158.9263 | 15434.1611 | 2.3670 | 65.5021 | 76.333 | 9.542 | 15.6085 | 106486.7812 |
| 57000 | 0.9212 | 159.3887 | 15451.5547 | 2.3671 | 65.5842 | 76.238 | 9.53 | 15.6505 | 107342.5391 |
| 60000 | 0.9697 | 159.4874 | 15390.7422 | 2.3670 | 65.5517 | 76.276 | 9.534 | 15.6641 | 105581.5391 |
| 61875 | 1.0 | 159.6729 | 15492.9736 | 2.3671 | 65.3871 | 76.468 | 9.558 | 15.6926 | 107342.5391 |

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0