---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []
---

# distily_bench_obj_cross

This student model is distilled from the teacher model [roneneldan/TinyStories-33M](https://huggingface.co/roneneldan/TinyStories-33M) using the dataset (unspecified).

The Distily library was used for this distillation.
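
The student is a standard causal language model and can be loaded with the usual `transformers` API. The sketch below assumes the model is published under the Hub id `lapp0/distily_bench_obj_cross` (inferred from this card's name and author; the actual repo id may differ):

```python
# Minimal usage sketch; the repo id is an assumption inferred from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```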

It achieves the following results on the evaluation set:

- eval_enwikippl: 207.7861
- eval_frwikippl: 15066.8408
- eval_zhwikippl: 64727.0352
- eval_tinystoriesppl: 24.1522
- eval_loss: 13.2019
- eval_runtime: 65.4151
- eval_samples_per_second: 76.435
- eval_steps_per_second: 9.554
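
The `*ppl` entries are perplexities on English, French, and Chinese Wikipedia text and on TinyStories. For reference, the sketch below shows the conventional perplexity definition, exp of the mean token-level negative log-likelihood; Distily's exact evaluation pipeline is not reproduced here and may differ in details such as batching and sequence stride:

```python
# Sketch of the standard perplexity definition: ppl = exp(mean token NLL).
# Illustrative only; Distily's actual evaluation code may differ.
import torch
import torch.nn.functional as F

def perplexity(model, input_ids: torch.Tensor) -> float:
    """Perplexity of a causal LM on one tokenized sequence (shape (1, seq_len))."""
    with torch.no_grad():
        logits = model(input_ids).logits       # (1, seq_len, vocab)
    # Shift so position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    nll = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )
    return torch.exp(nll).item()
```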

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))` (a schematic implementation of this objective appears after this list)
- train_embeddings: True
- learning_rate: 0.001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
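
The `distillation_objective` printed above combines three terms: a KL-divergence loss on logits (weight 1), a raw MSE loss on hidden states (weight 10), and an MSE loss on attention maps (weight 10), with no layer mapping or projection. The following is a schematic re-implementation of that weighted sum, not Distily's actual code. It assumes both forward passes were run with `output_hidden_states=True` and `output_attentions=True` and that student and teacher layers match in count and shape (consistent with `layer_mapper=None`, `projector=None`):

```python
# Schematic weighted distillation loss: KL(logits, w=1) + MSE(hidden states,
# w=10) + MSE(attentions, w=10). Not Distily's actual implementation.
# "raw_mse" is assumed here to mean plain, unnormalized MSE.
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      logits_w=1.0, hs_w=10.0, attn_w=10.0):
    vocab = student_out.logits.size(-1)
    # KL divergence between student and teacher next-token distributions,
    # averaged over tokens.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1).reshape(-1, vocab),
        F.softmax(teacher_out.logits, dim=-1).reshape(-1, vocab),
        reduction="batchmean",
    )
    # MSE over each layer's hidden states (layer_mapper=None -> 1:1 pairing).
    hs = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)
    # MSE over each layer's attention maps.
    attn = sum(
        F.mse_loss(s, t)
        for s, t in zip(student_out.attentions, teacher_out.attentions)
    ) / len(student_out.attentions)
    return logits_w * kl + hs_w * hs + attn_w * attn
```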

### Resource Usage

Peak GPU Memory: 8.2677 GB
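
How this figure was measured is not stated in the card; one common way to read peak allocated GPU memory with PyTorch is via the CUDA allocator's counters, as in this sketch:

```python
# One way to read peak allocated GPU memory in GiB with PyTorch;
# whether this card's figure was produced this way is an assumption.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run training ...
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"Peak GPU Memory: {peak_gib:.4f} GB")
```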

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 65.6143 | 76.203 | 9.525 | 12321.8145 | 60955.8008 |
| 3000 | 0.0485 | 207.9149 | 15083.8350 | 13.2031 | 65.3099 | 76.558 | 9.57 | 24.1822 | 63920.4375 |
| 6000 | 0.0970 | 207.6253 | 15109.3467 | 13.2019 | 65.2976 | 76.572 | 9.572 | 24.1253 | 65386.5469 |
| 9000 | 0.1455 | 207.7861 | 15066.8408 | 13.2019 | 65.4151 | 76.435 | 9.554 | 24.1522 | 64727.0352 |
| 12000 | 0.1939 | 207.4002 | 15100.8330 | 13.2016 | 65.3491 | 76.512 | 9.564 | 24.0894 | 65229.7188 |
| 15000 | 0.2424 | 207.8023 | 15100.8330 | 13.2017 | 65.4255 | 76.423 | 9.553 | 24.1442 | 65247.1406 |
| 18000 | 0.2909 | 208.3987 | 15075.3359 | 13.2031 | 65.4213 | 76.428 | 9.553 | 24.2462 | 64057.0078 |
| 21000 | 0.3394 | 208.0761 | 15100.8330 | 13.2026 | 65.2706 | 76.604 | 9.576 | 24.2142 | 64537.3164 |
| 24000 | 0.3879 | 207.9955 | 15100.8330 | 13.2027 | 65.2287 | 76.653 | 9.582 | 24.1822 | 64159.6602 |
| 27000 | 0.4364 | 208.3180 | 15058.3516 | 13.2033 | 65.1653 | 76.728 | 9.591 | 24.2272 | 63869.3125 |
| 30000 | 0.4848 | 207.1754 | 15100.8330 | 13.2016 | 65.1169 | 76.785 | 9.598 | 24.0546 | 65229.7188 |
| 33000 | 0.5333 | 208.0761 | 15083.8350 | 13.2026 | 65.2105 | 76.675 | 9.584 | 24.2412 | 64588.9727 |
| 36000 | 0.5818 | 207.1754 | 15066.8408 | 13.2023 | 65.3453 | 76.517 | 9.565 | 24.0715 | 65229.7188 |
| 39000 | 0.6303 | 207.1754 | 15100.8330 | 13.2017 | 65.2569 | 76.62 | 9.578 | 24.0695 | 65229.7188 |
| 42000 | 0.6788 | 207.3681 | 15058.3516 | 13.2021 | 65.2167 | 76.668 | 9.583 | 24.0954 | 64796.1484 |
| 45000 | 0.7273 | 207.9955 | 15100.8330 | 13.2026 | 65.2551 | 76.622 | 9.578 | 24.1982 | 64159.6602 |
| 48000 | 0.7758 | 207.7861 | 15092.3242 | 13.2017 | 65.3187 | 76.548 | 9.568 | 24.1412 | 64727.0352 |
| 51000 | 0.8242 | 208.2050 | 15058.3516 | 13.2029 | 65.2525 | 76.625 | 9.578 | 24.2262 | 64193.8711 |
| 54000 | 0.8727 | 207.7861 | 15100.8330 | 13.2027 | 65.2798 | 76.593 | 9.574 | 24.1362 | 64331.0312 |
| 57000 | 0.9212 | 207.7218 | 15100.8330 | 13.2017 | 65.2646 | 76.611 | 9.576 | 24.1163 | 65125.4180 |
| 60000 | 0.9697 | 208.5925 | 15092.3242 | 13.2034 | 65.2233 | 76.66 | 9.582 | 24.2653 | 63869.3125 |
| 61875 | 1.0 | 208.0116 | 15100.8330 | 13.2018 | 65.2936 | 76.577 | 9.572 | 24.2012 | 64917.2539 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.21.0
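
To check a local environment against these versions, something like the following works; note that `distily` as an installable distribution name is an assumption:

```python
# Print installed versions of the packages listed above.
# "distily" as the distribution name is an assumption and may raise
# PackageNotFoundError if the library is installed under another name.
from importlib.metadata import version

for pkg in ("distily", "transformers", "torch", "datasets"):
    print(pkg, version(pkg))
```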