---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []
---

distily_bench_obj_cross

This student model is distilled from the teacher model roneneldan/TinyStories-33M on an unspecified dataset.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 5102.7344
  • eval_frwikippl: 36133.4453
  • eval_zhwikippl: 51745.4414
  • eval_tinystoriesppl: 1872.7611
  • eval_loss: 17.3780
  • eval_runtime: 33.2598
  • eval_samples_per_second: 75.166
  • eval_steps_per_second: 9.411
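For reference, each perplexity metric above (enwikippl, frwikippl, zhwikippl, tinystoriesppl) is the exponential of the mean per-token negative log-likelihood on the corresponding corpus. Note that eval_loss is the distillation objective, not a cross-entropy, so it does not convert to these perplexities directly. A minimal sketch of the general relation:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/e has an NLL of 1.0
# per token, so its perplexity is e.
print(perplexity([1.0, 1.0, 1.0]))  # → 2.718281828459045
```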

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(
      logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None),
      hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None),
      attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
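The distillation_objective above combines three weighted terms: KL divergence on the output logits (weight 1) plus unnormalized MSE on hidden states and on attention maps (weight 10.0 each). A hypothetical pure-Python sketch of that weighting, not Distily's actual implementation:

```python
import math

def kl_div(teacher_probs, student_probs):
    # KL(teacher || student) for a single token's probability distribution
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def raw_mse(a, b):
    # Mean squared error with no normalization of the activations
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(t_probs, s_probs, t_hs, s_hs, t_attn, s_attn):
    # Weights mirror the config: logits weight 1, hs and attn weight 10.0
    return (1.0 * kl_div(t_probs, s_probs)
            + 10.0 * raw_mse(t_hs, s_hs)
            + 10.0 * raw_mse(t_attn, s_attn))

# When teacher and student hidden states and attentions match exactly,
# only the logits KL term contributes.
loss = distillation_loss([0.5, 0.5], [0.25, 0.75],
                         [1.0, 2.0], [1.0, 2.0],
                         [0.3], [0.3])
```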

Resource Usage

Peak GPU Memory: 16.2498 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 27673.0234 | 72839.8984 | 19.1728 | 33.1089 | 75.508 | 9.454 | 16450.7617 | 62570.5625 |
| 2000 | 0.1293 | 5104.3164 | 36174.1641 | 17.3784 | 33.2758 | 75.13 | 9.406 | 1874.0001 | 51745.4414 |
| 4000 | 0.2586 | 5104.3164 | 36214.9648 | 17.3776 | 33.0549 | 75.632 | 9.469 | 1873.0709 | 51773.0352 |
| 6000 | 0.3879 | 5104.3164 | 36133.4453 | 17.3784 | 33.0479 | 75.648 | 9.471 | 1874.3102 | 51717.8164 |
| 8000 | 0.5172 | 5102.7344 | 36133.4453 | 17.3780 | 33.2598 | 75.166 | 9.411 | 1872.7611 | 51745.4414 |
| 10000 | 0.6465 | 5105.8984 | 36133.4453 | 17.3784 | 33.2485 | 75.191 | 9.414 | 1875.8596 | 51745.4414 |
| 12000 | 0.7757 | 5105.8984 | 36133.4453 | 17.3780 | 33.0133 | 75.727 | 9.481 | 1874.9297 | 51717.8164 |
| 14000 | 0.9050 | 5104.3164 | 36214.9648 | 17.3776 | 33.2272 | 75.24 | 9.42 | 1872.7611 | 51745.4414 |
| 15469 | 1.0 | 5104.3164 | 36133.4453 | 17.3784 | 33.002 | 75.753 | 9.484 | 1874.3102 | 51745.4414 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0