---
base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross_v2
    results: []
---

# distily_bench_obj_cross_v2

This student model was distilled from the teacher model roneneldan/TinyStories-33M; the training dataset is unspecified.

The Distily library was used for this distillation.
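Since the student is exported as a standard causal language model, it can be loaded through the usual `transformers` API. A minimal usage sketch follows; the repo id is an assumption inferred from the model name, not confirmed by this card:

```python
# Minimal usage sketch. The repo id is a guess based on the model name;
# replace it with the actual Hub id of this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_obj_cross_v2"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```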

It achieves the following results on the evaluation set:

- eval_enwikippl: 1882.2876
- eval_frwikippl: 38923.2266
- eval_zhwikippl: 63461.6641
- eval_tinystoriesppl: 451.2739
- eval_loss: 4.8257
- eval_runtime: 13.1445
- eval_samples_per_second: 76.078
- eval_steps_per_second: 9.51
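Each `*ppl` metric is a perplexity, i.e. the exponential of the mean next-token cross-entropy on the named corpus (enwiki, frwiki, zhwiki, TinyStories). A minimal sketch of that computation, for illustration only (this is not the exact Distily evaluation harness):

```python
# Illustrative perplexity computation: exp of the mean next-token
# cross-entropy. Not the exact Distily evaluation harness.
import torch

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, Hugging Face causal LMs return the mean
        # cross-entropy over all next-token predictions in the sequence.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()
```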

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the distillation objective follows this list):

- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=2.0, loss_fn=mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
- train_embeddings: True
- learning_rate: 0.0004
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- num_epochs: 1.0
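The distillation_objective above combines a KL-divergence loss on the student/teacher logits (weight 1) with an MSE loss on the hidden states (weight 2.0); the attention component is disabled (weight 0). A minimal PyTorch sketch of such a combined loss, assuming both models are run with output_hidden_states=True and produce matching hidden-state shapes; this is not Distily's actual implementation:

```python
# Sketch of the combined distillation loss implied by the weights above:
# KL on logits (weight 1) + MSE on hidden states (weight 2.0). The
# attention component has weight 0 and is omitted. Illustrative only;
# not Distily's actual DistillationObjective.
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, hs_weight: float = 2.0):
    # Per-token KL divergence between teacher and student distributions.
    s_logp = F.log_softmax(student_out.logits, dim=-1).flatten(0, 1)
    t_prob = F.softmax(teacher_out.logits, dim=-1).flatten(0, 1)
    kl = F.kl_div(s_logp, t_prob, reduction="batchmean")

    # Mean MSE over the per-layer hidden states (requires both forward
    # passes to be run with output_hidden_states=True).
    mse = sum(
        F.mse_loss(s_h, t_h)
        for s_h, t_h in zip(student_out.hidden_states, teacher_out.hidden_states)
    ) / len(student_out.hidden_states)

    return kl + hs_weight * mse
```

Weighting the hidden-state term above the logit term, as here, pushes the student to match the teacher's intermediate representations rather than only its output distribution.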

### Resource Usage

Peak GPU Memory: 8.1729 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 10909.4980 | 77116.0 | 6.3550 | 13.1937 | 75.794 | 9.474 | 4267.7983 | 73081.2031 |
| 1000 | 0.0808 | 1884.7683 | 38923.2266 | 4.8260 | 13.1354 | 76.13 | 9.516 | 453.2929 | 63529.4258 |
| 2000 | 0.1616 | 1882.5793 | 38923.2266 | 4.8257 | 13.2412 | 75.522 | 9.44 | 451.5352 | 63461.6641 |
| 3000 | 0.2424 | 1882.5793 | 38923.2266 | 4.8257 | 13.2384 | 75.538 | 9.442 | 451.6844 | 63461.6641 |
| 4000 | 0.3232 | 1881.7043 | 38923.2266 | 4.8257 | 13.2242 | 75.619 | 9.452 | 450.9009 | 63461.6641 |
| 5000 | 0.4040 | 1883.1630 | 38923.2266 | 4.8257 | 13.1558 | 76.012 | 9.501 | 451.8337 | 63461.6641 |
| 6000 | 0.4848 | 1883.1630 | 38923.2266 | 4.8257 | 13.2198 | 75.644 | 9.456 | 451.8337 | 63461.6641 |
| 7000 | 0.5657 | 1884.4762 | 38923.2266 | 4.8257 | 13.2183 | 75.653 | 9.457 | 452.8433 | 63529.4258 |
| 8000 | 0.6465 | 1882.5793 | 38923.2266 | 4.8257 | 13.1236 | 76.198 | 9.525 | 451.4604 | 63461.6641 |
| 9000 | 0.7273 | 1882.2876 | 38923.2266 | 4.8257 | 13.1445 | 76.078 | 9.51 | 451.2739 | 63461.6641 |
| 10000 | 0.8081 | 1880.2477 | 38923.2266 | 4.8257 | 13.2204 | 75.641 | 9.455 | 450.4167 | 63461.6641 |
| 11000 | 0.8889 | 1882.5793 | 38923.2266 | 4.8257 | 13.267 | 75.375 | 9.422 | 451.7592 | 63461.6641 |
| 12000 | 0.9697 | 1883.1630 | 38923.2266 | 4.8257 | 13.182 | 75.861 | 9.483 | 451.8337 | 63461.6641 |
| 12375 | 1.0 | 1883.1630 | 38923.2266 | 4.8257 | 13.202 | 75.746 | 9.468 | 451.8337 | 63461.6641 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0