base_model: roneneldan/TinyStories-33M
library_name: Distily
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_obj_cross
    results: []

distily_bench_obj_cross

This student model was distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is unspecified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 3944.9531
  • eval_frwikippl: 30197.7344
  • eval_zhwikippl: 52496.3438
  • eval_tinystoriesppl: 1385.5492
  • eval_loss: 16.6107
  • eval_runtime: 66.9937
  • eval_samples_per_second: 74.634
  • eval_steps_per_second: 9.329
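The runtime and throughput figures above can be cross-checked with simple arithmetic. The sketch below is not part of Distily; the variable names are illustrative, and the values are copied from the list above. It estimates how many evaluation samples and steps the reported numbers imply.

```python
# Values copied from the evaluation results listed above.
eval_runtime = 66.9937             # seconds
eval_samples_per_second = 74.634
eval_steps_per_second = 9.329

# Throughput multiplied by runtime recovers the approximate counts.
approx_samples = round(eval_runtime * eval_samples_per_second)
approx_steps = round(eval_runtime * eval_steps_per_second)

# If the figures are internally consistent, samples per step
# should equal the evaluation batch size.
samples_per_step = approx_samples / approx_steps

print(approx_samples, approx_steps, samples_per_step)
```

The implied samples per step can be compared against the eval_batch_size reported in the training hyperparameters.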

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=10.0, loss_fn=raw_mse, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=10.0, loss_fn=mse, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0004
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
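The distillation_objective above lists three weighted components: a KL term on logits (weight 1), and MSE-style terms on hidden states and attention (weight 10.0 each). As a rough illustration only, the sketch below combines them as a weighted sum in plain Python. It is not Distily's implementation: the function names are hypothetical, the KL direction (teacher relative to student) is an assumption, both raw_mse and mse are treated as ordinary mean squared error, and summing the weighted terms is itself an assumption about how the components are combined.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one distribution (direction is an assumption)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    """Plain mean squared error; stands in for both raw_mse and mse here."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def combined_loss(t_logits, s_logits, t_hs, s_hs, t_attn, s_attn,
                  w_logits=1.0, w_hs=10.0, w_attn=10.0):
    """Weighted sum of the three components, using the weights from the card."""
    return (w_logits * kl_divergence(t_logits, s_logits)
            + w_hs * mse(t_hs, s_hs)
            + w_attn * mse(t_attn, s_attn))

# Toy example: identical logits and attention, differing hidden states.
loss = combined_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0], [0.5], [0.5])
print(loss)
```

In the toy call only the hidden-state term contributes, so the result is that term's MSE scaled by its weight.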

Resource Usage

Peak GPU Memory: 8.2677 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 21397.4785 | 57946.0117 | 18.3162 | 67.1093 | 74.505 | 9.313 | 12321.8145 | 60955.8008 |
| 3000 | 0.0485 | 3940.0654 | 30197.7344 | 16.6107 | 67.006 | 74.62 | 9.328 | 1383.7178 | 52496.3438 |
| 6000 | 0.0970 | 3937.6238 | 30180.7188 | 16.6119 | 67.1095 | 74.505 | 9.313 | 1383.9467 | 52496.3438 |
| 9000 | 0.1455 | 3944.9531 | 30197.7344 | 16.6107 | 66.9937 | 74.634 | 9.329 | 1385.5492 | 52496.3438 |
| 12000 | 0.1939 | 3944.9531 | 30197.7344 | 16.6115 | 67.0666 | 74.553 | 9.319 | 1384.8617 | 52496.3438 |
| 15000 | 0.2424 | 3937.6238 | 30180.7188 | 16.6121 | 66.6143 | 75.059 | 9.382 | 1385.3201 | 52524.3359 |
| 18000 | 0.2909 | 3937.6238 | 30180.7188 | 16.6117 | 66.7575 | 74.898 | 9.362 | 1384.1750 | 52552.3945 |
| 21000 | 0.3394 | 3944.9531 | 30197.7344 | 16.6115 | 66.679 | 74.986 | 9.373 | 1384.4041 | 52496.3438 |
| 24000 | 0.3879 | 3944.9531 | 30197.7344 | 16.6121 | 66.8908 | 74.749 | 9.344 | 1384.4041 | 52468.3164 |
| 27000 | 0.4364 | 3942.5085 | 30197.7344 | 16.6117 | 66.4311 | 75.266 | 9.408 | 1383.0317 | 52496.3438 |
| 30000 | 0.4848 | 3940.0654 | 30180.7188 | 16.6107 | 66.4762 | 75.215 | 9.402 | 1383.2599 | 52496.3438 |
| 33000 | 0.5333 | 3937.6238 | 30197.7344 | 16.6107 | 66.4814 | 75.209 | 9.401 | 1382.8029 | 52496.3438 |
| 36000 | 0.5818 | 3942.5085 | 30180.7188 | 16.6111 | 67.3001 | 74.294 | 9.287 | 1385.3201 | 52496.3438 |
| 39000 | 0.6303 | 3937.6238 | 30180.7188 | 16.6115 | 67.0065 | 74.62 | 9.327 | 1383.4888 | 52496.3438 |
| 42000 | 0.6788 | 3942.5085 | 30197.7344 | 16.6109 | 66.7444 | 74.913 | 9.364 | 1384.1750 | 52496.3438 |
| 45000 | 0.7273 | 3941.2869 | 30197.7344 | 16.6115 | 67.1516 | 74.458 | 9.307 | 1382.8029 | 52496.3438 |
| 48000 | 0.7758 | 3944.9531 | 30180.7188 | 16.6107 | 66.7762 | 74.877 | 9.36 | 1386.6947 | 52524.3359 |
| 51000 | 0.8242 | 3942.5085 | 30197.7344 | 16.6111 | 67.2623 | 74.336 | 9.292 | 1384.8617 | 52496.3438 |
| 54000 | 0.8727 | 3944.9531 | 30180.7188 | 16.6107 | 66.724 | 74.936 | 9.367 | 1385.3201 | 52496.3438 |
| 57000 | 0.9212 | 3941.2869 | 30197.7344 | 16.6115 | 67.0602 | 74.56 | 9.32 | 1382.8029 | 52468.3164 |
| 60000 | 0.9697 | 3942.5085 | 30197.7344 | 16.6119 | 67.4137 | 74.169 | 9.271 | 1382.8029 | 52468.3164 |
| 61875 | 1.0 | 3937.6238 | 30180.7188 | 16.6119 | 67.1794 | 74.428 | 9.303 | 1383.7178 | 52496.3438 |
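The epoch values in the table appear to follow directly from the step count. A minimal sketch of that relationship, assuming the final step (61875) corresponds to exactly one epoch and that each step consumes one batch of train_batch_size samples with no gradient accumulation (the card does not confirm either); the helper name is hypothetical:

```python
total_steps = 61875       # final step in the table, reported at epoch 1.0
train_batch_size = 8      # from the training hyperparameters

def epoch_fraction(step):
    """Fraction of the epoch completed at a given step, rounded as in the table."""
    return round(step / total_steps, 4)

# Assumption: one step equals one batch on a single device,
# with no gradient accumulation.
approx_train_samples = total_steps * train_batch_size

print(epoch_fraction(3000), epoch_fraction(30000), approx_train_samples)
```

Under these assumptions the computed fractions can be compared against the epoch column, and the sample count gives a rough size for the unspecified training set.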

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.21.0