---
base_model: gpt2
library_name: Distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_obj_cross_v2.15_gpt2
  results: []
---

# distily_bench_obj_cross_v2.15_gpt2

This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using an unspecified dataset. The [Distily](https://github.com/lapp0/distily) library was used for this distillation.

It achieves the following results on the evaluation set:
- eval_enwikippl: 1784.0
- eval_frwikippl: 9792.0
- eval_zhwikippl: 72192.0
- eval_tinystoriesppl: 1448.0
- eval_loss: 2.5122
- eval_runtime: 17.041
- eval_samples_per_second: 58.682
- eval_steps_per_second: 7.335

Illustrative sketches of the distillation objective, the perplexity metrics, and example usage appear at the end of this card.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=1.0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None)) (sketched in code at the end of this card)
- train_embeddings: True
- learning_rate: 0.0004
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 1.0

### Resource Usage

Peak GPU Memory: 8.0892 GB

### Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
| 0 | 0 | 2336462209024.0 | 122045790683136.0 | 22.4230 | 17.051 | 58.648 | 7.331 | 4429185024.0 | 25975962206208.0 |
| 1000 | 0.0808 | 588.0 | 3680.0 | 1.8545 | 17.0585 | 58.622 | 7.328 | 612.0 | 2880.0 |
| 2000 | 0.1616 | 988.0 | 5600.0 | 2.1657 | 17.0179 | 58.762 | 7.345 | 816.0 | 3200.0 |
| 3000 | 0.2424 | 1744.0 | 8640.0 | 2.5064 | 17.083 | 58.538 | 7.317 | 1544.0 | 46080.0 |
| 4000 | 0.3232 | 1864.0 | 8896.0 | 2.5506 | 17.0876 | 58.522 | 7.315 | 1544.0 | 63744.0 |
| 5000 | 0.4040 | 1728.0 | 8832.0 | 2.4970 | 17.0783 | 58.554 | 7.319 | 1520.0 | 51200.0 |
| 6000 | 0.4848 | 1936.0 | 9216.0 | 2.5779 | 17.0361 | 58.699 | 7.337 | 1688.0 | 63744.0 |
| 7000 | 0.5657 | 2224.0 | 9792.0 | 2.6441 | 17.0202 | 58.754 | 7.344 | 1832.0 | 82944.0 |
| 8000 | 0.6465 | 1936.0 | 8832.0 | 2.5707 | 17.057 | 58.627 | 7.328 | 1808.0 | 115200.0 |
| 9000 | 0.7273 | 1784.0 | 9792.0 | 2.5122 | 17.041 | 58.682 | 7.335 | 1448.0 | 72192.0 |
| 10000 | 0.8081 | 2064.0 | 9664.0 | 2.5934 | 17.147 | 58.319 | 7.29 | 1552.0 | 91648.0 |
| 11000 | 0.8889 | 2064.0 | 10240.0 | 2.6004 | 17.0431 | 58.675 | 7.334 | 1720.0 | 80896.0 |
| 12000 | 0.9697 | 2064.0 | 11008.0 | 2.6036 | 17.1142 | 58.431 | 7.304 | 1800.0 | 68608.0 |
| 12375 | 1.0 | 1992.0 | 9280.0 | 2.5963 | 17.0677 | 58.59 | 7.324 | 1752.0 | 70144.0 |

### Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.21.0
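
### Distillation Objective (sketch)

The objective listed under training hyperparameters combines a KL-divergence loss on the logits (weight 1) with an MSE loss on the last hidden state (weight 1.0, `layer_mapper=last`); the attention component has weight 0 and is inactive. Below is a minimal PyTorch sketch of that combination, not Distily's actual implementation: the function and argument names are placeholders, and it assumes both models were run with `output_hidden_states=True`.

```python
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out):
    # logits_loss_component (loss_fn=kl, weight=1): KL divergence from the
    # teacher's next-token distribution to the student's.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.log_softmax(teacher_out.logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    # hs_loss_component (loss_fn=mse, layer_mapper=last, weight=1.0): MSE
    # between the final hidden states of student and teacher.
    mse = F.mse_loss(student_out.hidden_states[-1], teacher_out.hidden_states[-1])
    # attn_loss_component has weight=0, so no attention loss is added.
    return 1 * kl + 1.0 * mse
```

Note that `F.kl_div` with `log_target=True` computes KL(teacher || student), the standard direction for knowledge distillation: the student is penalized for assigning low probability where the teacher assigns high probability.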
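
### Perplexity Metrics (sketch)

The eval_enwikippl, eval_frwikippl, eval_zhwikippl, and eval_tinystoriesppl metrics are perplexities on different evaluation corpora. This card does not specify the exact evaluation protocol, so the following is a generic single-window perplexity computation for reference; the function name and the 1024-token truncation are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text, max_length=1024):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
    # With labels equal to input_ids, a causal LM returns the mean next-token
    # cross-entropy; its exponential is the perplexity.
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Example: score the teacher model on a short string.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "Distillation compresses large language models."))
```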
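
### Example Usage (sketch)

The student is a causal language model distilled from gpt2, so it should load through the standard transformers API. The hub repo id below is a placeholder (the owner is not stated on this card), and reusing the gpt2 tokenizer is an assumption based on the base model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed: student keeps gpt2's tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "<owner>/distily_bench_obj_cross_v2.15_gpt2"  # placeholder repo id
)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```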