metadata
base_model: gpt2
library_name: distily
license: mit
tags:
- generated_from_trainer
model-index:
- name: distily_bench_gpt2_batch_size
  results: []
distily_bench_gpt2_batch_size
This student model was distilled from the teacher model gpt2. The training dataset is unspecified.
The Distily library was used for this distillation.
It achieves the following results on the evaluation set (these figures match the step 9500 row of the Eval-Phase Metrics table, not the final step):
- eval_enwikippl: 579.5842
- eval_frwikippl: 3891.8010
- eval_zhwikippl: 6702.2964
- eval_loss: 7658.3999
- eval_runtime: 21.5573
- eval_samples_per_second: 46.388
- eval_steps_per_second: 11.597
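The three throughput figures above are mutually consistent, which a little arithmetic confirms. The sketch below assumes evaluation ran on a single device with the eval_batch_size of 4 listed under the training hyperparameters. The variable names are illustrative and are not part of Distily.

```python
# Reported evaluation throughput (values copied from the list above).
eval_runtime = 21.5573          # seconds
samples_per_second = 46.388
steps_per_second = 11.597

# Implied totals for one evaluation pass.
n_samples = eval_runtime * samples_per_second   # roughly 1000 samples
n_steps = eval_runtime * steps_per_second       # roughly 250 steps

# Samples per step should match eval_batch_size (single device assumed).
implied_batch_size = samples_per_second / steps_per_second

print(round(n_samples), round(n_steps), round(implied_batch_size, 2))
```

Under these assumptions, one evaluation pass covers about 1000 samples in about 250 steps, that is, 4 samples per step.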
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- distillation_objective: distily.objectives.LegacyObjective
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0
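These hyperparameters also determine how the step numbers in the evaluation table map to epoch fractions. The following is a rough sketch, assuming one device and no gradient accumulation (neither is stated in this card), and taking the final step of 49500 from the last row of the Eval-Phase Metrics table. The helper `epoch_at` is hypothetical and only illustrates the arithmetic.

```python
train_batch_size = 2
num_epochs = 1.0
final_step = 49500      # last row of the Eval-Phase Metrics table

# Assumption: one optimizer step consumes exactly one batch on one device.
samples_per_epoch = final_step * train_batch_size / num_epochs


def epoch_at(step: int) -> float:
    """Fraction of the run completed after `step` optimizer steps."""
    return round(step / final_step * num_epochs, 4)


print(int(samples_per_epoch))
print(epoch_at(500), epoch_at(9500), epoch_at(25000))
```

If the assumptions hold, the run saw about 99,000 training samples, and the computed epoch fractions reproduce the epoch column of the table (for example 0.0101 at step 500).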
Resource Usage
Peak GPU Memory: 4.0814 GB
Eval-Phase Metrics
step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
---|---|---|---|---|---|---|---|---|
teacher eval | | 30.2385 | 57.2728 | | | | | 18.1772 |
0 | 0 | 56994.4609 | 58386.3438 | 333144.0625 | 21.6098 | 46.275 | 11.569 | 60802.0039 |
500 | 0.0101 | 2099.8235 | 11678.0371 | 13260.6084 | 21.3836 | 46.765 | 11.691 | 69590.5391 |
1000 | 0.0202 | 1574.0366 | 8011.8809 | 10600.8320 | 21.2508 | 47.057 | 11.764 | 52850.8906 |
1500 | 0.0303 | 1301.3883 | 6674.1611 | 10162.4316 | 21.459 | 46.601 | 11.65 | 34488.375 |
2000 | 0.0404 | 1113.5813 | 5583.1753 | 9478.5283 | 21.4684 | 46.58 | 11.645 | 27443.7676 |
2500 | 0.0505 | 1004.2922 | 5359.0864 | 9228.7998 | 21.3125 | 46.921 | 11.73 | 26546.2461 |
3000 | 0.0606 | 914.3858 | 4987.7397 | 9218.7520 | 21.3671 | 46.801 | 11.7 | 13178.9082 |
3500 | 0.0707 | 860.5787 | 4993.3696 | 8780.2881 | 21.3231 | 46.898 | 11.724 | 20241.6133 |
4000 | 0.0808 | 810.8665 | 4433.4043 | 8697.4404 | 21.2626 | 47.031 | 11.758 | 18648.1777 |
4500 | 0.0909 | 769.4886 | 4542.2461 | 8522.4639 | 21.4471 | 46.626 | 11.657 | 14555.5088 |
5000 | 0.1010 | 741.9254 | 4665.9185 | 8346.4316 | 21.1682 | 47.241 | 11.81 | 10137.9199 |
5500 | 0.1111 | 714.7664 | 4329.6104 | 8166.9438 | 21.4303 | 46.663 | 11.666 | 13222.1006 |
6000 | 0.1212 | 692.0859 | 4471.0703 | 8177.6001 | 21.4078 | 46.712 | 11.678 | 10649.9824 |
6500 | 0.1313 | 659.9261 | 4580.1948 | 8073.7598 | 21.198 | 47.174 | 11.794 | 12113.9268 |
7000 | 0.1414 | 636.1021 | 4219.9077 | 7905.0562 | 21.2741 | 47.005 | 11.751 | 11793.8877 |
7500 | 0.1515 | 623.1702 | 4116.0293 | 7826.2402 | 21.2569 | 47.044 | 11.761 | 11638.9893 |
8000 | 0.1616 | 614.8783 | 4148.5176 | 7826.7520 | 21.2964 | 46.956 | 11.739 | 13476.1084 |
8500 | 0.1717 | 601.8520 | 4003.9678 | 7738.1118 | 21.4281 | 46.668 | 11.667 | 11412.7490 |
9000 | 0.1818 | 580.8234 | 3757.6580 | 7625.6001 | 21.6505 | 46.188 | 11.547 | 6242.6709 |
9500 | 0.1919 | 579.5842 | 3891.8010 | 7658.3999 | 21.5573 | 46.388 | 11.597 | 6702.2964 |
10000 | 0.2020 | 563.3217 | 3843.6697 | 7557.6641 | 21.4934 | 46.526 | 11.631 | 6892.9072 |
10500 | 0.2121 | 554.2101 | 3611.4167 | 7487.4878 | 21.5876 | 46.323 | 11.581 | 6533.5151 |
11000 | 0.2222 | 533.5391 | 3924.4539 | 7479.3599 | 21.5677 | 46.366 | 11.591 | 4041.2058 |
11500 | 0.2323 | 539.1932 | 3840.4197 | 7422.5601 | 21.3741 | 46.786 | 11.696 | 2984.6418 |
12000 | 0.2424 | 530.1937 | 3717.7319 | 7437.3760 | 21.5909 | 46.316 | 11.579 | 4198.5317 |
12500 | 0.2525 | 517.9953 | 3501.5972 | 7306.0801 | 21.5337 | 46.439 | 11.61 | 3271.1892 |
13000 | 0.2626 | 515.5474 | 3430.8489 | 7287.4878 | 21.6197 | 46.254 | 11.564 | 4228.9199 |
13500 | 0.2727 | 516.9103 | 3583.5176 | 7331.5840 | 21.6005 | 46.295 | 11.574 | 6539.6245 |
14000 | 0.2828 | 496.1355 | 3821.2432 | 7329.7920 | 21.4982 | 46.516 | 11.629 | 5327.2339 |
14500 | 0.2929 | 498.4330 | 3740.2107 | 7232.8960 | 21.5819 | 46.335 | 11.584 | 5059.5977 |
15000 | 0.3030 | 495.7023 | 3717.9944 | 7149.5361 | 21.4158 | 46.694 | 11.674 | 2332.2563 |
15500 | 0.3131 | 491.6768 | 3593.3838 | 7156.4482 | 21.2342 | 47.094 | 11.773 | 3195.2048 |
16000 | 0.3232 | 483.2642 | 3478.8335 | 7121.9521 | 21.2238 | 47.117 | 11.779 | 3729.5500 |
16500 | 0.3333 | 477.9181 | 3424.2036 | 7113.9839 | 21.3606 | 46.815 | 11.704 | 4778.8506 |
17000 | 0.3434 | 473.8991 | 3581.3721 | 7150.6240 | 21.2836 | 46.985 | 11.746 | 2268.9734 |
17500 | 0.3535 | 471.4035 | 3375.7810 | 7056.4482 | 21.4184 | 46.689 | 11.672 | 2958.4526 |
18000 | 0.3636 | 466.1978 | 3323.2354 | 7070.1118 | 21.3173 | 46.91 | 11.728 | 3852.8152 |
18500 | 0.3737 | 464.6797 | 3391.8843 | 6952.3521 | 21.5144 | 46.481 | 11.62 | 6839.7295 |
19000 | 0.3838 | 462.5197 | 3305.7080 | 6933.4399 | 21.3481 | 46.843 | 11.711 | 3396.2700 |
19500 | 0.3939 | 456.2503 | 3340.5020 | 6974.1440 | 21.3181 | 46.909 | 11.727 | 4338.4556 |
20000 | 0.4040 | 453.3807 | 3245.5469 | 6936.5439 | 21.3635 | 46.809 | 11.702 | 3513.4419 |
20500 | 0.4141 | 453.9622 | 3146.9612 | 6961.3442 | 21.3014 | 46.945 | 11.736 | 10044.2734 |
21000 | 0.4242 | 452.8354 | 2937.5862 | 6912.8638 | 21.428 | 46.668 | 11.667 | 4067.4631 |
21500 | 0.4343 | 441.9103 | 2893.3921 | 6879.7119 | 21.3113 | 46.923 | 11.731 | 5412.9268 |
22000 | 0.4444 | 445.0268 | 2878.3350 | 6833.9839 | 21.5124 | 46.485 | 11.621 | 3586.4441 |
22500 | 0.4545 | 433.9949 | 3140.9766 | 6801.0562 | 21.4889 | 46.536 | 11.634 | 4264.9297 |
23000 | 0.4646 | 432.1537 | 3241.2009 | 6835.2002 | 21.4958 | 46.521 | 11.63 | 7089.4131 |
23500 | 0.4747 | 438.6622 | 3099.2891 | 6846.0479 | 21.3978 | 46.734 | 11.683 | 2764.0474 |
24000 | 0.4848 | 434.6780 | 3037.6338 | 6746.4639 | 21.4299 | 46.664 | 11.666 | 6095.2222 |
24500 | 0.4949 | 433.0188 | 3190.7532 | 6871.6479 | 21.4752 | 46.565 | 11.641 | 6818.7515 |
25000 | 0.5051 | 424.1827 | 2884.4297 | 6806.0479 | 21.2002 | 47.169 | 11.792 | 5655.6611 |
25500 | 0.5152 | 427.9544 | 2899.9268 | 6739.4878 | 21.4326 | 46.658 | 11.664 | 10928.7627 |
26000 | 0.5253 | 418.4491 | 2792.2812 | 6741.0562 | 21.4399 | 46.642 | 11.661 | 4652.5972 |
26500 | 0.5354 | 420.5338 | 2771.0999 | 6723.6162 | 21.5377 | 46.43 | 11.608 | 5530.9321 |
27000 | 0.5455 | 414.0452 | 2715.1108 | 6704.3521 | 21.8117 | 45.847 | 11.462 | 4411.1870 |
27500 | 0.5556 | 405.4073 | 2623.3743 | 6684.0 | 21.6362 | 46.219 | 11.555 | 4443.4106 |
28000 | 0.5657 | 410.8664 | 2691.8567 | 6677.0562 | 21.5795 | 46.34 | 11.585 | 1948.9584 |
28500 | 0.5758 | 418.1162 | 2795.4333 | 6772.7041 | 21.5011 | 46.509 | 11.627 | 2152.1055 |
29000 | 0.5859 | 407.0003 | 2837.7319 | 6612.7358 | 21.6658 | 46.156 | 11.539 | 2232.7546 |
29500 | 0.5960 | 407.4271 | 2949.1045 | 6649.2158 | 21.6025 | 46.291 | 11.573 | 3101.2493 |
30000 | 0.6061 | 406.1163 | 2778.8286 | 6607.7759 | 21.5146 | 46.48 | 11.62 | 3840.7419 |
30500 | 0.6162 | 397.9757 | 2956.0779 | 6601.0562 | 21.4872 | 46.539 | 11.635 | 2564.0315 |
31000 | 0.6263 | 398.2077 | 2838.1323 | 6594.9121 | 22.1693 | 45.107 | 11.277 | 2501.1306 |
31500 | 0.6364 | 393.3900 | 2667.1082 | 6559.9360 | 21.4915 | 46.53 | 11.633 | 5743.9526 |
32000 | 0.6465 | 393.8561 | 2583.0869 | 6566.1758 | 21.5166 | 46.476 | 11.619 | 8028.9990 |
32500 | 0.6566 | 391.7058 | 2675.8672 | 6583.2002 | 21.6273 | 46.238 | 11.559 | 5334.7124 |
33000 | 0.6667 | 396.9419 | 2743.4949 | 6698.2402 | 21.5042 | 46.503 | 11.626 | 11934.8896 |
33500 | 0.6768 | 388.6004 | 2891.6582 | 6570.7520 | 21.2945 | 46.961 | 11.74 | 4139.7988 |
34000 | 0.6869 | 386.5763 | 2826.3506 | 6525.6318 | 21.3684 | 46.798 | 11.7 | 3156.8203 |
34500 | 0.6970 | 387.0721 | 2805.7012 | 6572.9600 | 21.2897 | 46.971 | 11.743 | 2896.1072 |
35000 | 0.7071 | 386.0813 | 2637.3757 | 6580.5439 | 21.2409 | 47.079 | 11.77 | 7566.7905 |
35500 | 0.7172 | 381.5364 | 3025.4507 | 6588.3198 | 21.5446 | 46.415 | 11.604 | 4902.9575 |
36000 | 0.7273 | 386.6814 | 2880.9741 | 6570.8481 | 21.3516 | 46.835 | 11.709 | 3154.9243 |
36500 | 0.7374 | 379.9471 | 2795.0400 | 6521.5679 | 21.4418 | 46.638 | 11.659 | 3810.8567 |
37000 | 0.7475 | 383.0058 | 2805.8992 | 6537.6641 | 21.3615 | 46.813 | 11.703 | 5655.2837 |
37500 | 0.7576 | 375.7296 | 2787.7578 | 6456.9922 | 21.3662 | 46.803 | 11.701 | 3055.8257 |
38000 | 0.7677 | 374.0701 | 2868.8132 | 6484.3198 | 21.3768 | 46.78 | 11.695 | 2952.7307 |
38500 | 0.7778 | 377.5502 | 2659.9729 | 6455.3921 | 21.3661 | 46.803 | 11.701 | 3218.3279 |
39000 | 0.7879 | 370.5863 | 2806.0972 | 6473.3120 | 21.2561 | 47.045 | 11.761 | 2280.2119 |
39500 | 0.7980 | 371.9195 | 2613.6814 | 6536.6719 | 21.3516 | 46.835 | 11.709 | 2672.7583 |
40000 | 0.8081 | 377.1619 | 2487.1150 | 6439.7441 | 21.4296 | 46.664 | 11.666 | 2315.8076 |
40500 | 0.8182 | 370.4856 | 2678.1318 | 6437.2798 | 21.3153 | 46.915 | 11.729 | 1819.0656 |
41000 | 0.8283 | 369.2075 | 2614.6948 | 6462.3999 | 21.4041 | 46.72 | 11.68 | 2854.2568 |
41500 | 0.8384 | 372.8739 | 2305.3298 | 6431.2002 | 21.4425 | 46.636 | 11.659 | 3267.0427 |
42000 | 0.8485 | 368.2697 | 2281.5596 | 6418.3042 | 21.2858 | 46.98 | 11.745 | 2240.3704 |
42500 | 0.8586 | 365.9109 | 2410.3772 | 6468.8638 | 21.4759 | 46.564 | 11.641 | 3584.7686 |
43000 | 0.8687 | 367.1704 | 2442.8845 | 6401.3760 | 21.5525 | 46.398 | 11.6 | 2345.6868 |
43500 | 0.8788 | 363.9908 | 2523.0574 | 6458.4961 | 21.7663 | 45.943 | 11.486 | 3812.3833 |
44000 | 0.8889 | 363.7012 | 2468.5098 | 6388.8638 | 21.7639 | 45.948 | 11.487 | 4788.1108 |
44500 | 0.8990 | 363.1368 | 2572.5454 | 6479.6479 | 21.67 | 46.147 | 11.537 | 3193.9253 |
45000 | 0.9091 | 356.2796 | 2622.3564 | 6405.2158 | 21.6556 | 46.177 | 11.544 | 1944.5388 |
45500 | 0.9192 | 360.0483 | 2560.6021 | 6401.0239 | 21.3614 | 46.813 | 11.703 | 6363.8784 |
46000 | 0.9293 | 358.6112 | 2230.1096 | 6385.6958 | 21.3445 | 46.85 | 11.713 | 2245.4624 |
46500 | 0.9394 | 359.0361 | 2364.5928 | 6378.6558 | 21.4319 | 46.659 | 11.665 | 2161.8982 |
47000 | 0.9495 | 356.5909 | 2449.0066 | 6407.8081 | 21.4857 | 46.543 | 11.636 | 3063.7917 |
47500 | 0.9596 | 359.0292 | 2401.2183 | 6344.3521 | 21.5028 | 46.505 | 11.626 | 3229.5225 |
48000 | 0.9697 | 359.6570 | 2497.3064 | 6563.9038 | 21.3228 | 46.898 | 11.725 | 3209.3140 |
48500 | 0.9798 | 353.2013 | 2481.0728 | 6333.3442 | 21.4465 | 46.628 | 11.657 | 2960.4282 |
49000 | 0.9899 | 355.4300 | 2554.2913 | 6356.8638 | 21.2635 | 47.029 | 11.757 | 3479.5901 |
49500 | 1.0 | 352.3520 | 2577.0833 | 6367.2959 | 21.3211 | 46.902 | 11.725 | 3190.5127 |
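To put the final row in context, the student's perplexities can be compared with the teacher's. The snippet below is a small sketch using values copied from the table. It assumes the three figures in the teacher eval row are enwikippl, frwikippl, and zhwikippl, in that order. It also uses the standard definition of perplexity as the exponential of the mean per-token negative log-likelihood, which may differ in detail from how Distily computes these metrics.

```python
import math

# Teacher row (assumed column order: enwikippl, frwikippl, zhwikippl).
teacher = {"enwikippl": 30.2385, "frwikippl": 57.2728, "zhwikippl": 18.1772}
# Student at step 49500 (epoch 1.0).
student = {"enwikippl": 352.3520, "frwikippl": 2577.0833, "zhwikippl": 3190.5127}

ratios = {}
for name in teacher:
    ratios[name] = student[name] / teacher[name]
    # Under ppl = exp(mean NLL), the log of the perplexity ratio is the gap
    # in mean per-token negative log-likelihood, measured in nats.
    nll_gap = math.log(ratios[name])
    print(f"{name}: {ratios[name]:.1f}x teacher perplexity, +{nll_gap:.2f} nats/token")
```

On these numbers the student remains well behind the teacher after one epoch, with the smallest gap on English and much larger gaps on French and Chinese.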
Framework versions
- Distily 0.2.0
- Transformers 4.44.0
- PyTorch 2.3.0
- Datasets 2.20.0