2021-05-26 19:11:02,756 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_large_uncased.yaml', log='log/glue/mnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021-05-26 19:11:02,808 INFO __main__ Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True
2021-05-26 19:11:33,729 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-26 19:12:20,927 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/ax/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-26 19:12:22,683 INFO __main__ Start training
2021-05-26 19:12:22,684 INFO torchdistill.models.util [student model]
2021-05-26 19:12:22,684 INFO torchdistill.models.util Using the original student model
2021-05-26 19:12:22,684 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
2021-05-26 19:12:26,973 INFO torchdistill.misc.log Epoch: [0] [ 0/24544] eta: 2:31:32 lr: 1.999972837896567e-05 sample/s: 12.123115565180507 loss: 1.1665 (1.1665) time: 0.3704 data: 0.0405 max mem: 5355
2021-05-26 19:15:03,514 INFO torchdistill.misc.log Epoch: [0] [ 1000/24544] eta: 1:01:30 lr: 1.972810734463277e-05 sample/s: 26.643611936032016 loss: 0.4966 (0.7978) time: 0.1565 data: 0.0028 max mem: 9034
2021-05-26 19:17:40,135 INFO torchdistill.misc.log Epoch: [0] [ 2000/24544] eta: 0:58:52 lr: 1.9456486310299872e-05 sample/s: 27.679465456795217 loss: 0.5308 (0.6753) time: 0.1576 data: 0.0028 max mem: 9034
2021-05-26 19:20:17,766 INFO torchdistill.misc.log Epoch: [0] [ 3000/24544] eta: 0:56:22 lr: 1.9184865275966974e-05 sample/s: 21.87914915370501 loss: 0.4498 (0.6280) time: 0.1605 data: 0.0028 max mem: 9034
2021-05-26 19:22:54,312 INFO torchdistill.misc.log Epoch: [0] [ 4000/24544] eta: 0:53:43 lr: 1.8913244241634076e-05 sample/s: 24.011839028146316 loss: 0.3829 (0.5939) time: 0.1587 data: 0.0027 max mem: 9034
2021-05-26 19:25:31,397 INFO torchdistill.misc.log Epoch: [0] [ 5000/24544] eta: 0:51:06 lr: 1.8641623207301177e-05 sample/s: 24.881084864807146 loss: 0.3964 (0.5686) time: 0.1602 data: 0.0027 max mem: 9034
2021-05-26 19:28:08,506 INFO torchdistill.misc.log Epoch: [0] [ 6000/24544] eta: 0:48:30 lr: 1.8370002172968276e-05 sample/s: 27.22999367016701 loss: 0.3826 (0.5527) time: 0.1524 data: 0.0028 max mem: 9034
2021-05-26 19:30:45,858 INFO torchdistill.misc.log Epoch: [0] [ 7000/24544] eta: 0:45:54 lr: 1.8098381138635377e-05 sample/s: 24.056011918161573 loss: 0.4032 (0.5388) time: 0.1597 data: 0.0028 max mem: 9034
2021-05-26 19:33:22,940 INFO torchdistill.misc.log Epoch: [0] [ 8000/24544] eta: 0:43:17 lr: 1.782676010430248e-05 sample/s: 26.72731385749652 loss: 0.4228 (0.5270) time: 0.1555 data: 0.0027 max mem: 9034
2021-05-26 19:36:00,181 INFO torchdistill.misc.log Epoch: [0] [ 9000/24544] eta: 0:40:41 lr: 1.755513906996958e-05 sample/s: 27.097306455494415 loss: 0.4902 (0.5164) time: 0.1585 data: 0.0026 max mem: 9034
2021-05-26 19:38:37,048 INFO torchdistill.misc.log Epoch: [0] [10000/24544] eta: 0:38:03 lr: 1.7283518035636683e-05 sample/s: 24.899585185405055 loss: 0.3073 (0.5080) time: 0.1533 data: 0.0026 max mem: 9034
2021-05-26 19:41:13,402 INFO torchdistill.misc.log Epoch: [0] [11000/24544] eta: 0:35:25 lr: 1.7011897001303784e-05 sample/s: 26.376325913302132 loss: 0.3966 (0.5002) time: 0.1545 data: 0.0026 max mem: 9034
2021-05-26 19:43:50,316 INFO torchdistill.misc.log Epoch: [0] [12000/24544] eta: 0:32:48 lr: 1.6740275966970883e-05 sample/s: 27.519061506615184 loss: 0.4471 (0.4942) time: 0.1535 data: 0.0026 max mem: 9034
2021-05-26 19:46:27,366 INFO torchdistill.misc.log Epoch: [0] [13000/24544] eta: 0:30:12 lr: 1.6468654932637984e-05 sample/s: 27.152036174196756 loss: 0.3334 (0.4879) time: 0.1564 data: 0.0027 max mem: 9034
2021-05-26 19:49:04,444 INFO torchdistill.misc.log Epoch: [0] [14000/24544] eta: 0:27:35 lr: 1.6197033898305086e-05 sample/s: 26.7613347795572 loss: 0.3891 (0.4822) time: 0.1609 data: 0.0029 max mem: 9034
2021-05-26 19:51:41,450 INFO torchdistill.misc.log Epoch: [0] [15000/24544] eta: 0:24:58 lr: 1.5925412863972188e-05 sample/s: 28.193590019359004 loss: 0.4292 (0.4766) time: 0.1557 data: 0.0026 max mem: 9034
2021-05-26 19:54:18,472 INFO torchdistill.misc.log Epoch: [0] [16000/24544] eta: 0:22:21 lr: 1.565379182963929e-05 sample/s: 25.112961216638976 loss: 0.3714 (0.4715) time: 0.1548 data: 0.0027 max mem: 9034
2021-05-26 19:56:55,577 INFO torchdistill.misc.log Epoch: [0] [17000/24544] eta: 0:19:44 lr: 1.538217079530639e-05 sample/s: 26.46332060948295 loss: 0.3461 (0.4665) time: 0.1560 data: 0.0027 max mem: 9034
2021-05-26 19:59:31,974 INFO torchdistill.misc.log Epoch: [0] [18000/24544] eta: 0:17:07 lr: 1.5110549760973491e-05 sample/s: 27.044453435683003 loss: 0.2556 (0.4616) time: 0.1563 data: 0.0028 max mem: 9034
2021-05-26 20:02:08,374 INFO torchdistill.misc.log Epoch: [0] [19000/24544] eta: 0:14:29 lr: 1.4838928726640591e-05 sample/s: 26.93417370769164 loss: 0.3405 (0.4576) time: 0.1580 data: 0.0026 max mem: 9034
2021-05-26 20:04:45,080 INFO torchdistill.misc.log Epoch: [0] [20000/24544] eta: 0:11:53 lr: 1.4567307692307693e-05 sample/s: 27.49177977363858 loss: 0.3791 (0.4541) time: 0.1547 data: 0.0027 max mem: 9034
2021-05-26 20:07:22,339 INFO torchdistill.misc.log Epoch: [0] [21000/24544] eta: 0:09:16 lr: 1.4295686657974795e-05 sample/s: 28.231116256451074 loss: 0.3786 (0.4511) time: 0.1622 data: 0.0028 max mem: 9034
2021-05-26 20:09:58,871 INFO torchdistill.misc.log Epoch: [0] [22000/24544] eta: 0:06:39 lr: 1.4024065623641896e-05 sample/s: 27.339283357260534 loss: 0.4165 (0.4477) time: 0.1622 data: 0.0028 max mem: 9034
2021-05-26 20:12:36,039 INFO torchdistill.misc.log Epoch: [0] [23000/24544] eta: 0:04:02 lr: 1.3752444589308998e-05 sample/s: 22.758457160065436 loss: 0.3802 (0.4446) time: 0.1555 data: 0.0027 max mem: 9034
2021-05-26 20:15:12,598 INFO torchdistill.misc.log Epoch: [0] [24000/24544] eta: 0:01:25 lr: 1.34808235549761e-05 sample/s: 28.02765809938272 loss: 0.3125 (0.4414) time: 0.1585 data: 0.0027 max mem: 9034
2021-05-26 20:16:38,263 INFO torchdistill.misc.log Epoch: [0] Total time: 1:04:11
2021-05-26 20:16:58,591 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 20:16:58,592 INFO __main__ Validation: accuracy = 0.8665308201732043
2021-05-26 20:16:58,592 INFO __main__ Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-large-uncased
2021-05-26 20:17:03,627 INFO torchdistill.misc.log Epoch: [1] [ 0/24544] eta: 1:33:58 lr: 1.3333061712299002e-05 sample/s: 19.747310188972392 loss: 0.4682 (0.4682) time: 0.2297 data: 0.0271 max mem: 9034
2021-05-26 20:19:40,585 INFO torchdistill.misc.log Epoch: [1] [ 1000/24544] eta: 1:01:37 lr: 1.3061440677966102e-05 sample/s: 26.252342839259867 loss: 0.1850 (0.2208) time: 0.1565 data: 0.0027 max mem: 9034
2021-05-26 20:22:17,591 INFO torchdistill.misc.log Epoch: [1] [ 2000/24544] eta: 0:58:59 lr: 1.2789819643633204e-05 sample/s: 21.8097504982106 loss: 0.2099 (0.2356) time: 0.1602 data: 0.0027 max mem: 9034
2021-05-26 20:24:54,769 INFO torchdistill.misc.log Epoch: [1] [ 3000/24544] eta: 0:56:23 lr: 1.2518198609300306e-05 sample/s: 27.91364301876747 loss: 0.2192 (0.2369) time: 0.1556 data: 0.0027 max mem: 9034
2021-05-26 20:27:32,375 INFO torchdistill.misc.log Epoch: [1] [ 4000/24544] eta: 0:53:49 lr: 1.2246577574967407e-05 sample/s: 27.884924849457917 loss: 0.1388 (0.2382) time: 0.1618 data: 0.0028 max mem: 9034
2021-05-26 20:30:08,985 INFO torchdistill.misc.log Epoch: [1] [ 5000/24544] eta: 0:51:10 lr: 1.1974956540634507e-05 sample/s: 25.346978857897394 loss: 0.2185 (0.2378) time: 0.1555 data: 0.0027 max mem: 9034
2021-05-26 20:32:46,374 INFO torchdistill.misc.log Epoch: [1] [ 6000/24544] eta: 0:48:33 lr: 1.1703335506301609e-05 sample/s: 27.4796219041395 loss: 0.2571 (0.2396) time: 0.1560 data: 0.0027 max mem: 9034
2021-05-26 20:35:22,973 INFO torchdistill.misc.log Epoch: [1] [ 7000/24544] eta: 0:45:55 lr: 1.143171447196871e-05 sample/s: 28.076955002476804 loss: 0.1621 (0.2406) time: 0.1579 data: 0.0027 max mem: 9034
2021-05-26 20:38:00,044 INFO torchdistill.misc.log Epoch: [1] [ 8000/24544] eta: 0:43:18 lr: 1.116009343763581e-05 sample/s: 25.54037357852913 loss: 0.1671 (0.2414) time: 0.1615 data: 0.0027 max mem: 9034
2021-05-26 20:40:36,918 INFO torchdistill.misc.log Epoch: [1] [ 9000/24544] eta: 0:40:40 lr: 1.0888472403302913e-05 sample/s: 24.698818735351963 loss: 0.2244 (0.2411) time: 0.1594 data: 0.0027 max mem: 9034
2021-05-26 20:43:13,900 INFO torchdistill.misc.log Epoch: [1] [10000/24544] eta: 0:38:03 lr: 1.0616851368970014e-05 sample/s: 24.829091856245597 loss: 0.1934 (0.2437) time: 0.1562 data: 0.0027 max mem: 9034
2021-05-26 20:45:50,810 INFO torchdistill.misc.log Epoch: [1] [11000/24544] eta: 0:35:26 lr: 1.0345230334637116e-05 sample/s: 26.688114023924662 loss: 0.2520 (0.2445) time: 0.1587 data: 0.0028 max mem: 9034
2021-05-26 20:48:27,672 INFO torchdistill.misc.log Epoch: [1] [12000/24544] eta: 0:32:49 lr: 1.0073609300304216e-05 sample/s: 27.18719170312753 loss: 0.1216 (0.2450) time: 0.1556 data: 0.0026 max mem: 9034
2021-05-26 20:51:04,476 INFO torchdistill.misc.log Epoch: [1] [13000/24544] eta: 0:30:12 lr: 9.801988265971318e-06 sample/s: 28.178247754435702 loss: 0.2546 (0.2451) time: 0.1543 data: 0.0026 max mem: 9034
2021-05-26 20:53:41,206 INFO torchdistill.misc.log Epoch: [1] [14000/24544] eta: 0:27:35 lr: 9.53036723163842e-06 sample/s: 21.816443589673572 loss: 0.2472 (0.2448) time: 0.1608 data: 0.0028 max mem: 9034
2021-05-26 20:56:18,004 INFO torchdistill.misc.log Epoch: [1] [15000/24544] eta: 0:24:58 lr: 9.258746197305521e-06 sample/s: 27.04789787804823 loss: 0.1441 (0.2442) time: 0.1567 data: 0.0027 max mem: 9034
2021-05-26 20:58:54,796 INFO torchdistill.misc.log Epoch: [1] [16000/24544] eta: 0:22:20 lr: 8.987125162972621e-06 sample/s: 25.65468688595532 loss: 0.2517 (0.2445) time: 0.1564 data: 0.0027 max mem: 9034
2021-05-26 21:01:32,093 INFO torchdistill.misc.log Epoch: [1] [17000/24544] eta: 0:19:44 lr: 8.715504128639723e-06 sample/s: 22.732769752540246 loss: 0.3479 (0.2443) time: 0.1581 data: 0.0027 max mem: 9034
2021-05-26 21:04:08,873 INFO torchdistill.misc.log Epoch: [1] [18000/24544] eta: 0:17:07 lr: 8.443883094306825e-06 sample/s: 26.943862990686924 loss: 0.2213 (0.2436) time: 0.1575 data: 0.0028 max mem: 9034
2021-05-26 21:06:45,412 INFO torchdistill.misc.log Epoch: [1] [19000/24544] eta: 0:14:30 lr: 8.172262059973926e-06 sample/s: 26.424266478925592 loss: 0.1826 (0.2434) time: 0.1576 data: 0.0027 max mem: 9034
2021-05-26 21:09:22,506 INFO torchdistill.misc.log Epoch: [1] [20000/24544] eta: 0:11:53 lr: 7.900641025641026e-06 sample/s: 25.074415219317977 loss: 0.2069 (0.2432) time: 0.1558 data: 0.0027 max mem: 9034
2021-05-26 21:11:58,927 INFO torchdistill.misc.log Epoch: [1] [21000/24544] eta: 0:09:16 lr: 7.629019991308127e-06 sample/s: 27.590701804876044 loss: 0.2668 (0.2434) time: 0.1545 data: 0.0027 max mem: 9034
2021-05-26 21:14:35,504 INFO torchdistill.misc.log Epoch: [1] [22000/24544] eta: 0:06:39 lr: 7.357398956975229e-06 sample/s: 27.259238076978693 loss: 0.1533 (0.2432) time: 0.1531 data: 0.0027 max mem: 9034
2021-05-26 21:17:13,344 INFO torchdistill.misc.log Epoch: [1] [23000/24544] eta: 0:04:02 lr: 7.08577792264233e-06 sample/s: 28.61345976737047 loss: 0.1621 (0.2431) time: 0.1588 data: 0.0026 max mem: 9034
2021-05-26 21:19:50,219 INFO torchdistill.misc.log Epoch: [1] [24000/24544] eta: 0:01:25 lr: 6.8141568883094315e-06 sample/s: 24.86183830306838 loss: 0.2084 (0.2426) time: 0.1577 data: 0.0027 max mem: 9034
2021-05-26 21:21:15,455 INFO torchdistill.misc.log Epoch: [1] Total time: 1:04:12
2021-05-26 21:21:35,766 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 21:21:35,767 INFO __main__ Validation: accuracy = 0.866225165562914
2021-05-26 21:21:35,944 INFO torchdistill.misc.log Epoch: [2] [ 0/24544] eta: 1:11:55 lr: 6.666395045632335e-06 sample/s: 26.641665713886454 loss: 0.1711 (0.1711) time: 0.1758 data: 0.0257 max mem: 9034
2021-05-26 21:24:12,602 INFO torchdistill.misc.log Epoch: [2] [ 1000/24544] eta: 1:01:28 lr: 6.394774011299436e-06 sample/s: 25.478160772915167 loss: 0.0000 (0.2154) time: 0.1594 data: 0.0027 max mem: 9034
2021-05-26 21:26:48,122 INFO torchdistill.misc.log Epoch: [2] [ 2000/24544] eta: 0:58:39 lr: 6.123152976966536e-06 sample/s: 29.191880596183893 loss: 0.0000 (0.2594) time: 0.1560 data: 0.0026 max mem: 9034
2021-05-26 21:29:23,722 INFO torchdistill.misc.log Epoch: [2] [ 3000/24544] eta: 0:55:59 lr: 5.851531942633638e-06 sample/s: 23.31094851566039 loss: 0.0000 (0.2732) time: 0.1529 data: 0.0027 max mem: 9034
2021-05-26 21:31:59,493 INFO torchdistill.misc.log Epoch: [2] [ 4000/24544] eta: 0:53:22 lr: 5.57991090830074e-06 sample/s: 28.10545818828756 loss: 0.0268 (0.2814) time: 0.1574 data: 0.0027 max mem: 9034
2021-05-26 21:34:34,884 INFO torchdistill.misc.log Epoch: [2] [ 5000/24544] eta: 0:50:44 lr: 5.30828987396784e-06 sample/s: 26.16505434282533 loss: 0.0000 (0.2821) time: 0.1500 data: 0.0027 max mem: 9034
2021-05-26 21:37:11,222 INFO torchdistill.misc.log Epoch: [2] [ 6000/24544] eta: 0:48:10 lr: 5.0366688396349415e-06 sample/s: 24.967990130203333 loss: 0.0000 (0.2841) time: 0.1545 data: 0.0027 max mem: 9034
2021-05-26 21:39:46,971 INFO torchdistill.misc.log Epoch: [2] [ 7000/24544] eta: 0:45:34 lr: 4.765047805302043e-06 sample/s: 24.54743723489709 loss: 0.0181 (0.2855) time: 0.1525 data: 0.0027 max mem: 9034
2021-05-26 21:42:22,158 INFO torchdistill.misc.log Epoch: [2] [ 8000/24544] eta: 0:42:57 lr: 4.493426770969144e-06 sample/s: 22.231136175716074 loss: 0.0000 (0.2868) time: 0.1544 data: 0.0027 max mem: 9034
2021-05-26 21:44:57,381 INFO torchdistill.misc.log Epoch: [2] [ 9000/24544] eta: 0:40:20 lr: 4.221805736636245e-06 sample/s: 26.316582354924982 loss: 0.0000 (0.2887) time: 0.1524 data: 0.0026 max mem: 9034
2021-05-26 21:47:33,414 INFO torchdistill.misc.log Epoch: [2] [10000/24544] eta: 0:37:45 lr: 3.950184702303347e-06 sample/s: 23.86255843954503 loss: 0.0003 (0.2882) time: 0.1609 data: 0.0028 max mem: 9034
2021-05-26 21:50:08,383 INFO torchdistill.misc.log Epoch: [2] [11000/24544] eta: 0:35:08 lr: 3.678563667970448e-06 sample/s: 26.68255374373182 loss: 0.3676 (0.2855) time: 0.1545 data: 0.0027 max mem: 9034
2021-05-26 21:52:43,285 INFO torchdistill.misc.log Epoch: [2] [12000/24544] eta: 0:32:31 lr: 3.4069426336375493e-06 sample/s: 25.459447811611312 loss: 0.0000 (0.2846) time: 0.1537 data: 0.0027 max mem: 9034
2021-05-26 21:55:18,524 INFO torchdistill.misc.log Epoch: [2] [13000/24544] eta: 0:29:56 lr: 3.1353215993046506e-06 sample/s: 24.929554150538273 loss: 0.0000 (0.2846) time: 0.1595 data: 0.0027 max mem: 9034
2021-05-26 21:57:55,270 INFO torchdistill.misc.log Epoch: [2] [14000/24544] eta: 0:27:21 lr: 2.8637005649717515e-06 sample/s: 27.516714503616473 loss: 0.1809 (0.2848) time: 0.1594 data: 0.0028 max mem: 9034
2021-05-26 22:00:31,919 INFO torchdistill.misc.log Epoch: [2] [15000/24544] eta: 0:24:46 lr: 2.5920795306388528e-06 sample/s: 28.056718090221164 loss: 0.0959 (0.2857) time: 0.1554 data: 0.0027 max mem: 9034
2021-05-26 22:03:07,727 INFO torchdistill.misc.log Epoch: [2] [16000/24544] eta: 0:22:10 lr: 2.320458496305954e-06 sample/s: 25.48872795003502 loss: 0.0000 (0.2860) time: 0.1578 data: 0.0028 max mem: 9034
2021-05-26 22:05:43,590 INFO torchdistill.misc.log Epoch: [2] [17000/24544] eta: 0:19:34 lr: 2.0488374619730554e-06 sample/s: 24.52734053486662 loss: 0.0000 (0.2869) time: 0.1545 data: 0.0027 max mem: 9034
2021-05-26 22:08:19,209 INFO torchdistill.misc.log Epoch: [2] [18000/24544] eta: 0:16:59 lr: 1.7772164276401565e-06 sample/s: 27.08741905120178 loss: 0.0000 (0.2869) time: 0.1550 data: 0.0027 max mem: 9034
2021-05-26 22:10:53,843 INFO torchdistill.misc.log Epoch: [2] [19000/24544] eta: 0:14:23 lr: 1.5055953933072578e-06 sample/s: 28.067091144517644 loss: 0.0000 (0.2887) time: 0.1572 data: 0.0028 max mem: 9034
2021-05-26 22:13:29,898 INFO torchdistill.misc.log Epoch: [2] [20000/24544] eta: 0:11:47 lr: 1.233974358974359e-06 sample/s: 25.48025028856084 loss: 0.0000 (0.2887) time: 0.1577 data: 0.0027 max mem: 9034
2021-05-26 22:16:06,871 INFO torchdistill.misc.log Epoch: [2] [21000/24544] eta: 0:09:12 lr: 9.623533246414604e-07 sample/s: 27.475571670244978 loss: 0.0000 (0.2893) time: 0.1557 data: 0.0029 max mem: 9034
2021-05-26 22:18:43,970 INFO torchdistill.misc.log Epoch: [2] [22000/24544] eta: 0:06:36 lr: 6.907322903085615e-07 sample/s: 25.952121373370375 loss: 0.0000 (0.2901) time: 0.1565 data: 0.0028 max mem: 9034
2021-05-26 22:21:20,398 INFO torchdistill.misc.log Epoch: [2] [23000/24544] eta: 0:04:00 lr: 4.191112559756628e-07 sample/s: 28.05516983050394 loss: 0.0000 (0.2902) time: 0.1573 data: 0.0028 max mem: 9034
2021-05-26 22:23:56,531 INFO torchdistill.misc.log Epoch: [2] [24000/24544] eta: 0:01:24 lr: 1.4749022164276403e-07 sample/s: 25.47750303335859 loss: 0.0000 (0.2901) time: 0.1553 data: 0.0027 max mem: 9034
2021-05-26 22:25:21,372 INFO torchdistill.misc.log Epoch: [2] Total time: 1:03:45
2021-05-26 22:25:41,707 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 22:25:41,708 INFO __main__ Validation: accuracy = 0.8601120733571065
2021-05-26 22:25:51,217 INFO __main__ [Student: bert-large-uncased]
2021-05-26 22:26:11,586 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-26 22:26:11,587 INFO __main__ Test: accuracy = 0.8665308201732043
2021-05-26 22:26:11,587 INFO __main__ Start prediction for private dataset(s)
2021-05-26 22:26:11,589 INFO __main__ mnli/test_m: 9796 samples
2021-05-26 22:26:31,714 INFO __main__ mnli/test_mm: 9847 samples
2021-05-26 22:26:51,852 INFO __main__ ax/test_ax: 1104 samples