2021-05-29 15:29:04,310	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/ce/bert_base_uncased.yaml', log='log/glue/mnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021-05-29 15:29:04,374	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-29 15:29:04,728	INFO	filelock	Lock 139977050547728 acquired on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
2021-05-29 15:29:05,085	INFO	filelock	Lock 139977050547728 released on /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e.lock
2021-05-29 15:29:05,785	INFO	filelock	Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-29 15:29:06,321	INFO	filelock	Lock 139977045762832 released on /root/.cache/huggingface/transformers/45c3f7a79a80e1cf0a489e5c62b43f173c15db47864303a55d623bb3c96f72a5.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-29 15:29:06,668	INFO	filelock	Lock 139977045762832 acquired on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-29 15:29:07,193	INFO	filelock	Lock 139977045762832 released on /root/.cache/huggingface/transformers/534479488c54aeaf9c3406f647aa2ec13648c06771ffe269edabebd4c412da1d.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-05-29 15:29:08,239	INFO	filelock	Lock 139977012340816 acquired on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-29 15:29:08,584	INFO	filelock	Lock 139977012340816 released on /root/.cache/huggingface/transformers/c1d7f0a763fb63861cc08553866f1fc3e5a6f4f07621be277452d26d71303b7e.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-05-29 15:29:08,962	INFO	filelock	Lock 139977044338768 acquired on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
2021-05-29 15:29:16,242	INFO	filelock	Lock 139977044338768 released on /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f.lock
2021-05-29 15:30:57,737	INFO	__main__	Start training
2021-05-29 15:30:57,738	INFO	torchdistill.models.util	[student model]
2021-05-29 15:30:57,738	INFO	torchdistill.models.util	Using the original student model
2021-05-29 15:30:57,738	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-29 15:31:03,590	INFO	torchdistill.misc.log	Epoch: [0]  [    0/24544]  eta: 2:05:27  lr: 1.999972837896567e-05  sample/s: 14.823441888738587  loss: 1.1098 (1.1098)  time: 0.3067  data: 0.0369  max mem: 1893
2021-05-29 15:33:47,307	INFO	torchdistill.misc.log	Epoch: [0]  [ 1000/24544]  eta: 1:04:17  lr: 1.972810734463277e-05  sample/s: 27.213607114992513  loss: 0.7205 (0.9372)  time: 0.1595  data: 0.0022  max mem: 3188
2021-05-29 15:36:31,223	INFO	torchdistill.misc.log	Epoch: [0]  [ 2000/24544]  eta: 1:01:34  lr: 1.9456486310299872e-05  sample/s: 23.541286875308696  loss: 0.5824 (0.7945)  time: 0.1536  data: 0.0022  max mem: 3188
2021-05-29 15:39:15,330	INFO	torchdistill.misc.log	Epoch: [0]  [ 3000/24544]  eta: 0:58:52  lr: 1.9184865275966974e-05  sample/s: 23.52890622918265  loss: 0.5424 (0.7283)  time: 0.1764  data: 0.0023  max mem: 3188
2021-05-29 15:41:58,885	INFO	torchdistill.misc.log	Epoch: [0]  [ 4000/24544]  eta: 0:56:06  lr: 1.8913244241634076e-05  sample/s: 17.631738191448555  loss: 0.5007 (0.6846)  time: 0.1664  data: 0.0022  max mem: 3188
2021-05-29 15:44:43,385	INFO	torchdistill.misc.log	Epoch: [0]  [ 5000/24544]  eta: 0:53:24  lr: 1.8641623207301177e-05  sample/s: 23.523034105646886  loss: 0.4332 (0.6568)  time: 0.1620  data: 0.0022  max mem: 3188
2021-05-29 15:47:29,027	INFO	torchdistill.misc.log	Epoch: [0]  [ 6000/24544]  eta: 0:50:46  lr: 1.8370002172968276e-05  sample/s: 25.141637744489802  loss: 0.4319 (0.6330)  time: 0.1530  data: 0.0022  max mem: 3188
2021-05-29 15:50:15,261	INFO	torchdistill.misc.log	Epoch: [0]  [ 7000/24544]  eta: 0:48:06  lr: 1.8098381138635377e-05  sample/s: 27.210473391590924  loss: 0.4381 (0.6162)  time: 0.1500  data: 0.0023  max mem: 3188
2021-05-29 15:53:00,429	INFO	torchdistill.misc.log	Epoch: [0]  [ 8000/24544]  eta: 0:45:23  lr: 1.782676010430248e-05  sample/s: 27.11552516360098  loss: 0.3911 (0.6018)  time: 0.1653  data: 0.0023  max mem: 3188
2021-05-29 15:55:45,430	INFO	torchdistill.misc.log	Epoch: [0]  [ 9000/24544]  eta: 0:42:39  lr: 1.755513906996958e-05  sample/s: 27.143426172352147  loss: 0.4399 (0.5899)  time: 0.1523  data: 0.0022  max mem: 3188
2021-05-29 15:58:29,704	INFO	torchdistill.misc.log	Epoch: [0]  [10000/24544]  eta: 0:39:54  lr: 1.7283518035636683e-05  sample/s: 23.528708244688683  loss: 0.5770 (0.5795)  time: 0.1712  data: 0.0023  max mem: 3188
2021-05-29 16:01:14,615	INFO	torchdistill.misc.log	Epoch: [0]  [11000/24544]  eta: 0:37:10  lr: 1.7011897001303784e-05  sample/s: 23.51004667756884  loss: 0.4501 (0.5715)  time: 0.1655  data: 0.0024  max mem: 3188
2021-05-29 16:03:59,094	INFO	torchdistill.misc.log	Epoch: [0]  [12000/24544]  eta: 0:34:25  lr: 1.6740275966970883e-05  sample/s: 27.174949544687372  loss: 0.4899 (0.5636)  time: 0.1644  data: 0.0022  max mem: 3188
2021-05-29 16:06:43,208	INFO	torchdistill.misc.log	Epoch: [0]  [13000/24544]  eta: 0:31:40  lr: 1.6468654932637984e-05  sample/s: 25.122738503466554  loss: 0.4647 (0.5556)  time: 0.1629  data: 0.0022  max mem: 3188
2021-05-29 16:09:28,363	INFO	torchdistill.misc.log	Epoch: [0]  [14000/24544]  eta: 0:28:55  lr: 1.6197033898305086e-05  sample/s: 29.783168119976143  loss: 0.4150 (0.5499)  time: 0.1592  data: 0.0022  max mem: 3188
2021-05-29 16:12:13,821	INFO	torchdistill.misc.log	Epoch: [0]  [15000/24544]  eta: 0:26:11  lr: 1.5925412863972188e-05  sample/s: 25.158528026872208  loss: 0.4610 (0.5446)  time: 0.1599  data: 0.0024  max mem: 3188
2021-05-29 16:14:58,895	INFO	torchdistill.misc.log	Epoch: [0]  [16000/24544]  eta: 0:23:27  lr: 1.565379182963929e-05  sample/s: 20.605590470236685  loss: 0.4390 (0.5395)  time: 0.1720  data: 0.0024  max mem: 3188
2021-05-29 16:17:44,108	INFO	torchdistill.misc.log	Epoch: [0]  [17000/24544]  eta: 0:20:42  lr: 1.538217079530639e-05  sample/s: 23.374767502288403  loss: 0.5434 (0.5354)  time: 0.1773  data: 0.0023  max mem: 3188
2021-05-29 16:20:29,013	INFO	torchdistill.misc.log	Epoch: [0]  [18000/24544]  eta: 0:17:58  lr: 1.5110549760973491e-05  sample/s: 25.141336338406393  loss: 0.4033 (0.5311)  time: 0.1754  data: 0.0023  max mem: 3188
2021-05-29 16:23:12,810	INFO	torchdistill.misc.log	Epoch: [0]  [19000/24544]  eta: 0:15:13  lr: 1.4838928726640591e-05  sample/s: 27.220053378329048  loss: 0.3449 (0.5261)  time: 0.1665  data: 0.0022  max mem: 3188
2021-05-29 16:25:57,100	INFO	torchdistill.misc.log	Epoch: [0]  [20000/24544]  eta: 0:12:28  lr: 1.4567307692307693e-05  sample/s: 27.172836906771014  loss: 0.3282 (0.5225)  time: 0.1644  data: 0.0022  max mem: 3188
2021-05-29 16:28:40,766	INFO	torchdistill.misc.log	Epoch: [0]  [21000/24544]  eta: 0:09:43  lr: 1.4295686657974795e-05  sample/s: 29.776137866162625  loss: 0.4302 (0.5190)  time: 0.1636  data: 0.0023  max mem: 3188
2021-05-29 16:31:25,444	INFO	torchdistill.misc.log	Epoch: [0]  [22000/24544]  eta: 0:06:58  lr: 1.4024065623641896e-05  sample/s: 25.15667954698467  loss: 0.3418 (0.5156)  time: 0.1578  data: 0.0021  max mem: 3188
2021-05-29 16:34:10,137	INFO	torchdistill.misc.log	Epoch: [0]  [23000/24544]  eta: 0:04:14  lr: 1.3752444589308998e-05  sample/s: 32.650026272258444  loss: 0.3675 (0.5123)  time: 0.1637  data: 0.0022  max mem: 3188
2021-05-29 16:36:54,197	INFO	torchdistill.misc.log	Epoch: [0]  [24000/24544]  eta: 0:01:29  lr: 1.34808235549761e-05  sample/s: 25.185947572110116  loss: 0.4129 (0.5095)  time: 0.1622  data: 0.0023  max mem: 3188
2021-05-29 16:38:24,444	INFO	torchdistill.misc.log	Epoch: [0] Total time: 1:07:21
2021-05-29 16:38:55,360	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-29 16:38:55,361	INFO	__main__	Validation: accuracy = 0.8346408558329088
2021-05-29 16:38:55,361	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
2021-05-29 16:38:56,551	INFO	torchdistill.misc.log	Epoch: [1]  [    0/24544]  eta: 1:16:42  lr: 1.3333061712299002e-05  sample/s: 24.581569261785873  loss: 0.2703 (0.2703)  time: 0.1875  data: 0.0248  max mem: 3188
2021-05-29 16:41:41,363	INFO	torchdistill.misc.log	Epoch: [1]  [ 1000/24544]  eta: 1:04:40  lr: 1.3061440677966102e-05  sample/s: 25.180428648616196  loss: 0.2192 (0.3276)  time: 0.1543  data: 0.0022  max mem: 3188
2021-05-29 16:44:26,252	INFO	torchdistill.misc.log	Epoch: [1]  [ 2000/24544]  eta: 1:01:56  lr: 1.2789819643633204e-05  sample/s: 20.661468833319994  loss: 0.2764 (0.3302)  time: 0.1713  data: 0.0023  max mem: 3188
2021-05-29 16:47:11,576	INFO	torchdistill.misc.log	Epoch: [1]  [ 3000/24544]  eta: 0:59:15  lr: 1.2518198609300306e-05  sample/s: 20.64476498224354  loss: 0.3651 (0.3332)  time: 0.1719  data: 0.0022  max mem: 3188
2021-05-29 16:49:56,040	INFO	torchdistill.misc.log	Epoch: [1]  [ 4000/24544]  eta: 0:56:27  lr: 1.2246577574967407e-05  sample/s: 20.68329823903899  loss: 0.3764 (0.3347)  time: 0.1674  data: 0.0022  max mem: 3188
2021-05-29 16:52:40,153	INFO	torchdistill.misc.log	Epoch: [1]  [ 5000/24544]  eta: 0:53:39  lr: 1.1974956540634507e-05  sample/s: 25.14510443394643  loss: 0.3584 (0.3372)  time: 0.1650  data: 0.0023  max mem: 3188
2021-05-29 16:55:25,903	INFO	torchdistill.misc.log	Epoch: [1]  [ 6000/24544]  eta: 0:50:57  lr: 1.1703335506301609e-05  sample/s: 27.211797412017347  loss: 0.2562 (0.3376)  time: 0.1659  data: 0.0022  max mem: 3188
2021-05-29 16:58:10,023	INFO	torchdistill.misc.log	Epoch: [1]  [ 7000/24544]  eta: 0:48:10  lr: 1.143171447196871e-05  sample/s: 29.83841542037708  loss: 0.2930 (0.3370)  time: 0.1702  data: 0.0024  max mem: 3188
2021-05-29 17:00:54,079	INFO	torchdistill.misc.log	Epoch: [1]  [ 8000/24544]  eta: 0:45:24  lr: 1.116009343763581e-05  sample/s: 27.1218811935608  loss: 0.2333 (0.3370)  time: 0.1612  data: 0.0022  max mem: 3188
2021-05-29 17:03:39,537	INFO	torchdistill.misc.log	Epoch: [1]  [ 9000/24544]  eta: 0:42:41  lr: 1.0888472403302913e-05  sample/s: 20.654448399014626  loss: 0.2945 (0.3364)  time: 0.1665  data: 0.0022  max mem: 3188
2021-05-29 17:06:23,515	INFO	torchdistill.misc.log	Epoch: [1]  [10000/24544]  eta: 0:39:55  lr: 1.0616851368970014e-05  sample/s: 32.66127672663348  loss: 0.2922 (0.3369)  time: 0.1590  data: 0.0022  max mem: 3188
2021-05-29 17:09:08,492	INFO	torchdistill.misc.log	Epoch: [1]  [11000/24544]  eta: 0:37:10  lr: 1.0345230334637116e-05  sample/s: 25.208653502300407  loss: 0.2785 (0.3363)  time: 0.1561  data: 0.0021  max mem: 3188
2021-05-29 17:11:51,342	INFO	torchdistill.misc.log	Epoch: [1]  [12000/24544]  eta: 0:34:24  lr: 1.0073609300304216e-05  sample/s: 27.221422498555956  loss: 0.2517 (0.3361)  time: 0.1569  data: 0.0022  max mem: 3188
2021-05-29 17:14:35,232	INFO	torchdistill.misc.log	Epoch: [1]  [13000/24544]  eta: 0:31:39  lr: 9.801988265971318e-06  sample/s: 29.87231117940427  loss: 0.2914 (0.3361)  time: 0.1607  data: 0.0022  max mem: 3188
2021-05-29 17:17:19,532	INFO	torchdistill.misc.log	Epoch: [1]  [14000/24544]  eta: 0:28:54  lr: 9.53036723163842e-06  sample/s: 27.194418870028656  loss: 0.2757 (0.3361)  time: 0.1520  data: 0.0022  max mem: 3188
2021-05-29 17:20:04,510	INFO	torchdistill.misc.log	Epoch: [1]  [15000/24544]  eta: 0:26:10  lr: 9.258746197305521e-06  sample/s: 25.20952471108667  loss: 0.2290 (0.3357)  time: 0.1727  data: 0.0023  max mem: 3188
2021-05-29 17:22:48,546	INFO	torchdistill.misc.log	Epoch: [1]  [16000/24544]  eta: 0:23:25  lr: 8.987125162972621e-06  sample/s: 25.115630412620376  loss: 0.2612 (0.3352)  time: 0.1640  data: 0.0022  max mem: 3188
2021-05-29 17:25:32,713	INFO	torchdistill.misc.log	Epoch: [1]  [17000/24544]  eta: 0:20:40  lr: 8.715504128639723e-06  sample/s: 27.229242372355948  loss: 0.2846 (0.3349)  time: 0.1657  data: 0.0023  max mem: 3188
2021-05-29 17:28:16,077	INFO	torchdistill.misc.log	Epoch: [1]  [18000/24544]  eta: 0:17:55  lr: 8.443883094306825e-06  sample/s: 27.18763227406051  loss: 0.2757 (0.3350)  time: 0.1631  data: 0.0022  max mem: 3188
2021-05-29 17:31:01,304	INFO	torchdistill.misc.log	Epoch: [1]  [19000/24544]  eta: 0:15:11  lr: 8.172262059973926e-06  sample/s: 25.188254140311948  loss: 0.2174 (0.3342)  time: 0.1696  data: 0.0022  max mem: 3188
2021-05-29 17:33:45,948	INFO	torchdistill.misc.log	Epoch: [1]  [20000/24544]  eta: 0:12:27  lr: 7.900641025641026e-06  sample/s: 20.58210654418404  loss: 0.3514 (0.3342)  time: 0.1772  data: 0.0025  max mem: 3188
2021-05-29 17:36:30,052	INFO	torchdistill.misc.log	Epoch: [1]  [21000/24544]  eta: 0:09:42  lr: 7.629019991308127e-06  sample/s: 25.193322511521327  loss: 0.2484 (0.3337)  time: 0.1620  data: 0.0021  max mem: 3188
2021-05-29 17:39:12,159	INFO	torchdistill.misc.log	Epoch: [1]  [22000/24544]  eta: 0:06:58  lr: 7.357398956975229e-06  sample/s: 17.588355792435543  loss: 0.3947 (0.3335)  time: 0.1706  data: 0.0022  max mem: 3188
2021-05-29 17:41:56,330	INFO	torchdistill.misc.log	Epoch: [1]  [23000/24544]  eta: 0:04:13  lr: 7.08577792264233e-06  sample/s: 25.18526702614418  loss: 0.2381 (0.3328)  time: 0.1535  data: 0.0021  max mem: 3188
2021-05-29 17:44:38,298	INFO	torchdistill.misc.log	Epoch: [1]  [24000/24544]  eta: 0:01:29  lr: 6.8141568883094315e-06  sample/s: 23.51261665412359  loss: 0.2045 (0.3325)  time: 0.1640  data: 0.0022  max mem: 3188
2021-05-29 17:46:07,356	INFO	torchdistill.misc.log	Epoch: [1] Total time: 1:07:10
2021-05-29 17:46:38,288	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-29 17:46:38,288	INFO	__main__	Validation: accuracy = 0.8419765664798777
2021-05-29 17:46:38,288	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/ce/mnli-bert-base-uncased
2021-05-29 17:46:39,758	INFO	torchdistill.misc.log	Epoch: [2]  [    0/24544]  eta: 1:16:03  lr: 6.666395045632335e-06  sample/s: 25.03165433277632  loss: 0.1728 (0.1728)  time: 0.1859  data: 0.0261  max mem: 3188
2021-05-29 17:49:25,362	INFO	torchdistill.misc.log	Epoch: [2]  [ 1000/24544]  eta: 1:04:59  lr: 6.394774011299436e-06  sample/s: 23.490921310557265  loss: 0.1058 (0.2018)  time: 0.1615  data: 0.0023  max mem: 3188
2021-05-29 17:52:10,171	INFO	torchdistill.misc.log	Epoch: [2]  [ 2000/24544]  eta: 1:02:04  lr: 6.123152976966536e-06  sample/s: 25.18507799212498  loss: 0.1258 (0.2032)  time: 0.1576  data: 0.0024  max mem: 3188
2021-05-29 17:54:53,689	INFO	torchdistill.misc.log	Epoch: [2]  [ 3000/24544]  eta: 0:59:07  lr: 5.851531942633638e-06  sample/s: 22.06454226231966  loss: 0.1610 (0.2044)  time: 0.1685  data: 0.0022  max mem: 3188
2021-05-29 17:57:38,820	INFO	torchdistill.misc.log	Epoch: [2]  [ 4000/24544]  eta: 0:56:25  lr: 5.57991090830074e-06  sample/s: 25.261603369174225  loss: 0.1056 (0.2069)  time: 0.1639  data: 0.0025  max mem: 3188
2021-05-29 18:00:23,587	INFO	torchdistill.misc.log	Epoch: [2]  [ 5000/24544]  eta: 0:53:40  lr: 5.30828987396784e-06  sample/s: 29.636331197679574  loss: 0.0981 (0.2070)  time: 0.1733  data: 0.0025  max mem: 3188
2021-05-29 18:03:08,981	INFO	torchdistill.misc.log	Epoch: [2]  [ 6000/24544]  eta: 0:50:57  lr: 5.0366688396349415e-06  sample/s: 25.148458989068047  loss: 0.1072 (0.2070)  time: 0.1682  data: 0.0024  max mem: 3188
2021-05-29 18:05:55,250	INFO	torchdistill.misc.log	Epoch: [2]  [ 7000/24544]  eta: 0:48:16  lr: 4.765047805302043e-06  sample/s: 29.670923505994416  loss: 0.2394 (0.2083)  time: 0.1682  data: 0.0025  max mem: 3188
2021-05-29 18:08:39,572	INFO	torchdistill.misc.log	Epoch: [2]  [ 8000/24544]  eta: 0:45:29  lr: 4.493426770969144e-06  sample/s: 27.09389317291776  loss: 0.2440 (0.2084)  time: 0.1591  data: 0.0023  max mem: 3188
2021-05-29 18:11:24,579	INFO	torchdistill.misc.log	Epoch: [2]  [ 9000/24544]  eta: 0:42:44  lr: 4.221805736636245e-06  sample/s: 20.676721641200583  loss: 0.1225 (0.2089)  time: 0.1663  data: 0.0022  max mem: 3188
2021-05-29 18:14:09,996	INFO	torchdistill.misc.log	Epoch: [2]  [10000/24544]  eta: 0:40:00  lr: 3.950184702303347e-06  sample/s: 25.05105260250164  loss: 0.1564 (0.2103)  time: 0.1630  data: 0.0024  max mem: 3188
2021-05-29 18:16:54,735	INFO	torchdistill.misc.log	Epoch: [2]  [11000/24544]  eta: 0:37:14  lr: 3.678563667970448e-06  sample/s: 19.599001897143072  loss: 0.1674 (0.2102)  time: 0.1689  data: 0.0024  max mem: 3188
2021-05-29 18:19:39,315	INFO	torchdistill.misc.log	Epoch: [2]  [12000/24544]  eta: 0:34:29  lr: 3.4069426336375493e-06  sample/s: 17.593557046979864  loss: 0.1158 (0.2100)  time: 0.1817  data: 0.0025  max mem: 3188
2021-05-29 18:22:23,805	INFO	torchdistill.misc.log	Epoch: [2]  [13000/24544]  eta: 0:31:43  lr: 3.1353215993046506e-06  sample/s: 23.394160807859958  loss: 0.1052 (0.2098)  time: 0.1561  data: 0.0023  max mem: 3188
2021-05-29 18:25:09,919	INFO	torchdistill.misc.log	Epoch: [2]  [14000/24544]  eta: 0:28:59  lr: 2.8637005649717515e-06  sample/s: 17.603229962773167  loss: 0.2102 (0.2101)  time: 0.1706  data: 0.0024  max mem: 3188
2021-05-29 18:27:55,592	INFO	torchdistill.misc.log	Epoch: [2]  [15000/24544]  eta: 0:26:15  lr: 2.5920795306388528e-06  sample/s: 23.41979622233793  loss: 0.1816 (0.2104)  time: 0.1779  data: 0.0024  max mem: 3188
2021-05-29 18:30:40,619	INFO	torchdistill.misc.log	Epoch: [2]  [16000/24544]  eta: 0:23:30  lr: 2.320458496305954e-06  sample/s: 25.04933208115658  loss: 0.0908 (0.2108)  time: 0.1628  data: 0.0024  max mem: 3188
2021-05-29 18:33:26,626	INFO	torchdistill.misc.log	Epoch: [2]  [17000/24544]  eta: 0:20:45  lr: 2.0488374619730554e-06  sample/s: 17.581259752249633  loss: 0.1275 (0.2112)  time: 0.1736  data: 0.0025  max mem: 3188
2021-05-29 18:36:10,854	INFO	torchdistill.misc.log	Epoch: [2]  [18000/24544]  eta: 0:18:00  lr: 1.7772164276401565e-06  sample/s: 23.390051304929734  loss: 0.0697 (0.2113)  time: 0.1610  data: 0.0023  max mem: 3188
2021-05-29 18:38:57,109	INFO	torchdistill.misc.log	Epoch: [2]  [19000/24544]  eta: 0:15:15  lr: 1.5055953933072578e-06  sample/s: 27.079592996229533  loss: 0.1900 (0.2115)  time: 0.1657  data: 0.0025  max mem: 3188
2021-05-29 18:41:43,430	INFO	torchdistill.misc.log	Epoch: [2]  [20000/24544]  eta: 0:12:30  lr: 1.233974358974359e-06  sample/s: 25.050229714980222  loss: 0.1741 (0.2123)  time: 0.1638  data: 0.0025  max mem: 3188
2021-05-29 18:44:29,021	INFO	torchdistill.misc.log	Epoch: [2]  [21000/24544]  eta: 0:09:45  lr: 9.623533246414604e-07  sample/s: 27.045979346210515  loss: 0.2618 (0.2119)  time: 0.1560  data: 0.0024  max mem: 3188
2021-05-29 18:47:12,810	INFO	torchdistill.misc.log	Epoch: [2]  [22000/24544]  eta: 0:07:00  lr: 6.907322903085615e-07  sample/s: 29.739929200841647  loss: 0.0538 (0.2119)  time: 0.1495  data: 0.0024  max mem: 3188
2021-05-29 18:49:58,325	INFO	torchdistill.misc.log	Epoch: [2]  [23000/24544]  eta: 0:04:14  lr: 4.191112559756628e-07  sample/s: 25.124770125614745  loss: 0.0919 (0.2117)  time: 0.1682  data: 0.0023  max mem: 3188
2021-05-29 18:52:43,792	INFO	torchdistill.misc.log	Epoch: [2]  [24000/24544]  eta: 0:01:29  lr: 1.4749022164276403e-07  sample/s: 27.05199747171807  loss: 0.0465 (0.2116)  time: 0.1664  data: 0.0025  max mem: 3188
2021-05-29 18:54:13,503	INFO	torchdistill.misc.log	Epoch: [2] Total time: 1:07:33
2021-05-29 18:54:44,621	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-29 18:54:44,622	INFO	__main__	Validation: accuracy = 0.839429444727458
2021-05-29 18:54:48,420	INFO	__main__	[Student: bert-base-uncased]
2021-05-29 18:55:19,412	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-29 18:55:19,413	INFO	__main__	Test: accuracy = 0.8419765664798777
2021-05-29 18:55:19,413	INFO	__main__	Start prediction for private dataset(s)
2021-05-29 18:55:19,414	INFO	__main__	mnli/test_m: 9796 samples
2021-05-29 18:55:50,074	INFO	__main__	mnli/test_mm: 9847 samples
2021-05-29 18:56:20,968	INFO	__main__	ax/test_ax: 1104 samples