2021-05-31 19:12:19,502	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.yaml', log='log/glue/mnli/kd/bert_base_uncased_from_bert_large_uncased.txt', private_output='leaderboard/glue/kd/bert_base_uncased_from_bert_large_uncased/', seed=None, student_only=False, task_name='mnli', test_only=False, world_size=1)
2021-05-31 19:12:19,563	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-31 19:12:19,941	INFO	filelock	Lock 140082792337040 acquired on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
2021-05-31 19:12:20,295	INFO	filelock	Lock 140082792337040 released on /root/.cache/huggingface/transformers/5b5f978453cf40beb680cdd3d4aa881c966097f83937fbf475e0ed640062dbca.c73d14e62466b28d4e1ef822a490987b8f83b052127d2564f2e5bbce495e3c09.lock
2021-05-31 19:12:21,006	INFO	filelock	Lock 140082831894224 acquired on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-31 19:12:21,516	INFO	filelock	Lock 140082831894224 released on /root/.cache/huggingface/transformers/7a67abdbf71b85cb08398b0be2f83bb90b20e212c99600e63836e4a37df7de29.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-05-31 19:12:21,871	INFO	filelock	Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
2021-05-31 19:12:22,393	INFO	filelock	Lock 140082823814352 released on /root/.cache/huggingface/transformers/696f700b8d350ef06d6b7bb1d40f1727616b761551d519a1b9e473493d622f2d.6dc9f54d5893dc361ac6ccee1865622847ad90bf0536eeb2043f3e3e2f41078a.lock
2021-05-31 19:12:23,095	INFO	filelock	Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
2021-05-31 19:12:23,448	INFO	filelock	Lock 140082823814352 released on /root/.cache/huggingface/transformers/0a91d20dc356a0ee3b87e1e02495dfcdc9770ce1b64f4426459748fcdbca17e7.dd8bd9bfd3664b530ea4e645105f557769387b3da9f79bdb55ed556bdd80611d.lock
2021-05-31 19:12:23,803	INFO	filelock	Lock 140082823814352 acquired on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
2021-05-31 19:12:24,158	INFO	filelock	Lock 140082823814352 released on /root/.cache/huggingface/transformers/f9a57124cc0406fe634d8934f74efb446b8d92423e8720867cec3ee4291518a6.0f95f2171d2c33a9e9e088c1e5decb2dfb3a22fb00d904f96183827da9540426.lock
2021-05-31 19:12:24,537	INFO	filelock	Lock 140082823814992 acquired on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
2021-05-31 19:13:00,303	INFO	filelock	Lock 140082823814992 released on /root/.cache/huggingface/transformers/465d4939e3c54729c9bce27016baac778f168894b55701482c8ae4fa40953841.b487d9e34b8144fa22e4e1c7ea1213577af73f111e06c948c8cfa936dcc453aa.lock
2021-05-31 19:14:53,610	INFO	__main__	Start training
2021-05-31 19:14:53,610	INFO	torchdistill.models.util	[teacher model]
2021-05-31 19:14:53,610	INFO	torchdistill.models.util	Using the original teacher model
2021-05-31 19:14:53,610	INFO	torchdistill.models.util	[student model]
2021-05-31 19:14:53,611	INFO	torchdistill.models.util	Using the original student model
2021-05-31 19:14:53,611	INFO	torchdistill.core.distillation	Loss = 1.0 * OrgLoss
2021-05-31 19:14:53,611	INFO	torchdistill.core.distillation	Freezing the whole teacher model
2021-05-31 19:14:58,197	INFO	torchdistill.misc.log	Epoch: [0]  [    0/12272]  eta: 0:26:53  lr: 9.999728378965668e-05  sample/s: 38.52969437529281  loss: 0.0905 (0.0905)  time: 0.1315  data: 0.0277  max mem: 2519
2021-05-31 19:17:04,033	INFO	torchdistill.misc.log	Epoch: [0]  [ 1000/12272]  eta: 0:23:38  lr: 9.728107344632768e-05  sample/s: 25.678521357422294  loss: 0.0229 (0.0347)  time: 0.1315  data: 0.0046  max mem: 5109
2021-05-31 19:19:10,890	INFO	torchdistill.misc.log	Epoch: [0]  [ 2000/12272]  eta: 0:21:37  lr: 9.45648631029987e-05  sample/s: 33.98564182345601  loss: 0.0153 (0.0267)  time: 0.1355  data: 0.0044  max mem: 5109
2021-05-31 19:21:17,630	INFO	torchdistill.misc.log	Epoch: [0]  [ 3000/12272]  eta: 0:19:32  lr: 9.184865275966971e-05  sample/s: 30.293297895006734  loss: 0.0145 (0.0230)  time: 0.1215  data: 0.0044  max mem: 5109
2021-05-31 19:23:24,094	INFO	torchdistill.misc.log	Epoch: [0]  [ 4000/12272]  eta: 0:17:26  lr: 8.913244241634072e-05  sample/s: 39.35939116542367  loss: 0.0144 (0.0208)  time: 0.1229  data: 0.0045  max mem: 5109
2021-05-31 19:25:30,963	INFO	torchdistill.misc.log	Epoch: [0]  [ 5000/12272]  eta: 0:15:20  lr: 8.641623207301173e-05  sample/s: 31.891785647075373  loss: 0.0108 (0.0192)  time: 0.1368  data: 0.0047  max mem: 5109
2021-05-31 19:27:37,490	INFO	torchdistill.misc.log	Epoch: [0]  [ 6000/12272]  eta: 0:13:13  lr: 8.370002172968275e-05  sample/s: 30.313604538761055  loss: 0.0109 (0.0179)  time: 0.1267  data: 0.0047  max mem: 5109
2021-05-31 19:29:45,181	INFO	torchdistill.misc.log	Epoch: [0]  [ 7000/12272]  eta: 0:11:08  lr: 8.098381138635376e-05  sample/s: 42.336344641721595  loss: 0.0095 (0.0170)  time: 0.1268  data: 0.0045  max mem: 5109
2021-05-31 19:31:52,182	INFO	torchdistill.misc.log	Epoch: [0]  [ 8000/12272]  eta: 0:09:01  lr: 7.826760104302477e-05  sample/s: 31.78104944118204  loss: 0.0112 (0.0162)  time: 0.1264  data: 0.0046  max mem: 5109
2021-05-31 19:33:59,788	INFO	torchdistill.misc.log	Epoch: [0]  [ 9000/12272]  eta: 0:06:55  lr: 7.555139069969579e-05  sample/s: 30.615916348838482  loss: 0.0089 (0.0155)  time: 0.1314  data: 0.0045  max mem: 5109
2021-05-31 19:36:07,595	INFO	torchdistill.misc.log	Epoch: [0]  [10000/12272]  eta: 0:04:48  lr: 7.283518035636681e-05  sample/s: 37.10492838754766  loss: 0.0072 (0.0149)  time: 0.1298  data: 0.0048  max mem: 5109
2021-05-31 19:38:13,949	INFO	torchdistill.misc.log	Epoch: [0]  [11000/12272]  eta: 0:02:41  lr: 7.011897001303781e-05  sample/s: 32.78477658489305  loss: 0.0090 (0.0144)  time: 0.1288  data: 0.0045  max mem: 5109
2021-05-31 19:40:21,535	INFO	torchdistill.misc.log	Epoch: [0]  [12000/12272]  eta: 0:00:34  lr: 6.740275966970883e-05  sample/s: 37.34245014245014  loss: 0.0079 (0.0140)  time: 0.1317  data: 0.0050  max mem: 5109
2021-05-31 19:40:56,676	INFO	torchdistill.misc.log	Epoch: [0] Total time: 0:25:58
2021-05-31 19:41:04,501	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 19:41:04,501	INFO	__main__	Validation: accuracy = 0.8412633723892002
2021-05-31 19:41:04,501	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 19:41:05,722	INFO	torchdistill.misc.log	Epoch: [1]  [    0/12272]  eta: 0:31:19  lr: 6.666395045632334e-05  sample/s: 31.762036742544716  loss: 0.0031 (0.0031)  time: 0.1532  data: 0.0272  max mem: 5109
2021-05-31 19:43:13,358	INFO	torchdistill.misc.log	Epoch: [1]  [ 1000/12272]  eta: 0:23:58  lr: 6.394774011299436e-05  sample/s: 37.225953324487556  loss: 0.0044 (0.0051)  time: 0.1300  data: 0.0046  max mem: 5109
2021-05-31 19:45:20,181	INFO	torchdistill.misc.log	Epoch: [1]  [ 2000/12272]  eta: 0:21:47  lr: 6.123152976966536e-05  sample/s: 37.15834562552879  loss: 0.0047 (0.0051)  time: 0.1284  data: 0.0045  max mem: 5109
2021-05-31 19:47:26,919	INFO	torchdistill.misc.log	Epoch: [1]  [ 3000/12272]  eta: 0:19:38  lr: 5.851531942633638e-05  sample/s: 39.3462836451305  loss: 0.0042 (0.0050)  time: 0.1197  data: 0.0043  max mem: 5109
2021-05-31 19:49:32,833	INFO	torchdistill.misc.log	Epoch: [1]  [ 4000/12272]  eta: 0:17:28  lr: 5.5799109083007396e-05  sample/s: 33.65857162059412  loss: 0.0040 (0.0050)  time: 0.1264  data: 0.0043  max mem: 5109
2021-05-31 19:51:40,796	INFO	torchdistill.misc.log	Epoch: [1]  [ 5000/12272]  eta: 0:15:23  lr: 5.30828987396784e-05  sample/s: 26.070806883959442  loss: 0.0046 (0.0050)  time: 0.1288  data: 0.0045  max mem: 5109
2021-05-31 19:53:48,528	INFO	torchdistill.misc.log	Epoch: [1]  [ 6000/12272]  eta: 0:13:17  lr: 5.036668839634942e-05  sample/s: 32.30201815220279  loss: 0.0045 (0.0049)  time: 0.1212  data: 0.0044  max mem: 5109
2021-05-31 19:55:53,950	INFO	torchdistill.misc.log	Epoch: [1]  [ 7000/12272]  eta: 0:11:08  lr: 4.765047805302043e-05  sample/s: 36.80166358838471  loss: 0.0038 (0.0049)  time: 0.1297  data: 0.0044  max mem: 5109
2021-05-31 19:57:59,848	INFO	torchdistill.misc.log	Epoch: [1]  [ 8000/12272]  eta: 0:09:01  lr: 4.493426770969144e-05  sample/s: 33.594812965184154  loss: 0.0041 (0.0048)  time: 0.1258  data: 0.0044  max mem: 5109
2021-05-31 20:00:06,135	INFO	torchdistill.misc.log	Epoch: [1]  [ 9000/12272]  eta: 0:06:54  lr: 4.221805736636245e-05  sample/s: 25.64719622596514  loss: 0.0045 (0.0048)  time: 0.1241  data: 0.0046  max mem: 5109
2021-05-31 20:02:14,011	INFO	torchdistill.misc.log	Epoch: [1]  [10000/12272]  eta: 0:04:48  lr: 3.9501847023033466e-05  sample/s: 33.08476073658346  loss: 0.0043 (0.0048)  time: 0.1239  data: 0.0047  max mem: 5109
2021-05-31 20:04:21,426	INFO	torchdistill.misc.log	Epoch: [1]  [11000/12272]  eta: 0:02:41  lr: 3.6785636679704476e-05  sample/s: 25.415942288056765  loss: 0.0039 (0.0047)  time: 0.1303  data: 0.0045  max mem: 5109
2021-05-31 20:06:28,294	INFO	torchdistill.misc.log	Epoch: [1]  [12000/12272]  eta: 0:00:34  lr: 3.406942633637549e-05  sample/s: 37.492242198062506  loss: 0.0038 (0.0047)  time: 0.1308  data: 0.0051  max mem: 5109
2021-05-31 20:07:02,616	INFO	torchdistill.misc.log	Epoch: [1] Total time: 0:25:57
2021-05-31 20:07:10,347	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:07:10,348	INFO	__main__	Validation: accuracy = 0.8530820173204279
2021-05-31 20:07:10,348	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 20:07:11,549	INFO	torchdistill.misc.log	Epoch: [2]  [    0/12272]  eta: 0:27:31  lr: 3.3330617122990006e-05  sample/s: 36.21437806918137  loss: 0.0018 (0.0018)  time: 0.1346  data: 0.0241  max mem: 5109
2021-05-31 20:09:18,600	INFO	torchdistill.misc.log	Epoch: [2]  [ 1000/12272]  eta: 0:23:52  lr: 3.061440677966102e-05  sample/s: 37.12356586986009  loss: 0.0023 (0.0024)  time: 0.1341  data: 0.0045  max mem: 5109
2021-05-31 20:11:24,788	INFO	torchdistill.misc.log	Epoch: [2]  [ 2000/12272]  eta: 0:21:40  lr: 2.789819643633203e-05  sample/s: 32.69698623302515  loss: 0.0021 (0.0023)  time: 0.1271  data: 0.0046  max mem: 5109
2021-05-31 20:13:32,260	INFO	torchdistill.misc.log	Epoch: [2]  [ 3000/12272]  eta: 0:19:36  lr: 2.5181986093003048e-05  sample/s: 39.77255238497116  loss: 0.0019 (0.0023)  time: 0.1264  data: 0.0047  max mem: 5109
2021-05-31 20:15:38,928	INFO	torchdistill.misc.log	Epoch: [2]  [ 4000/12272]  eta: 0:17:29  lr: 2.2465775749674055e-05  sample/s: 37.40997928507879  loss: 0.0019 (0.0023)  time: 0.1170  data: 0.0045  max mem: 5109
2021-05-31 20:17:46,151	INFO	torchdistill.misc.log	Epoch: [2]  [ 5000/12272]  eta: 0:15:22  lr: 1.974956540634507e-05  sample/s: 39.26708624043028  loss: 0.0018 (0.0023)  time: 0.1322  data: 0.0045  max mem: 5109
2021-05-31 20:19:53,077	INFO	torchdistill.misc.log	Epoch: [2]  [ 6000/12272]  eta: 0:13:16  lr: 1.7033355063016082e-05  sample/s: 26.89458075644343  loss: 0.0019 (0.0022)  time: 0.1324  data: 0.0045  max mem: 5109
2021-05-31 20:21:59,132	INFO	torchdistill.misc.log	Epoch: [2]  [ 7000/12272]  eta: 0:11:08  lr: 1.4317144719687093e-05  sample/s: 32.304879269842495  loss: 0.0017 (0.0022)  time: 0.1225  data: 0.0044  max mem: 5109
2021-05-31 20:24:05,638	INFO	torchdistill.misc.log	Epoch: [2]  [ 8000/12272]  eta: 0:09:01  lr: 1.1600934376358105e-05  sample/s: 39.57945395824831  loss: 0.0021 (0.0022)  time: 0.1263  data: 0.0044  max mem: 5109
2021-05-31 20:26:11,594	INFO	torchdistill.misc.log	Epoch: [2]  [ 9000/12272]  eta: 0:06:54  lr: 8.884724033029119e-06  sample/s: 25.860594738237488  loss: 0.0023 (0.0022)  time: 0.1262  data: 0.0044  max mem: 5109
2021-05-31 20:28:18,549	INFO	torchdistill.misc.log	Epoch: [2]  [10000/12272]  eta: 0:04:47  lr: 6.168513689700131e-06  sample/s: 32.40314814635983  loss: 0.0019 (0.0022)  time: 0.1260  data: 0.0045  max mem: 5109
2021-05-31 20:30:24,951	INFO	torchdistill.misc.log	Epoch: [2]  [11000/12272]  eta: 0:02:41  lr: 3.452303346371143e-06  sample/s: 42.08254363214055  loss: 0.0021 (0.0022)  time: 0.1241  data: 0.0044  max mem: 5109
2021-05-31 20:32:31,971	INFO	torchdistill.misc.log	Epoch: [2]  [12000/12272]  eta: 0:00:34  lr: 7.360930030421556e-07  sample/s: 33.625651129091416  loss: 0.0019 (0.0022)  time: 0.1307  data: 0.0044  max mem: 5109
2021-05-31 20:33:06,083	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:25:54
2021-05-31 20:33:13,819	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:13,820	INFO	__main__	Validation: accuracy = 0.8582781456953642
2021-05-31 20:33:13,820	INFO	__main__	Updating ckpt at ./resource/ckpt/glue/mnli/kd/mnli-bert-base-uncased_from_bert-large-uncased
2021-05-31 20:33:15,094	INFO	__main__	[Teacher: bert-large-uncased]
2021-05-31 20:33:28,908	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:28,908	INFO	__main__	Test: accuracy = 0.8665308201732043
2021-05-31 20:33:32,568	INFO	__main__	[Student: bert-base-uncased]
2021-05-31 20:33:40,325	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/mnli/default_experiment-1-0.arrow
2021-05-31 20:33:40,326	INFO	__main__	Test: accuracy = 0.8582781456953642
2021-05-31 20:33:40,326	INFO	__main__	Start prediction for private dataset(s)
2021-05-31 20:33:40,327	INFO	__main__	mnli/test_m: 9796 samples
2021-05-31 20:33:47,980	INFO	__main__	mnli/test_mm: 9847 samples
2021-05-31 20:33:55,598	INFO	__main__	ax/test_ax: 1104 samples