2021-05-25 22:10:47,882	INFO	__main__	Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/cola/ce/bert_large_uncased.yaml', log='log/glue/cola/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='cola', test_only=False, world_size=1)
2021-05-25 22:10:47,924	INFO	__main__	Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Use FP16 precision: True

2021-05-25 22:11:19,057	WARNING	datasets.builder	Reusing dataset glue (/root/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
2021-05-25 22:11:21,447	INFO	__main__	Start training
2021-05-25 22:11:21,448	INFO	torchdistill.models.util	[student model]
2021-05-25 22:11:21,448	INFO	torchdistill.models.util	Using the original student model
2021-05-25 22:11:21,448	INFO	torchdistill.core.training	Loss = 1.0 * OrgLoss
2021-05-25 22:11:26,160	INFO	torchdistill.misc.log	Epoch: [0]  [  0/535]  eta: 0:03:19  lr: 1.998753894080997e-05  sample/s: 11.577144663106461  loss: 1.2302 (1.2302)  time: 0.3729  data: 0.0273  max mem: 2567
2021-05-25 22:11:38,497	INFO	torchdistill.misc.log	Epoch: [0]  [ 50/535]  eta: 0:02:00  lr: 1.936448598130841e-05  sample/s: 18.961848487999386  loss: 0.5031 (0.6982)  time: 0.2536  data: 0.0022  max mem: 6363
2021-05-25 22:11:51,105	INFO	torchdistill.misc.log	Epoch: [0]  [100/535]  eta: 0:01:49  lr: 1.8741433021806853e-05  sample/s: 15.993304194887585  loss: 0.4494 (0.5958)  time: 0.2486  data: 0.0021  max mem: 6582
2021-05-25 22:12:03,585	INFO	torchdistill.misc.log	Epoch: [0]  [150/535]  eta: 0:01:36  lr: 1.8118380062305295e-05  sample/s: 18.00978787218443  loss: 0.4563 (0.5483)  time: 0.2487  data: 0.0022  max mem: 6582
2021-05-25 22:12:16,299	INFO	torchdistill.misc.log	Epoch: [0]  [200/535]  eta: 0:01:24  lr: 1.749532710280374e-05  sample/s: 14.060560956947262  loss: 0.4476 (0.5233)  time: 0.2532  data: 0.0024  max mem: 6582
2021-05-25 22:12:28,756	INFO	torchdistill.misc.log	Epoch: [0]  [250/535]  eta: 0:01:11  lr: 1.6872274143302183e-05  sample/s: 15.967990192999187  loss: 0.4905 (0.5099)  time: 0.2481  data: 0.0022  max mem: 6588
2021-05-25 22:12:41,353	INFO	torchdistill.misc.log	Epoch: [0]  [300/535]  eta: 0:00:58  lr: 1.6249221183800625e-05  sample/s: 15.929238896579488  loss: 0.3546 (0.4883)  time: 0.2483  data: 0.0024  max mem: 6588
2021-05-25 22:12:54,011	INFO	torchdistill.misc.log	Epoch: [0]  [350/535]  eta: 0:00:46  lr: 1.5626168224299067e-05  sample/s: 15.80400535051527  loss: 0.3522 (0.4753)  time: 0.2495  data: 0.0023  max mem: 6588
2021-05-25 22:13:06,660	INFO	torchdistill.misc.log	Epoch: [0]  [400/535]  eta: 0:00:33  lr: 1.500311526479751e-05  sample/s: 15.933353720267208  loss: 0.4134 (0.4657)  time: 0.2548  data: 0.0022  max mem: 6588
2021-05-25 22:13:19,462	INFO	torchdistill.misc.log	Epoch: [0]  [450/535]  eta: 0:00:21  lr: 1.4380062305295952e-05  sample/s: 14.186444877192478  loss: 0.3114 (0.4558)  time: 0.2487  data: 0.0022  max mem: 6789
2021-05-25 22:13:32,122	INFO	torchdistill.misc.log	Epoch: [0]  [500/535]  eta: 0:00:08  lr: 1.3757009345794394e-05  sample/s: 15.952928199910618  loss: 0.3921 (0.4485)  time: 0.2496  data: 0.0022  max mem: 6789
2021-05-25 22:13:40,793	INFO	torchdistill.misc.log	Epoch: [0] Total time: 0:02:15
2021-05-25 22:13:44,271	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:13:44,272	INFO	__main__	Validation: matthews_correlation = 0.5806473000395166
2021-05-25 22:13:44,272	INFO	__main__	Updating ckpt
2021-05-25 22:13:51,970	INFO	torchdistill.misc.log	Epoch: [1]  [  0/535]  eta: 0:02:25  lr: 1.3320872274143304e-05  sample/s: 14.850208406541903  loss: 0.1800 (0.1800)  time: 0.2726  data: 0.0032  max mem: 6789
2021-05-25 22:14:04,713	INFO	torchdistill.misc.log	Epoch: [1]  [ 50/535]  eta: 0:02:03  lr: 1.2697819314641746e-05  sample/s: 15.958785637372966  loss: 0.1996 (0.2190)  time: 0.2530  data: 0.0022  max mem: 6789
2021-05-25 22:14:17,311	INFO	torchdistill.misc.log	Epoch: [1]  [100/535]  eta: 0:01:50  lr: 1.2074766355140188e-05  sample/s: 15.915004705096683  loss: 0.2385 (0.2362)  time: 0.2554  data: 0.0024  max mem: 6789
2021-05-25 22:14:30,069	INFO	torchdistill.misc.log	Epoch: [1]  [150/535]  eta: 0:01:37  lr: 1.145171339563863e-05  sample/s: 15.867590506617637  loss: 0.1191 (0.2252)  time: 0.2559  data: 0.0023  max mem: 6789
2021-05-25 22:14:42,525	INFO	torchdistill.misc.log	Epoch: [1]  [200/535]  eta: 0:01:24  lr: 1.0828660436137072e-05  sample/s: 15.931447324440123  loss: 0.2200 (0.2283)  time: 0.2491  data: 0.0024  max mem: 6789
2021-05-25 22:14:55,329	INFO	torchdistill.misc.log	Epoch: [1]  [250/535]  eta: 0:01:12  lr: 1.0205607476635516e-05  sample/s: 17.869222027311004  loss: 0.2449 (0.2366)  time: 0.2565  data: 0.0022  max mem: 6789
2021-05-25 22:15:07,884	INFO	torchdistill.misc.log	Epoch: [1]  [300/535]  eta: 0:00:59  lr: 9.582554517133958e-06  sample/s: 18.815330197672253  loss: 0.0952 (0.2291)  time: 0.2519  data: 0.0023  max mem: 6789
2021-05-25 22:15:20,408	INFO	torchdistill.misc.log	Epoch: [1]  [350/535]  eta: 0:00:46  lr: 8.9595015576324e-06  sample/s: 18.9154784279283  loss: 0.1866 (0.2326)  time: 0.2561  data: 0.0024  max mem: 6789
2021-05-25 22:15:33,181	INFO	torchdistill.misc.log	Epoch: [1]  [400/535]  eta: 0:00:34  lr: 8.336448598130842e-06  sample/s: 15.924173926909392  loss: 0.2051 (0.2323)  time: 0.2555  data: 0.0022  max mem: 6793
2021-05-25 22:15:45,996	INFO	torchdistill.misc.log	Epoch: [1]  [450/535]  eta: 0:00:21  lr: 7.713395638629284e-06  sample/s: 15.914974510945969  loss: 0.1959 (0.2370)  time: 0.2510  data: 0.0023  max mem: 6793
2021-05-25 22:15:58,625	INFO	torchdistill.misc.log	Epoch: [1]  [500/535]  eta: 0:00:08  lr: 7.090342679127727e-06  sample/s: 15.935654106628926  loss: 0.2647 (0.2354)  time: 0.2522  data: 0.0023  max mem: 6793
2021-05-25 22:16:07,126	INFO	torchdistill.misc.log	Epoch: [1] Total time: 0:02:15
2021-05-25 22:16:10,577	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:16:10,578	INFO	__main__	Validation: matthews_correlation = 0.6043989222564181
2021-05-25 22:16:10,578	INFO	__main__	Updating ckpt
2021-05-25 22:16:17,171	INFO	torchdistill.misc.log	Epoch: [2]  [  0/535]  eta: 0:02:25  lr: 6.654205607476636e-06  sample/s: 14.956847949733799  loss: 0.0537 (0.0537)  time: 0.2712  data: 0.0037  max mem: 6793
2021-05-25 22:16:30,294	INFO	torchdistill.misc.log	Epoch: [2]  [ 50/535]  eta: 0:02:07  lr: 6.031152647975078e-06  sample/s: 14.163199286826208  loss: 0.0417 (0.1182)  time: 0.2616  data: 0.0023  max mem: 6793
2021-05-25 22:16:42,753	INFO	torchdistill.misc.log	Epoch: [2]  [100/535]  eta: 0:01:51  lr: 5.408099688473521e-06  sample/s: 18.72304600183244  loss: 0.0036 (0.1272)  time: 0.2478  data: 0.0023  max mem: 6793
2021-05-25 22:16:55,421	INFO	torchdistill.misc.log	Epoch: [2]  [150/535]  eta: 0:01:38  lr: 4.7850467289719636e-06  sample/s: 17.933821981022056  loss: 0.0642 (0.1666)  time: 0.2506  data: 0.0023  max mem: 6793
2021-05-25 22:17:08,064	INFO	torchdistill.misc.log	Epoch: [2]  [200/535]  eta: 0:01:25  lr: 4.1619937694704055e-06  sample/s: 15.78100828969634  loss: 0.0050 (0.1866)  time: 0.2558  data: 0.0022  max mem: 6793
2021-05-25 22:17:20,695	INFO	torchdistill.misc.log	Epoch: [2]  [250/535]  eta: 0:01:12  lr: 3.5389408099688475e-06  sample/s: 15.911321864152804  loss: 0.1082 (0.1935)  time: 0.2541  data: 0.0022  max mem: 6793
2021-05-25 22:17:33,424	INFO	torchdistill.misc.log	Epoch: [2]  [300/535]  eta: 0:00:59  lr: 2.91588785046729e-06  sample/s: 15.906162717276503  loss: 0.0000 (0.2150)  time: 0.2512  data: 0.0022  max mem: 6793
2021-05-25 22:17:45,850	INFO	torchdistill.misc.log	Epoch: [2]  [350/535]  eta: 0:00:46  lr: 2.2928348909657324e-06  sample/s: 15.864574616061812  loss: 0.0202 (0.2241)  time: 0.2504  data: 0.0023  max mem: 6793
2021-05-25 22:17:58,459	INFO	torchdistill.misc.log	Epoch: [2]  [400/535]  eta: 0:00:34  lr: 1.6697819314641748e-06  sample/s: 15.983202467804412  loss: 0.0000 (0.2294)  time: 0.2523  data: 0.0024  max mem: 6793
2021-05-25 22:18:11,040	INFO	torchdistill.misc.log	Epoch: [2]  [450/535]  eta: 0:00:21  lr: 1.046728971962617e-06  sample/s: 15.717480588781966  loss: 0.0068 (0.2289)  time: 0.2524  data: 0.0023  max mem: 6793
2021-05-25 22:18:23,489	INFO	torchdistill.misc.log	Epoch: [2]  [500/535]  eta: 0:00:08  lr: 4.2367601246105923e-07  sample/s: 14.20932160827428  loss: 0.0000 (0.2368)  time: 0.2523  data: 0.0023  max mem: 6793
2021-05-25 22:18:31,858	INFO	torchdistill.misc.log	Epoch: [2] Total time: 0:02:14
2021-05-25 22:18:35,313	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:18:35,314	INFO	__main__	Validation: matthews_correlation = 0.610638611987945
2021-05-25 22:18:35,314	INFO	__main__	Updating ckpt
2021-05-25 22:18:50,900	INFO	__main__	[Student: bert-large-uncased]
2021-05-25 22:18:54,369	INFO	/usr/local/lib/python3.7/dist-packages/datasets/metric.py	Removing /root/.cache/huggingface/metrics/glue/cola/default_experiment-1-0.arrow
2021-05-25 22:18:54,369	INFO	__main__	Test: matthews_correlation = 0.610638611987945
2021-05-25 22:18:54,369	INFO	__main__	Start prediction for private dataset(s)
2021-05-25 22:18:54,371	INFO	__main__	cola/test: 1063 samples