yoshitomo-matsubara
committed on
Commit 1d297c6
1 Parent(s): a590a9d
initial commit
- README.md +17 -0
- config.json +26 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -0
- tokenizer.json +0 -0
- tokenizer_config.json +1 -0
- training.log +71 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,17 @@
+---
+language: en
+tags:
+- bert
+- qnli
+- glue
+- torchdistill
+license: apache-2.0
+datasets:
+- qnli
+metrics:
+- accuracy
+---
+
+`bert-base-uncased` fine-tuned on the QNLI dataset, using [***torchdistill***](https://github.com/yoshitomo-matsubara/torchdistill) and [Google Colab](https://colab.research.google.com/github/yoshitomo-matsubara/torchdistill/blob/master/demo/glue_finetuning_and_submission.ipynb).
+The hyperparameters are the same as those in Hugging Face's example and/or the BERT paper, and the training configuration (including hyperparameters) is available [here](https://github.com/yoshitomo-matsubara/torchdistill/blob/main/configs/sample/glue/qnli/ce/bert_base_uncased.yaml).
+I submitted prediction files to [the GLUE leaderboard](https://gluebenchmark.com/leaderboard), and the overall GLUE score was **77.9**.
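For reference, a minimal sketch of how a checkpoint like this can be loaded with 🤗 Transformers for QNLI-style inference. The repo ID below is an assumption; substitute the actual Hub ID or a local path to the committed files:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed repo ID; replace with the actual Hub ID or a local directory
# containing config.json, pytorch_model.bin, and the tokenizer files.
model_id = "yoshitomo-matsubara/bert-base-uncased-qnli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# QNLI is a (question, sentence) pair task: does the sentence answer the question?
question = "What is the capital of France?"
sentence = "Paris is the capital and largest city of France."
inputs = tokenizer(question, sentence, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
# Labels fall back to LABEL_0 / LABEL_1 unless id2label was customized.
print(model.config.id2label[pred])
```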
config.json
ADDED
@@ -0,0 +1,26 @@
+{
+  "_name_or_path": "bert-base-uncased",
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "finetuning_task": "qnli",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "transformers_version": "4.6.1",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
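As a quick sanity check, the architecture described by this config can be instantiated on its own (a sketch assuming a local copy saved as `config.json`; weights stay randomly initialized until `pytorch_model.bin` is loaded):

```python
from transformers import BertConfig, BertForSequenceClassification

# Load the committed config from an assumed local copy of this file.
config = BertConfig.from_json_file("config.json")
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)  # 12 768 12

# Builds a randomly initialized BertForSequenceClassification; num_labels
# defaults to 2, which matches binary QNLI (entailment vs. not_entailment).
model = BertForSequenceClassification(config)
print(sum(p.numel() for p in model.parameters()))  # roughly 110M parameters
```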
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b8c95beb104444d0202d8892e7de060cc789ca94e45a852f0754275f6f32d39b
+size 438024457
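This is a Git LFS pointer: the actual weights live in LFS storage, and the pointer records their SHA-256 digest and byte size. A small sketch (assuming a downloaded `pytorch_model.bin` in the working directory) for verifying a download against the pointer:

```python
import hashlib

# Expected digest and size, copied from the LFS pointer above.
EXPECTED_SHA256 = "b8c95beb104444d0202d8892e7de060cc789ca94e45a852f0754275f6f32d39b"
EXPECTED_SIZE = 438024457

sha256 = hashlib.sha256()
size = 0
with open("pytorch_model.bin", "rb") as f:  # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)
        size += len(chunk)

print("size ok:", size == EXPECTED_SIZE)
print("sha256 ok:", sha256.hexdigest() == EXPECTED_SHA256)
```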
special_tokens_map.json
ADDED
@@ -0,0 +1 @@
+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
+{"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": null, "do_lower": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "bert-base-uncased"}
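These files reproduce the standard `bert-base-uncased` tokenizer (lower-cased WordPiece, 512-token limit, the `[UNK]`/`[SEP]`/`[PAD]`/`[CLS]`/`[MASK]` special tokens above). A minimal sketch, assuming the committed tokenizer files sit in a local directory `./qnli-bert-base-uncased`:

```python
from transformers import AutoTokenizer

# Assumed local directory holding tokenizer.json, tokenizer_config.json,
# special_tokens_map.json, and vocab.txt from this commit.
tokenizer = AutoTokenizer.from_pretrained("./qnli-bert-base-uncased")
print(tokenizer.model_max_length)    # 512
print(tokenizer.special_tokens_map)  # the special tokens listed above

# QNLI pairs are encoded as: [CLS] question [SEP] sentence [SEP]
enc = tokenizer("Who wrote Hamlet?", "Hamlet was written by Shakespeare.")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
```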
training.log
ADDED
@@ -0,0 +1,71 @@
+2021-05-29 02:48:20,975 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_base_uncased.yaml', log='log/glue/qnli/ce/bert_base_uncased.txt', private_output='leaderboard/glue/standard/bert_base_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
+2021-05-29 02:48:21,006 INFO __main__ Distributed environment: NO
+Num processes: 1
+Process index: 0
+Local process index: 0
+Device: cuda
+Use FP16 precision: True
+
+2021-05-29 02:48:25,491 WARNING datasets.builder Reusing dataset glue (/root/.cache/huggingface/datasets/glue/qnli/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
+2021-05-29 02:48:39,570 INFO __main__ Start training
+2021-05-29 02:48:39,570 INFO torchdistill.models.util [student model]
+2021-05-29 02:48:39,570 INFO torchdistill.models.util Using the original student model
+2021-05-29 02:48:39,570 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+2021-05-29 02:48:42,319 INFO torchdistill.misc.log Epoch: [0] [ 0/6547] eta: 0:22:25 lr: 4.999745430477064e-05 sample/s: 20.197233795892227 loss: 0.7056 (0.7056) time: 0.2056 data: 0.0075 max mem: 1855
+2021-05-29 02:50:10,157 INFO torchdistill.misc.log Epoch: [0] [ 500/6547] eta: 0:17:42 lr: 4.872460669008707e-05 sample/s: 21.980190963306118 loss: 0.3474 (0.4739) time: 0.1794 data: 0.0024 max mem: 3172
+2021-05-29 02:51:39,153 INFO torchdistill.misc.log Epoch: [0] [1000/6547] eta: 0:16:21 lr: 4.7451759075403496e-05 sample/s: 23.378806479707368 loss: 0.3126 (0.4271) time: 0.1734 data: 0.0025 max mem: 3172
+2021-05-29 02:53:07,257 INFO torchdistill.misc.log Epoch: [0] [1500/6547] eta: 0:14:51 lr: 4.6178911460719925e-05 sample/s: 23.41018861026731 loss: 0.2156 (0.3999) time: 0.1722 data: 0.0024 max mem: 3172
+2021-05-29 02:54:35,981 INFO torchdistill.misc.log Epoch: [0] [2000/6547] eta: 0:13:24 lr: 4.4906063846036354e-05 sample/s: 18.511809581395966 loss: 0.2507 (0.3813) time: 0.1826 data: 0.0024 max mem: 3172
+2021-05-29 02:56:05,545 INFO torchdistill.misc.log Epoch: [0] [2500/6547] eta: 0:11:57 lr: 4.3633216231352784e-05 sample/s: 21.954333167143645 loss: 0.2781 (0.3666) time: 0.1874 data: 0.0025 max mem: 3172
+2021-05-29 02:57:35,013 INFO torchdistill.misc.log Epoch: [0] [3000/6547] eta: 0:10:29 lr: 4.236036861666921e-05 sample/s: 20.494309991840005 loss: 0.3193 (0.3557) time: 0.1706 data: 0.0024 max mem: 3172
+2021-05-29 02:59:03,358 INFO torchdistill.misc.log Epoch: [0] [3500/6547] eta: 0:09:00 lr: 4.108752100198565e-05 sample/s: 25.07325354790115 loss: 0.3278 (0.3466) time: 0.1806 data: 0.0025 max mem: 3172
+2021-05-29 03:00:31,656 INFO torchdistill.misc.log Epoch: [0] [4000/6547] eta: 0:07:31 lr: 3.981467338730208e-05 sample/s: 24.452091889827816 loss: 0.2232 (0.3390) time: 0.1825 data: 0.0026 max mem: 3172
+2021-05-29 03:02:00,254 INFO torchdistill.misc.log Epoch: [0] [4500/6547] eta: 0:06:02 lr: 3.854182577261851e-05 sample/s: 29.69965444966861 loss: 0.2472 (0.3336) time: 0.1718 data: 0.0025 max mem: 3172
+2021-05-29 03:03:29,173 INFO torchdistill.misc.log Epoch: [0] [5000/6547] eta: 0:04:34 lr: 3.7268978157934936e-05 sample/s: 23.38610623626117 loss: 0.2710 (0.3283) time: 0.1763 data: 0.0026 max mem: 3172
+2021-05-29 03:04:57,593 INFO torchdistill.misc.log Epoch: [0] [5500/6547] eta: 0:03:05 lr: 3.5996130543251365e-05 sample/s: 21.977484404924404 loss: 0.2725 (0.3249) time: 0.1672 data: 0.0025 max mem: 3172
+2021-05-29 03:06:26,120 INFO torchdistill.misc.log Epoch: [0] [6000/6547] eta: 0:01:36 lr: 3.4723282928567794e-05 sample/s: 32.334515402880136 loss: 0.2182 (0.3205) time: 0.1820 data: 0.0025 max mem: 3172
+2021-05-29 03:07:54,877 INFO torchdistill.misc.log Epoch: [0] [6500/6547] eta: 0:00:08 lr: 3.345043531388422e-05 sample/s: 18.50968554791362 loss: 0.1868 (0.3170) time: 0.1836 data: 0.0024 max mem: 3172
+2021-05-29 03:08:02,846 INFO torchdistill.misc.log Epoch: [0] Total time: 0:19:20
+2021-05-29 03:08:22,659 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+2021-05-29 03:08:22,660 INFO __main__ Validation: accuracy = 0.9104887424492037
+2021-05-29 03:08:22,660 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased
+2021-05-29 03:08:24,034 INFO torchdistill.misc.log Epoch: [1] [ 0/6547] eta: 0:18:11 lr: 3.3330787638103964e-05 sample/s: 24.98985791548809 loss: 0.1000 (0.1000) time: 0.1667 data: 0.0066 max mem: 3172
+2021-05-29 03:09:52,894 INFO torchdistill.misc.log Epoch: [1] [ 500/6547] eta: 0:17:54 lr: 3.20579400234204e-05 sample/s: 23.463095432887627 loss: 0.1836 (0.2297) time: 0.1741 data: 0.0023 max mem: 3172
+2021-05-29 03:11:19,875 INFO torchdistill.misc.log Epoch: [1] [1000/6547] eta: 0:16:15 lr: 3.078509240873683e-05 sample/s: 29.82499528907314 loss: 0.1741 (0.2197) time: 0.1664 data: 0.0022 max mem: 3172
+2021-05-29 03:12:48,699 INFO torchdistill.misc.log Epoch: [1] [1500/6547] eta: 0:14:50 lr: 2.9512244794053258e-05 sample/s: 22.036980458965402 loss: 0.1657 (0.2246) time: 0.1777 data: 0.0023 max mem: 3172
+2021-05-29 03:14:17,914 INFO torchdistill.misc.log Epoch: [1] [2000/6547] eta: 0:13:24 lr: 2.8239397179369687e-05 sample/s: 29.625550715245563 loss: 0.1794 (0.2179) time: 0.1702 data: 0.0025 max mem: 3172
+2021-05-29 03:15:47,019 INFO torchdistill.misc.log Epoch: [1] [2500/6547] eta: 0:11:57 lr: 2.6966549564686116e-05 sample/s: 25.09331686093034 loss: 0.2343 (0.2235) time: 0.1730 data: 0.0024 max mem: 3172
+2021-05-29 03:17:16,122 INFO torchdistill.misc.log Epoch: [1] [3000/6547] eta: 0:10:29 lr: 2.5693701950002545e-05 sample/s: 23.375223445979366 loss: 0.1512 (0.2240) time: 0.1806 data: 0.0026 max mem: 3172
+2021-05-29 03:18:44,097 INFO torchdistill.misc.log Epoch: [1] [3500/6547] eta: 0:08:59 lr: 2.4420854335318978e-05 sample/s: 25.041891678085157 loss: 0.2174 (0.2236) time: 0.1651 data: 0.0024 max mem: 3172
+2021-05-29 03:20:13,271 INFO torchdistill.misc.log Epoch: [1] [4000/6547] eta: 0:07:31 lr: 2.3148006720635407e-05 sample/s: 24.913154784178857 loss: 0.2000 (0.2243) time: 0.1819 data: 0.0024 max mem: 3172
+2021-05-29 03:21:41,587 INFO torchdistill.misc.log Epoch: [1] [4500/6547] eta: 0:06:02 lr: 2.1875159105951836e-05 sample/s: 27.05793099889041 loss: 0.1466 (0.2222) time: 0.1830 data: 0.0025 max mem: 3172
+2021-05-29 03:23:10,809 INFO torchdistill.misc.log Epoch: [1] [5000/6547] eta: 0:04:34 lr: 2.060231149126827e-05 sample/s: 25.019634276919955 loss: 0.1216 (0.2227) time: 0.1866 data: 0.0025 max mem: 3172
+2021-05-29 03:24:40,562 INFO torchdistill.misc.log Epoch: [1] [5500/6547] eta: 0:03:05 lr: 1.9329463876584698e-05 sample/s: 24.97914237942735 loss: 0.1274 (0.2230) time: 0.1777 data: 0.0026 max mem: 3172
+2021-05-29 03:26:09,896 INFO torchdistill.misc.log Epoch: [1] [6000/6547] eta: 0:01:37 lr: 1.8056616261901127e-05 sample/s: 19.578257759376193 loss: 0.2182 (0.2260) time: 0.1716 data: 0.0024 max mem: 3172
+2021-05-29 03:27:37,859 INFO torchdistill.misc.log Epoch: [1] [6500/6547] eta: 0:00:08 lr: 1.6783768647217556e-05 sample/s: 17.566110000104704 loss: 0.1400 (0.2247) time: 0.1825 data: 0.0025 max mem: 3172
+2021-05-29 03:27:45,754 INFO torchdistill.misc.log Epoch: [1] Total time: 0:19:21
+2021-05-29 03:28:05,670 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+2021-05-29 03:28:05,670 INFO __main__ Validation: accuracy = 0.9157971810360608
+2021-05-29 03:28:05,670 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-base-uncased
+2021-05-29 03:28:07,062 INFO torchdistill.misc.log Epoch: [2] [ 0/6547] eta: 0:19:23 lr: 1.66641209714373e-05 sample/s: 23.338836537043683 loss: 0.0048 (0.0048) time: 0.1778 data: 0.0064 max mem: 3172
+2021-05-29 03:29:36,066 INFO torchdistill.misc.log Epoch: [2] [ 500/6547] eta: 0:17:56 lr: 1.539127335675373e-05 sample/s: 24.923480536998497 loss: 0.0000 (0.1601) time: 0.1721 data: 0.0025 max mem: 3172
+2021-05-29 03:31:04,840 INFO torchdistill.misc.log Epoch: [2] [1000/6547] eta: 0:16:26 lr: 1.411842574207016e-05 sample/s: 20.586551487189556 loss: 0.0000 (0.1932) time: 0.1806 data: 0.0024 max mem: 3172
+2021-05-29 03:32:33,340 INFO torchdistill.misc.log Epoch: [2] [1500/6547] eta: 0:14:55 lr: 1.284557812738659e-05 sample/s: 27.066574171170444 loss: 0.0001 (0.2193) time: 0.1707 data: 0.0025 max mem: 3172
+2021-05-29 03:34:01,072 INFO torchdistill.misc.log Epoch: [2] [2000/6547] eta: 0:13:24 lr: 1.157273051270302e-05 sample/s: 23.399446576809783 loss: 0.0000 (0.2212) time: 0.1782 data: 0.0025 max mem: 3172
+2021-05-29 03:35:28,994 INFO torchdistill.misc.log Epoch: [2] [2500/6547] eta: 0:11:55 lr: 1.029988289801945e-05 sample/s: 26.919134529759564 loss: 0.0000 (0.2266) time: 0.1797 data: 0.0025 max mem: 3172
+2021-05-29 03:36:57,793 INFO torchdistill.misc.log Epoch: [2] [3000/6547] eta: 0:10:27 lr: 9.02703528333588e-06 sample/s: 30.056999534200436 loss: 0.0000 (0.2243) time: 0.1740 data: 0.0024 max mem: 3172
+2021-05-29 03:38:26,426 INFO torchdistill.misc.log Epoch: [2] [3500/6547] eta: 0:08:59 lr: 7.754187668652309e-06 sample/s: 18.62709103808307 loss: 0.0000 (0.2244) time: 0.1774 data: 0.0024 max mem: 3172
+2021-05-29 03:39:54,542 INFO torchdistill.misc.log Epoch: [2] [4000/6547] eta: 0:07:30 lr: 6.4813400539687385e-06 sample/s: 19.757705370559666 loss: 0.0000 (0.2285) time: 0.1680 data: 0.0024 max mem: 3172
+2021-05-29 03:41:23,407 INFO torchdistill.misc.log Epoch: [2] [4500/6547] eta: 0:06:02 lr: 5.208492439285169e-06 sample/s: 25.014672839368117 loss: 0.0000 (0.2261) time: 0.1812 data: 0.0025 max mem: 3172
+2021-05-29 03:42:51,296 INFO torchdistill.misc.log Epoch: [2] [5000/6547] eta: 0:04:33 lr: 3.935644824601599e-06 sample/s: 21.941957945013066 loss: 0.0000 (0.2248) time: 0.1673 data: 0.0024 max mem: 3172
+2021-05-29 03:44:19,640 INFO torchdistill.misc.log Epoch: [2] [5500/6547] eta: 0:03:05 lr: 2.662797209918029e-06 sample/s: 27.461809801138923 loss: 0.0000 (0.2209) time: 0.1728 data: 0.0024 max mem: 3172
+2021-05-29 03:45:47,660 INFO torchdistill.misc.log Epoch: [2] [6000/6547] eta: 0:01:36 lr: 1.3899495952344585e-06 sample/s: 17.73160173160173 loss: 0.0000 (0.2196) time: 0.1829 data: 0.0026 max mem: 3172
+2021-05-29 03:47:15,622 INFO torchdistill.misc.log Epoch: [2] [6500/6547] eta: 0:00:08 lr: 1.1710198055088846e-07 sample/s: 17.73863212240207 loss: 0.0000 (0.2180) time: 0.1774 data: 0.0025 max mem: 3172
+2021-05-29 03:47:23,808 INFO torchdistill.misc.log Epoch: [2] Total time: 0:19:16
+2021-05-29 03:47:43,628 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+2021-05-29 03:47:43,629 INFO __main__ Validation: accuracy = 0.9115870400878638
+2021-05-29 03:47:47,015 INFO __main__ [Student: bert-base-uncased]
+2021-05-29 03:48:06,856 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+2021-05-29 03:48:06,857 INFO __main__ Test: accuracy = 0.9157971810360608
+2021-05-29 03:48:06,857 INFO __main__ Start prediction for private dataset(s)
+2021-05-29 03:48:06,858 INFO __main__ qnli/test: 5463 samples
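The log shows three epochs of fine-tuning with a linearly decaying learning rate, with validation accuracy peaking at 0.9158 after epoch 1 (the checkpoint that is kept and later reported as the test accuracy). A small sketch for pulling the validation/test accuracies out of this log format, assuming a local copy saved as `training.log`:

```python
import re

# Matches lines such as:
# 2021-05-29 03:08:22,660 INFO __main__ Validation: accuracy = 0.9104887424492037
pattern = re.compile(r"(Validation|Test): accuracy = ([0-9.]+)")

with open("training.log") as f:  # assumed local path
    for line in f:
        m = pattern.search(line)
        if m:
            print(m.group(1), float(m.group(2)))
# Expected output:
# Validation 0.9104887424492037
# Validation 0.9157971810360608
# Validation 0.9115870400878638
# Test 0.9157971810360608
```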
vocab.txt
ADDED
The diff for this file is too large to render.