pranaydeeps committed on
Commit 6e6d07a · verified · 1 parent: 08219fe

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,108 @@
+ ---
+ license: mit
+ tags:
+ - generated_from_trainer
+ metrics:
+ - precision
+ - recall
+ - f1
+ - accuracy
+ model-index:
+ - name: pos_final_mono_en
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # pos_final_mono_en
+
+ This model is a fine-tuned version of [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) for English part-of-speech tagging; the training dataset was not recorded by the Trainer (it appears as `None` in the auto-generated card).
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0681
+ - Precision: 0.9696
+ - Recall: 0.9714
+ - F1: 0.9705
+ - Accuracy: 0.9796
+
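The rounded metrics above are internally consistent: F1 is the harmonic mean of precision and recall. A quick check using the full-precision values from `eval_results.json`:

```python
# F1 = 2PR / (P + R), the harmonic mean of precision and recall
precision = 0.969581195926805  # eval_precision from eval_results.json
recall = 0.971395916113097     # eval_recall from eval_results.json

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9705, matching the reported F1
```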
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 256
+ - eval_batch_size: 256
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 1024
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 40.0
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
+ |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
+ | No log | 0.99 | 60 | 2.7933 | 0.3216 | 0.0997 | 0.1522 | 0.2833 |
+ | No log | 1.99 | 120 | 0.3818 | 0.9075 | 0.8989 | 0.9032 | 0.9224 |
+ | No log | 2.99 | 180 | 0.1156 | 0.9602 | 0.9607 | 0.9605 | 0.9721 |
+ | No log | 3.99 | 240 | 0.0911 | 0.9634 | 0.9650 | 0.9642 | 0.9748 |
+ | No log | 4.99 | 300 | 0.0794 | 0.9664 | 0.9679 | 0.9671 | 0.9772 |
+ | No log | 5.99 | 360 | 0.0741 | 0.9670 | 0.9697 | 0.9683 | 0.9781 |
+ | No log | 6.99 | 420 | 0.0695 | 0.9683 | 0.9702 | 0.9693 | 0.9787 |
+ | No log | 7.99 | 480 | 0.0688 | 0.9686 | 0.9700 | 0.9693 | 0.9789 |
+ | 0.7281 | 8.99 | 540 | 0.0675 | 0.9688 | 0.9703 | 0.9695 | 0.9789 |
+ | 0.7281 | 9.99 | 600 | 0.0670 | 0.9687 | 0.9705 | 0.9696 | 0.9791 |
+ | 0.7281 | 10.99 | 660 | 0.0658 | 0.9696 | 0.9702 | 0.9699 | 0.9792 |
+ | 0.7281 | 11.99 | 720 | 0.0670 | 0.9684 | 0.9715 | 0.9700 | 0.9793 |
+ | 0.7281 | 12.99 | 780 | 0.0672 | 0.9689 | 0.9711 | 0.9700 | 0.9792 |
+ | 0.7281 | 13.99 | 840 | 0.0678 | 0.9698 | 0.9708 | 0.9703 | 0.9796 |
+ | 0.7281 | 14.99 | 900 | 0.0681 | 0.9696 | 0.9714 | 0.9705 | 0.9796 |
+ | 0.7281 | 15.99 | 960 | 0.0706 | 0.9696 | 0.9711 | 0.9703 | 0.9795 |
+ | 0.0484 | 16.99 | 1020 | 0.0725 | 0.9694 | 0.9705 | 0.9699 | 0.9793 |
+ | 0.0484 | 17.99 | 1080 | 0.0735 | 0.9689 | 0.9705 | 0.9697 | 0.9791 |
+ | 0.0484 | 18.99 | 1140 | 0.0745 | 0.9690 | 0.9705 | 0.9698 | 0.9792 |
+ | 0.0484 | 19.99 | 1200 | 0.0769 | 0.9690 | 0.9706 | 0.9698 | 0.9791 |
+ | 0.0484 | 20.99 | 1260 | 0.0797 | 0.9691 | 0.9703 | 0.9697 | 0.9791 |
+ | 0.0484 | 21.99 | 1320 | 0.0808 | 0.9689 | 0.9705 | 0.9697 | 0.9791 |
+ | 0.0484 | 22.99 | 1380 | 0.0838 | 0.9691 | 0.9702 | 0.9697 | 0.9791 |
+ | 0.0484 | 23.99 | 1440 | 0.0861 | 0.9685 | 0.9704 | 0.9695 | 0.9789 |
+ | 0.0289 | 24.99 | 1500 | 0.0879 | 0.9684 | 0.9698 | 0.9691 | 0.9787 |
+ | 0.0289 | 25.99 | 1560 | 0.0887 | 0.9684 | 0.9703 | 0.9694 | 0.9789 |
+ | 0.0289 | 26.99 | 1620 | 0.0910 | 0.9684 | 0.9698 | 0.9691 | 0.9787 |
+ | 0.0289 | 27.99 | 1680 | 0.0924 | 0.9684 | 0.9697 | 0.9691 | 0.9787 |
+ | 0.0289 | 28.99 | 1740 | 0.0950 | 0.9693 | 0.9692 | 0.9693 | 0.9788 |
+ | 0.0289 | 29.99 | 1800 | 0.0962 | 0.9692 | 0.9697 | 0.9694 | 0.9789 |
+ | 0.0289 | 30.99 | 1860 | 0.0977 | 0.9687 | 0.9699 | 0.9693 | 0.9787 |
+ | 0.0289 | 31.99 | 1920 | 0.0979 | 0.9688 | 0.9699 | 0.9694 | 0.9788 |
+ | 0.0289 | 32.99 | 1980 | 0.1000 | 0.9687 | 0.9698 | 0.9692 | 0.9788 |
+ | 0.018 | 33.99 | 2040 | 0.1021 | 0.9688 | 0.9698 | 0.9693 | 0.9788 |
+ | 0.018 | 34.99 | 2100 | 0.1037 | 0.9687 | 0.9701 | 0.9694 | 0.9788 |
+ | 0.018 | 35.99 | 2160 | 0.1035 | 0.9688 | 0.9703 | 0.9696 | 0.9790 |
+ | 0.018 | 36.99 | 2220 | 0.1042 | 0.9688 | 0.9700 | 0.9694 | 0.9789 |
+ | 0.018 | 37.99 | 2280 | 0.1053 | 0.9685 | 0.9699 | 0.9692 | 0.9787 |
+ | 0.018 | 38.99 | 2340 | 0.1052 | 0.9689 | 0.9700 | 0.9695 | 0.9789 |
+ | 0.018 | 39.99 | 2400 | 0.1054 | 0.9688 | 0.9700 | 0.9694 | 0.9788 |
+
+
+ ### Framework versions
+
+ - Transformers 4.25.1
+ - Pytorch 1.12.0
+ - Datasets 2.18.0
+ - Tokenizers 0.13.2
all_results.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "epoch": 39.99,
+   "eval_accuracy": 0.9796479120738681,
+   "eval_f1": 0.9704877076819325,
+   "eval_loss": 0.06808918714523315,
+   "eval_precision": 0.969581195926805,
+   "eval_recall": 0.971395916113097,
+   "eval_runtime": 8.3139,
+   "eval_samples": 2072,
+   "eval_samples_per_second": 831.143,
+   "eval_steps_per_second": 3.248,
+   "train_loss": 0.17379826227823894,
+   "train_runtime": 1699.1945,
+   "train_samples": 62189,
+   "train_samples_per_second": 1463.964,
+   "train_steps_per_second": 1.412
+ }
config.json ADDED
@@ -0,0 +1,144 @@
+ {
+   "_name_or_path": "FacebookAI/roberta-base",
+   "architectures": [
+     "RobertaForTokenClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "finetuning_task": "pos",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "WP",
+     "1": "VB",
+     "2": "RB",
+     "3": "terrorist\t#",
+     "4": "''",
+     "5": "`",
+     "6": "VBD",
+     "7": "RBS",
+     "8": "PRP$",
+     "9": "NN",
+     "10": "PRP",
+     "11": "NNP",
+     "12": "RP",
+     "13": "\tSYM",
+     "14": "WDT",
+     "15": "U",
+     "16": "JJ",
+     "17": "JJR",
+     "18": "FW",
+     "19": "POS",
+     "20": "CD",
+     "21": "VBN",
+     "22": "RBR",
+     "23": "hero\t#",
+     "24": ",",
+     "25": "it",
+     "26": ":",
+     "27": "Ready\t#",
+     "28": "WRB",
+     "29": "VBP",
+     "30": "NNPS",
+     "31": "$",
+     "32": "TO",
+     "33": "VBG",
+     "34": ")",
+     "35": "JJS",
+     "36": "#",
+     "37": "sleepy\t#",
+     "38": "IN",
+     "39": "\tPRP",
+     "40": "``",
+     "41": "PDT",
+     "42": "@",
+     "43": "DT",
+     "44": "VBZ",
+     "45": "NNS",
+     "46": "LS",
+     "47": ".",
+     "48": "\tDT",
+     "49": "EX",
+     "50": "SYM",
+     "51": "CC",
+     "52": "UH",
+     "53": "MD",
+     "54": "(",
+     "55": "WP$"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "\tDT": 48,
+     "\tPRP": 39,
+     "\tSYM": 13,
+     "#": 36,
+     "$": 31,
+     "''": 4,
+     "(": 54,
+     ")": 34,
+     ",": 24,
+     ".": 47,
+     ":": 26,
+     "@": 42,
+     "CC": 51,
+     "CD": 20,
+     "DT": 43,
+     "EX": 49,
+     "FW": 18,
+     "IN": 38,
+     "JJ": 16,
+     "JJR": 17,
+     "JJS": 35,
+     "LS": 46,
+     "MD": 53,
+     "NN": 9,
+     "NNP": 11,
+     "NNPS": 30,
+     "NNS": 45,
+     "PDT": 41,
+     "POS": 19,
+     "PRP": 10,
+     "PRP$": 8,
+     "RB": 2,
+     "RBR": 22,
+     "RBS": 7,
+     "RP": 12,
+     "Ready\t#": 27,
+     "SYM": 50,
+     "TO": 32,
+     "U": 15,
+     "UH": 52,
+     "VB": 1,
+     "VBD": 6,
+     "VBG": 33,
+     "VBN": 21,
+     "VBP": 29,
+     "VBZ": 44,
+     "WDT": 14,
+     "WP": 0,
+     "WP$": 55,
+     "WRB": 28,
+     "`": 5,
+     "``": 40,
+     "hero\t#": 23,
+     "it": 25,
+     "sleepy\t#": 37,
+     "terrorist\t#": 3
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.25.1",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
eval_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "epoch": 39.99,
+   "eval_accuracy": 0.9796479120738681,
+   "eval_f1": 0.9704877076819325,
+   "eval_loss": 0.06808918714523315,
+   "eval_precision": 0.969581195926805,
+   "eval_recall": 0.971395916113097,
+   "eval_runtime": 8.3139,
+   "eval_samples": 2072,
+   "eval_samples_per_second": 831.143,
+   "eval_steps_per_second": 3.248
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71a49303af36dfd77a50f18f363c750a6b3d15df1e1140d336cbd255fcba745f
+ size 496463473
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "add_prefix_space": true,
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 512,
+   "name_or_path": "FacebookAI/roberta-base",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "special_tokens_map_file": null,
+   "token": null,
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "unk_token": "<unk>"
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 39.99,
+   "train_loss": 0.17379826227823894,
+   "train_runtime": 1699.1945,
+   "train_samples": 62189,
+   "train_samples_per_second": 1463.964,
+   "train_steps_per_second": 1.412
+ }
trainer_state.json ADDED
@@ -0,0 +1,529 @@
+ {
+   "best_metric": 0.9704877076819325,
+   "best_model_checkpoint": "models/pos_final_mono_en/checkpoint-900",
+   "epoch": 39.98765432098765,
+   "global_step": 2400,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.99,
+       "eval_accuracy": 0.2832997859356529,
+       "eval_f1": 0.15215239835327324,
+       "eval_loss": 2.7932815551757812,
+       "eval_precision": 0.3215832858875659,
+       "eval_recall": 0.09965021554699845,
+       "eval_runtime": 8.3095,
+       "eval_samples_per_second": 831.582,
+       "eval_steps_per_second": 3.249,
+       "step": 60
+     },
+     {
+       "epoch": 1.99,
+       "eval_accuracy": 0.92244480654334,
+       "eval_f1": 0.9032193422583952,
+       "eval_loss": 0.3818030059337616,
+       "eval_precision": 0.9075490985688619,
+       "eval_recall": 0.8989307027906049,
+       "eval_runtime": 8.786,
+       "eval_samples_per_second": 786.48,
+       "eval_steps_per_second": 3.073,
+       "step": 120
+     },
+     {
+       "epoch": 2.99,
+       "eval_accuracy": 0.9720821751493658,
+       "eval_f1": 0.9604647150169666,
+       "eval_loss": 0.11557099223136902,
+       "eval_precision": 0.9601959536641649,
+       "eval_recall": 0.9607336268659006,
+       "eval_runtime": 8.8137,
+       "eval_samples_per_second": 784.005,
+       "eval_steps_per_second": 3.063,
+       "step": 180
+     },
+     {
+       "epoch": 3.99,
+       "eval_accuracy": 0.974849036710438,
+       "eval_f1": 0.9641873067091794,
+       "eval_loss": 0.09111332893371582,
+       "eval_precision": 0.9634156614972238,
+       "eval_recall": 0.964960189006336,
+       "eval_runtime": 8.5212,
+       "eval_samples_per_second": 810.917,
+       "eval_steps_per_second": 3.169,
+       "step": 240
+     },
+     {
+       "epoch": 4.99,
+       "eval_accuracy": 0.9772261094603661,
+       "eval_f1": 0.9671334513708446,
+       "eval_loss": 0.07944779098033905,
+       "eval_precision": 0.9663853317811408,
+       "eval_recall": 0.9678827301597042,
+       "eval_runtime": 8.3156,
+       "eval_samples_per_second": 830.964,
+       "eval_steps_per_second": 3.247,
+       "step": 300
+     },
+     {
+       "epoch": 5.99,
+       "eval_accuracy": 0.9781015367903128,
+       "eval_f1": 0.9683213898602403,
+       "eval_loss": 0.07408788055181503,
+       "eval_precision": 0.9669917766303309,
+       "eval_recall": 0.9696546645597779,
+       "eval_runtime": 8.4486,
+       "eval_samples_per_second": 817.888,
+       "eval_steps_per_second": 3.196,
+       "step": 360
+     },
+     {
+       "epoch": 6.99,
+       "eval_accuracy": 0.9787405348413687,
+       "eval_f1": 0.9692896581055761,
+       "eval_loss": 0.06950810551643372,
+       "eval_precision": 0.9683435282228738,
+       "eval_recall": 0.9702376386481137,
+       "eval_runtime": 8.5813,
+       "eval_samples_per_second": 805.242,
+       "eval_steps_per_second": 3.146,
+       "step": 420
+     },
+     {
+       "epoch": 7.99,
+       "eval_accuracy": 0.9788555544905588,
+       "eval_f1": 0.9692969848880303,
+       "eval_loss": 0.06879482418298721,
+       "eval_precision": 0.9686027896716276,
+       "eval_recall": 0.9699921758740776,
+       "eval_runtime": 8.4632,
+       "eval_samples_per_second": 816.474,
+       "eval_steps_per_second": 3.19,
+       "step": 480
+     },
+     {
+       "epoch": 8.33,
+       "learning_rate": 5e-05,
+       "loss": 0.7281,
+       "step": 500
+     },
+     {
+       "epoch": 8.99,
+       "eval_accuracy": 0.9788938943736222,
+       "eval_f1": 0.9695171959192747,
+       "eval_loss": 0.06750848144292831,
+       "eval_precision": 0.9687672323999755,
+       "eval_recall": 0.9702683214948683,
+       "eval_runtime": 8.3585,
+       "eval_samples_per_second": 826.702,
+       "eval_steps_per_second": 3.23,
+       "step": 540
+     },
+     {
+       "epoch": 9.99,
+       "eval_accuracy": 0.9791367136330235,
+       "eval_f1": 0.9696259085980099,
+       "eval_loss": 0.06700527667999268,
+       "eval_precision": 0.9687090871505899,
+       "eval_recall": 0.970544467115659,
+       "eval_runtime": 8.4143,
+       "eval_samples_per_second": 821.219,
+       "eval_steps_per_second": 3.209,
+       "step": 600
+     },
+     {
+       "epoch": 10.99,
+       "eval_accuracy": 0.9792261733601713,
+       "eval_f1": 0.9698826362182866,
+       "eval_loss": 0.06581725925207138,
+       "eval_precision": 0.9695815158646807,
+       "eval_recall": 0.9701839436662933,
+       "eval_runtime": 8.2031,
+       "eval_samples_per_second": 842.369,
+       "eval_steps_per_second": 3.291,
+       "step": 660
+     },
+     {
+       "epoch": 11.99,
+       "eval_accuracy": 0.9792900731652768,
+       "eval_f1": 0.9699631623689432,
+       "eval_loss": 0.06702851504087448,
+       "eval_precision": 0.968427894173421,
+       "eval_recall": 0.9715033060767378,
+       "eval_runtime": 8.414,
+       "eval_samples_per_second": 821.254,
+       "eval_steps_per_second": 3.209,
+       "step": 720
+     },
+     {
+       "epoch": 12.99,
+       "eval_accuracy": 0.9792261733601713,
+       "eval_f1": 0.9699927596470999,
+       "eval_loss": 0.0671982690691948,
+       "eval_precision": 0.9688607265575376,
+       "eval_recall": 0.9711274412039949,
+       "eval_runtime": 9.0147,
+       "eval_samples_per_second": 766.523,
+       "eval_steps_per_second": 2.995,
+       "step": 780
+     },
+     {
+       "epoch": 13.99,
+       "eval_accuracy": 0.9795584523467203,
+       "eval_f1": 0.9702842773467448,
+       "eval_loss": 0.06784472614526749,
+       "eval_precision": 0.969763842275451,
+       "eval_recall": 0.9708052713130725,
+       "eval_runtime": 8.2137,
+       "eval_samples_per_second": 841.282,
+       "eval_steps_per_second": 3.287,
+       "step": 840
+     },
+     {
+       "epoch": 14.99,
+       "eval_accuracy": 0.9796479120738681,
+       "eval_f1": 0.9704877076819325,
+       "eval_loss": 0.06808918714523315,
+       "eval_precision": 0.969581195926805,
+       "eval_recall": 0.971395916113097,
+       "eval_runtime": 8.1766,
+       "eval_samples_per_second": 845.097,
+       "eval_steps_per_second": 3.302,
+       "step": 900
+     },
+     {
+       "epoch": 15.99,
+       "eval_accuracy": 0.9794817725805937,
+       "eval_f1": 0.9703219971333746,
+       "eval_loss": 0.07063417881727219,
+       "eval_precision": 0.9695714110654985,
+       "eval_recall": 0.9710737462221745,
+       "eval_runtime": 8.3057,
+       "eval_samples_per_second": 831.963,
+       "eval_steps_per_second": 3.251,
+       "step": 960
+     },
+     {
+       "epoch": 16.66,
+       "learning_rate": 3.6842105263157895e-05,
+       "loss": 0.0484,
+       "step": 1000
+     },
+     {
+       "epoch": 16.99,
+       "eval_accuracy": 0.9793284130483402,
+       "eval_f1": 0.9699285875827489,
+       "eval_loss": 0.07248909771442413,
+       "eval_precision": 0.9693823603778934,
+       "eval_recall": 0.9704754307104613,
+       "eval_runtime": 8.1639,
+       "eval_samples_per_second": 846.409,
+       "eval_steps_per_second": 3.307,
+       "step": 1020
+     },
+     {
+       "epoch": 17.99,
+       "eval_accuracy": 0.9790855937889389,
+       "eval_f1": 0.9696844283497156,
+       "eval_loss": 0.0734858438372612,
+       "eval_precision": 0.9688564886782195,
+       "eval_recall": 0.9705137842689044,
+       "eval_runtime": 8.302,
+       "eval_samples_per_second": 832.333,
+       "eval_steps_per_second": 3.252,
+       "step": 1080
+     },
+     {
+       "epoch": 18.99,
+       "eval_accuracy": 0.9791878334771079,
+       "eval_f1": 0.9697733866300795,
+       "eval_loss": 0.0745043233036995,
+       "eval_precision": 0.969041765278065,
+       "eval_recall": 0.9705061135572158,
+       "eval_runtime": 8.2203,
+       "eval_samples_per_second": 840.607,
+       "eval_steps_per_second": 3.285,
+       "step": 1140
+     },
+     {
+       "epoch": 19.99,
+       "eval_accuracy": 0.9791111537109812,
+       "eval_f1": 0.9697643226671777,
+       "eval_loss": 0.07685930281877518,
+       "eval_precision": 0.9689548489860933,
+       "eval_recall": 0.9705751499624136,
+       "eval_runtime": 8.1933,
+       "eval_samples_per_second": 843.373,
+       "eval_steps_per_second": 3.295,
+       "step": 1200
+     },
+     {
+       "epoch": 20.99,
+       "eval_accuracy": 0.9790600338668967,
+       "eval_f1": 0.9696818704484477,
+       "eval_loss": 0.0796540305018425,
+       "eval_precision": 0.969057869980235,
+       "eval_recall": 0.9703066750533115,
+       "eval_runtime": 8.1909,
+       "eval_samples_per_second": 843.621,
+       "eval_steps_per_second": 3.296,
+       "step": 1260
+     },
+     {
+       "epoch": 21.99,
+       "eval_accuracy": 0.9790983737499601,
+       "eval_f1": 0.9697178726633098,
+       "eval_loss": 0.08079346269369125,
+       "eval_precision": 0.9689232654311534,
+       "eval_recall": 0.9705137842689044,
+       "eval_runtime": 8.2659,
+       "eval_samples_per_second": 835.967,
+       "eval_steps_per_second": 3.266,
+       "step": 1320
+     },
+     {
+       "epoch": 22.99,
+       "eval_accuracy": 0.9791047637304706,
+       "eval_f1": 0.9696795461514873,
+       "eval_loss": 0.08375120162963867,
+       "eval_precision": 0.9691297485327245,
+       "eval_recall": 0.9702299679364251,
+       "eval_runtime": 8.1982,
+       "eval_samples_per_second": 842.873,
+       "eval_steps_per_second": 3.293,
+       "step": 1380
+     },
+     {
+       "epoch": 23.99,
+       "eval_accuracy": 0.9789322342566855,
+       "eval_f1": 0.9694696267232725,
+       "eval_loss": 0.08609236031770706,
+       "eval_precision": 0.9685270249578931,
+       "eval_recall": 0.9704140650169523,
+       "eval_runtime": 8.2431,
+       "eval_samples_per_second": 838.274,
+       "eval_steps_per_second": 3.275,
+       "step": 1440
+     },
+     {
+       "epoch": 24.99,
+       "learning_rate": 2.368421052631579e-05,
+       "loss": 0.0289,
+       "step": 1500
+     },
+     {
+       "epoch": 24.99,
+       "eval_accuracy": 0.978663855075242,
+       "eval_f1": 0.9691399662731872,
+       "eval_loss": 0.08786529302597046,
+       "eval_precision": 0.9684421771833878,
+       "eval_recall": 0.969838761640305,
+       "eval_runtime": 8.3812,
+       "eval_samples_per_second": 824.467,
+       "eval_steps_per_second": 3.222,
+       "step": 1500
+     },
+     {
+       "epoch": 25.99,
+       "eval_accuracy": 0.9788747244320904,
+       "eval_f1": 0.9693623412750109,
+       "eval_loss": 0.08869566768407822,
+       "eval_precision": 0.968419843821773,
+       "eval_recall": 0.9703066750533115,
+       "eval_runtime": 8.2152,
+       "eval_samples_per_second": 841.126,
+       "eval_steps_per_second": 3.287,
+       "step": 1560
+     },
+     {
+       "epoch": 26.99,
+       "eval_accuracy": 0.9787341448608582,
+       "eval_f1": 0.9691018771893516,
+       "eval_loss": 0.09096662700176239,
+       "eval_precision": 0.9683967033303716,
+       "eval_recall": 0.9698080787935505,
+       "eval_runtime": 8.2409,
+       "eval_samples_per_second": 838.5,
+       "eval_steps_per_second": 3.276,
+       "step": 1620
+     },
+     {
+       "epoch": 27.99,
+       "eval_accuracy": 0.9786702450557526,
+       "eval_f1": 0.9690532771176695,
+       "eval_loss": 0.09239726513624191,
+       "eval_precision": 0.9684296811558675,
+       "eval_recall": 0.9696776766948437,
+       "eval_runtime": 8.2167,
+       "eval_samples_per_second": 840.968,
+       "eval_steps_per_second": 3.286,
+       "step": 1680
+     },
+     {
+       "epoch": 28.99,
+       "eval_accuracy": 0.9788299945685166,
+       "eval_f1": 0.9692778570442306,
+       "eval_loss": 0.09497389197349548,
+       "eval_precision": 0.9693075990733212,
+       "eval_recall": 0.9692481168402804,
+       "eval_runtime": 8.3979,
+       "eval_samples_per_second": 822.821,
+       "eval_steps_per_second": 3.215,
+       "step": 1740
+     },
+     {
+       "epoch": 29.99,
+       "eval_accuracy": 0.9788811144126011,
+       "eval_f1": 0.9694443698883832,
+       "eval_loss": 0.09615545719861984,
+       "eval_precision": 0.9691805239310932,
+       "eval_recall": 0.9697083595415983,
+       "eval_runtime": 8.3107,
+       "eval_samples_per_second": 831.454,
+       "eval_steps_per_second": 3.249,
+       "step": 1800
+     },
+     {
+       "epoch": 30.99,
+       "eval_accuracy": 0.9787213648998371,
+       "eval_f1": 0.9692713982912798,
+       "eval_loss": 0.09773550182580948,
+       "eval_precision": 0.9686587860355012,
+       "eval_recall": 0.9698847859104368,
+       "eval_runtime": 8.3495,
+       "eval_samples_per_second": 827.598,
+       "eval_steps_per_second": 3.234,
+       "step": 1860
+     },
+     {
+       "epoch": 31.99,
+       "eval_accuracy": 0.9788491645100482,
+       "eval_f1": 0.9693524335969396,
+       "eval_loss": 0.09792140126228333,
+       "eval_precision": 0.9688436281158288,
+       "eval_recall": 0.9698617737753709,
+       "eval_runtime": 8.217,
+       "eval_samples_per_second": 840.944,
+       "eval_steps_per_second": 3.286,
+       "step": 1920
+     },
+     {
+       "epoch": 32.99,
+       "eval_accuracy": 0.9787916546854533,
+       "eval_f1": 0.9692388483638134,
+       "eval_loss": 0.09997569024562836,
+       "eval_precision": 0.9686855916944412,
+       "eval_recall": 0.9697927373701732,
+       "eval_runtime": 9.2731,
+       "eval_samples_per_second": 745.164,
+       "eval_steps_per_second": 2.912,
+       "step": 1980
+     },
+     {
+       "epoch": 33.33,
+       "learning_rate": 1.0526315789473684e-05,
+       "loss": 0.018,
+       "step": 2000
+     },
+     {
+       "epoch": 33.99,
+       "eval_accuracy": 0.9788427745295377,
+       "eval_f1": 0.9692880908937579,
+       "eval_loss": 0.10211524367332458,
+       "eval_precision": 0.9687533522335453,
+       "eval_recall": 0.9698234202169277,
+       "eval_runtime": 8.1926,
+       "eval_samples_per_second": 843.447,
+       "eval_steps_per_second": 3.296,
+       "step": 2040
+     },
+     {
+       "epoch": 34.99,
+       "eval_accuracy": 0.9788427745295377,
+       "eval_f1": 0.9694191594963878,
+       "eval_loss": 0.10369361937046051,
+       "eval_precision": 0.968739706929965,
+       "eval_recall": 0.9700995658377184,
+       "eval_runtime": 8.1647,
+       "eval_samples_per_second": 846.325,
+       "eval_steps_per_second": 3.307,
+       "step": 2100
+     },
+     {
+       "epoch": 35.99,
+       "eval_accuracy": 0.9789514041982172,
+       "eval_f1": 0.9695659672319632,
+       "eval_loss": 0.10349933803081512,
+       "eval_precision": 0.9688493324856962,
+       "eval_recall": 0.9702836629182455,
+       "eval_runtime": 8.3265,
+       "eval_samples_per_second": 829.884,
+       "eval_steps_per_second": 3.243,
+       "step": 2160
+     },
+     {
+       "epoch": 36.99,
+       "eval_accuracy": 0.9788875043931116,
+       "eval_f1": 0.9694168151938519,
+       "eval_loss": 0.10418598353862762,
+       "eval_precision": 0.9688115284726,
+       "eval_recall": 0.9700228587208322,
+       "eval_runtime": 8.3843,
+       "eval_samples_per_second": 824.159,
+       "eval_steps_per_second": 3.22,
+       "step": 2220
+     },
+     {
+       "epoch": 37.99,
+       "eval_accuracy": 0.9787405348413687,
+       "eval_f1": 0.9692040580887735,
+       "eval_loss": 0.10528801381587982,
+       "eval_precision": 0.9685395840514766,
+       "eval_recall": 0.9698694444870595,
+       "eval_runtime": 9.1629,
+       "eval_samples_per_second": 754.13,
+       "eval_steps_per_second": 2.947,
+       "step": 2280
+     },
+     {
+       "epoch": 38.99,
+       "eval_accuracy": 0.97886833445158,
+       "eval_f1": 0.969450960550726,
+       "eval_loss": 0.10520931333303452,
+       "eval_precision": 0.9688567794922085,
+       "eval_recall": 0.970045870855898,
+       "eval_runtime": 8.2422,
+       "eval_samples_per_second": 838.371,
+       "eval_steps_per_second": 3.276,
+       "step": 2340
+     },
+     {
+       "epoch": 39.99,
+       "eval_accuracy": 0.9788491645100482,
+       "eval_f1": 0.9694007796419167,
+       "eval_loss": 0.1054077297449112,
+       "eval_precision": 0.9688177562575179,
+       "eval_recall": 0.969984505162389,
+       "eval_runtime": 8.347,
+       "eval_samples_per_second": 827.841,
+       "eval_steps_per_second": 3.235,
+       "step": 2400
+     },
+     {
+       "epoch": 39.99,
+       "step": 2400,
+       "total_flos": 1.1777248744118362e+17,
+       "train_loss": 0.17379826227823894,
+       "train_runtime": 1699.1945,
+       "train_samples_per_second": 1463.964,
+       "train_steps_per_second": 1.412
+     }
+   ],
+   "max_steps": 2400,
+   "num_train_epochs": 40,
+   "total_flos": 1.1777248744118362e+17,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8fe0ed38e74a4d194378331bfb9493c837d3b3f57d63a2deb0d0dcb374d8006
+ size 3439
vocab.json ADDED
The diff for this file is too large to render. See raw diff