wissamantoun committed on
Commit
c84ff4e
1 Parent(s): 1ab3957

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,161 @@
+ ---
+ language: fr
+ license: mit
+ tags:
+ - roberta
+ - token-classification
+ base_model: almanach/camembertv2-base
+ datasets:
+ - Sequoia
+ metrics:
+ - las
+ - upos
+ model-index:
+ - name: almanach/camembertv2-base-sequoia
+ results:
+ - task:
+ type: token-classification
+ name: Part-of-Speech Tagging
+ dataset:
+ type: Sequoia
+ name: Sequoia
+ metrics:
+ - name: upos
+ type: upos
+ value: 0.99383
+ verified: false
+ - task:
+ type: token-classification
+ name: Dependency Parsing
+ dataset:
+ type: Sequoia
+ name: Sequoia
+ metrics:
+ - name: las
+ type: las
+ value: 0.94942
+ verified: false
+ ---
+
+ # Model Card for almanach/camembertv2-base-sequoia
+
+ almanach/camembertv2-base-sequoia is a RoBERTa model for token classification, fine-tuned on the Sequoia dataset for the tasks of Part-of-Speech Tagging and Dependency Parsing.
+ The model achieves a UPOS accuracy of 99.38 and a labeled attachment score (LAS) of 94.94 on the Sequoia test set.
+
+ The model is part of the almanach/camembertv2-base family of fine-tuned models.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** Wissam Antoun (PhD student at Almanach, Inria Paris)
+ - **Model type:** roberta
+ - **Language(s) (NLP):** French
+ - **License:** MIT
+ - **Finetuned from model:** almanach/camembertv2-base
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/WissamAntoun/camemberta
+ - **Paper:** https://arxiv.org/abs/2411.08868
+
+ ## Uses
+
+ The model can be used for token classification in French, specifically Part-of-Speech Tagging and Dependency Parsing.
+
+ ## Bias, Risks, and Limitations
+
+ The model may reflect biases present in its training data and may not generalize well to domains, datasets, or tasks that differ from Sequoia.
+
+
+ ## How to Get Started with the Model
+
+ You can use the model directly with the [hopsparser](https://github.com/hopsparser/hopsparser) library, for example in server mode: https://github.com/hopsparser/hopsparser/blob/main/docs/server.md. A sketch of querying such a server is shown below.
+
+
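+ Below is a minimal, untested sketch of querying a running hopsparser server from Python. It assumes the server has been started on this model (see the server documentation linked above) and that it exposes a UDPipe-style `/process` endpoint returning CoNLL-U in a `result` field; the endpoint path, parameter and field names, and port are assumptions to check against the actual API.
+
+ ```python
+ import requests
+
+ # Hypothetical address of a locally running hopsparser server.
+ SERVER_URL = "http://localhost:8000/process"
+
+ def parse_text(text: str) -> str:
+     """Send raw French text to the parser server and return its CoNLL-U output."""
+     # `data` carries the input text; parameter and response field names follow
+     # the UDPipe-style API assumed above and may need adjusting.
+     response = requests.post(SERVER_URL, data={"data": text})
+     response.raise_for_status()
+     return response.json()["result"]
+
+ if __name__ == "__main__":
+     print(parse_text("Le chat dort sur le canapé."))
+ ```
+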
+ ## Training Details
+
+ ### Training Procedure
+
+ The model was trained with the [hopsparser](https://github.com/hopsparser/hopsparser) library on the Sequoia dataset; a sketch of a matching training invocation is shown below, followed by the full hyperparameter configuration.
+
+
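+ As an illustration only, a training run with this configuration would look roughly like the sketch below. The `hopsparser train` arguments and the treebank file names are assumptions (placeholders), so check `hopsparser train --help` and the Sequoia release for the exact invocation and paths.
+
+ ```python
+ import subprocess
+
+ # Placeholder paths: the config shipped in this repository and local Sequoia splits.
+ CONFIG = "camembertv2_base_p2_17k_last_layer.yaml"
+ TRAIN, DEV, TEST = (
+     "fr_sequoia-ud-train.conllu",
+     "fr_sequoia-ud-dev.conllu",
+     "fr_sequoia-ud-test.conllu",
+ )
+ OUTPUT_DIR = "camembertv2-base-sequoia"
+
+ # Assumed CLI shape: `hopsparser train CONFIG TRAIN_FILE OUTPUT_DIR ...`.
+ subprocess.run(
+     ["hopsparser", "train", CONFIG, TRAIN, OUTPUT_DIR,
+      "--dev-file", DEV, "--test-file", TEST, "--device", "cuda:0"],
+     check=True,
+ )
+ ```
+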
+ #### Training Hyperparameters
+
+ ```yml
+ # Layer dimensions
+ mlp_input: 1024
+ mlp_tag_hidden: 16
+ mlp_arc_hidden: 512
+ mlp_lab_hidden: 128
+ # Lexers
+ lexers:
+ - name: word_embeddings
+ type: words
+ embedding_size: 256
+ word_dropout: 0.5
+ - name: char_level_embeddings
+ type: chars_rnn
+ embedding_size: 64
+ lstm_output_size: 128
+ - name: fasttext
+ type: fasttext
+ - name: camembertv2_base_p2_17k_last_layer
+ type: bert
+ model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
+ layers: [11]
+ subwords_reduction: "mean"
+ # Training hyperparameters
+ encoder_dropout: 0.5
+ mlp_dropout: 0.5
+ batch_size: 8
+ epochs: 64
+ lr:
+ base: 0.00003
+ schedule:
+ shape: linear
+ warmup_steps: 100
+
+ ```
+
+ #### Results
+
+ Scores on the Sequoia test set:
+
+ **UPOS:** 0.99383
+ **LAS:** 0.94942
+
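+ For reference, UPOS is the share of tokens with the correct universal POS tag, and LAS is the share of tokens whose predicted head and dependency label both match the gold annotation (UAS counts the head only). The sketch below recomputes these scores from a gold and a parsed CoNLL-U file, such as the `fr_sequoia-ud-test.parsed.conllu` file shipped in this repository; it assumes the two files are token-aligned and skips multiword-token and empty-node lines.
+
+ ```python
+ def conllu_tokens(path):
+     """Yield (UPOS, HEAD, DEPREL) triples for the basic tokens of a CoNLL-U file."""
+     with open(path, encoding="utf-8") as fh:
+         for line in fh:
+             line = line.strip()
+             if not line or line.startswith("#"):
+                 continue
+             cols = line.split("\t")
+             if "-" in cols[0] or "." in cols[0]:  # multiword token or empty node
+                 continue
+             yield cols[3], cols[6], cols[7]
+
+ def score(gold_path, pred_path):
+     total = upos = uas = las = 0
+     for (g_pos, g_head, g_rel), (p_pos, p_head, p_rel) in zip(
+         conllu_tokens(gold_path), conllu_tokens(pred_path)
+     ):
+         total += 1
+         upos += g_pos == p_pos
+         uas += g_head == p_head
+         las += g_head == p_head and g_rel == p_rel
+     return upos / total, uas / total, las / total
+
+ # Example (paths are placeholders):
+ # print(score("fr_sequoia-ud-test.conllu", "fr_sequoia-ud-test.parsed.conllu"))
+ ```
+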
+ ## Technical Specifications
+
+ ### Model Architecture and Objective
+
+ The model pairs a RoBERTa (CamemBERTv2) encoder with hopsparser's tagging and biaffine dependency-parsing heads, trained jointly for POS tagging and dependency parsing.
+
+ ## Citation
+
+ **BibTeX:**
+
+ ```bibtex
+ @misc{antoun2024camembert20smarterfrench,
+ title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
+ author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
+ year={2024},
+ eprint={2411.08868},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2411.08868},
+ }
+
+ @inproceedings{grobol:hal-03223424,
+ title = {Analyse en dépendances du français avec des plongements contextualisés},
+ author = {Grobol, Loïc and Crabbé, Benoît},
+ url = {https://hal.archives-ouvertes.fr/hal-03223424},
+ booktitle = {Actes de la 28ème Conférence sur le Traitement Automatique des Langues Naturelles},
+ eventtitle = {TALN-RÉCITAL 2021},
+ venue = {Lille, France},
+ pdf = {https://hal.archives-ouvertes.fr/hal-03223424/file/HOPS_final.pdf},
+ hal_id = {hal-03223424},
+ hal_version = {v1},
+ }
+
+ ```
camembertv2_base_p2_17k_last_layer.yaml ADDED
@@ -0,0 +1,32 @@
+ # Layer dimensions
+ mlp_input: 1024
+ mlp_tag_hidden: 16
+ mlp_arc_hidden: 512
+ mlp_lab_hidden: 128
+ # Lexers
+ lexers:
+ - name: word_embeddings
+ type: words
+ embedding_size: 256
+ word_dropout: 0.5
+ - name: char_level_embeddings
+ type: chars_rnn
+ embedding_size: 64
+ lstm_output_size: 128
+ - name: fasttext
+ type: fasttext
+ - name: camembertv2_base_p2_17k_last_layer
+ type: bert
+ model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
+ layers: [11]
+ subwords_reduction: "mean"
+ # Training hyperparameters
+ encoder_dropout: 0.5
+ mlp_dropout: 0.5
+ batch_size: 8
+ epochs: 64
+ lr:
+ base: 0.00003
+ schedule:
+ shape: linear
+ warmup_steps: 100
fr_sequoia-ud-dev.parsed.conllu ADDED
The diff for this file is too large to render. See raw diff
 
fr_sequoia-ud-test.parsed.conllu ADDED
The diff for this file is too large to render. See raw diff
 
model/config.json ADDED
@@ -0,0 +1 @@
+ {"mlp_input": 1024, "mlp_tag_hidden": 16, "mlp_arc_hidden": 512, "mlp_lab_hidden": 128, "biased_biaffine": true, "default_batch_size": 8, "encoder_dropout": 0.5, "extra_annotations": {}, "labels": ["acl", "acl:relcl", "advcl", "advcl:cleft", "advmod", "amod", "appos", "aux:caus", "aux:pass", "aux:tense", "case", "cc", "ccomp", "conj", "cop", "csubj", "csubj:pass", "dep", "det", "discourse", "dislocated", "expl:comp", "expl:pass", "expl:subj", "fixed", "flat:foreign", "flat:name", "goeswith", "iobj", "iobj:agent", "mark", "nmod", "nsubj", "nsubj:caus", "nsubj:pass", "nummod", "obj", "obj:agent", "obl:agent", "obl:arg", "obl:mod", "orphan", "parataxis", "punct", "root", "vocative", "xcomp"], "mlp_dropout": 0.5, "tagset": ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"], "lexers": {"word_embeddings": "words", "char_level_embeddings": "chars_rnn", "fasttext": "fasttext", "camembertv2_base_p2_17k_last_layer": "bert"}, "multitask_loss": "sum"}
model/lexers/camembertv2_base_p2_17k_last_layer/config.json ADDED
@@ -0,0 +1 @@
+ {"layers": [11], "subwords_reduction": "mean", "weight_layers": false}
model/lexers/camembertv2_base_p2_17k_last_layer/model/config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "_name_or_path": "/scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/",
+ "architectures": [
+ "RobertaForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 1,
+ "classifier_dropout": null,
+ "embedding_size": 768,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-07,
+ "max_position_embeddings": 1025,
+ "model_name": "camembertv2-base-bf16",
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_biased_input": true,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.2",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 32768
+ }
model/lexers/camembertv2_base_p2_17k_last_layer/model/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "add_prefix_space": true,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "eos_token": "[SEP]",
+ "errors": "replace",
+ "mask_token": "[MASK]",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "[UNK]"
+ }
model/lexers/char_level_embeddings/config.json ADDED
@@ -0,0 +1 @@
+ {"char_embeddings_dim": 64, "output_dim": 128, "special_tokens": ["<root>"], "charset": ["<pad>", "<special>", " ", "!", "\"", "$", "%", "&", "'", "(", ")", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", "?", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "]", "^", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "\u00a9", "\u00b0", "\u00b1", "\u00bd", "\u00c0", "\u00c9", "\u00ce", "\u00df", "\u00e0", "\u00e1", "\u00e2", "\u00e4", "\u00e7", "\u00e8", "\u00e9", "\u00ea", "\u00eb", "\u00ee", "\u00ef", "\u00f3", "\u00f4", "\u00f6", "\u00f9", "\u00fb"]}
model/lexers/fasttext/config.json ADDED
@@ -0,0 +1 @@
+ {"special_tokens": ["<root>"]}
model/lexers/fasttext/fasttext_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7b6536421c699578261983cca399f5da32aac7a6d5c5aac302ec6ded91e8f52
+ size 801050258
model/lexers/word_embeddings/config.json ADDED
The diff for this file is too large to render. See raw diff
 
model/weights.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c024e57f6020b583d6ea9018b7e63d9bb07fbc7e5dbf6a8b8a4181433a857ede
+ size 1749690986
train.log ADDED
@@ -0,0 +1,111 @@
+ [hops] 2024-09-24 17:10:50.083 | INFO | Initializing a parser from /workspace/configs/exp_camembertv2/camembertv2_base_p2_17k_last_layer.yaml
+ [hops] 2024-09-24 17:10:50.136 | INFO | Generating a FastText model from the treebank
+ [hops] 2024-09-24 17:10:50.154 | INFO | Training fasttext model
+ [hops] 2024-09-24 17:10:51.461 | WARNING | Some weights of RobertaModel were not initialized from the model checkpoint at /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/ and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+ You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
+ [hops] 2024-09-24 17:10:57.872 | INFO | Start training on cuda:3
+ [hops] 2024-09-24 17:10:57.876 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-24 17:11:12.533 | INFO | Epoch 0: train loss 2.8127 dev loss 2.0313 dev tag acc 38.38% dev head acc 26.47% dev deprel acc 52.06%
+ [hops] 2024-09-24 17:11:12.534 | INFO | New best model: head accuracy 26.47% > 0.00%
+ [hops] 2024-09-24 17:11:30.203 | INFO | Epoch 1: train loss 1.6494 dev loss 1.1043 dev tag acc 67.79% dev head acc 58.24% dev deprel acc 77.43%
+ [hops] 2024-09-24 17:11:30.203 | INFO | New best model: head accuracy 58.24% > 26.47%
+ [hops] 2024-09-24 17:11:47.950 | INFO | Epoch 2: train loss 1.0207 dev loss 0.6941 dev tag acc 79.40% dev head acc 77.31% dev deprel acc 84.67%
+ [hops] 2024-09-24 17:11:47.951 | INFO | New best model: head accuracy 77.31% > 58.24%
+ [hops] 2024-09-24 17:12:04.965 | INFO | Epoch 3: train loss 0.7013 dev loss 0.4977 dev tag acc 87.37% dev head acc 83.79% dev deprel acc 89.06%
+ [hops] 2024-09-24 17:12:04.966 | INFO | New best model: head accuracy 83.79% > 77.31%
+ [hops] 2024-09-24 17:12:21.325 | INFO | Epoch 4: train loss 0.5227 dev loss 0.3940 dev tag acc 90.34% dev head acc 85.27% dev deprel acc 91.90%
+ [hops] 2024-09-24 17:12:21.326 | INFO | New best model: head accuracy 85.27% > 83.79%
+ [hops] 2024-09-24 17:12:38.042 | INFO | Epoch 5: train loss 0.4107 dev loss 0.3263 dev tag acc 93.43% dev head acc 87.56% dev deprel acc 93.04%
+ [hops] 2024-09-24 17:12:38.043 | INFO | New best model: head accuracy 87.56% > 85.27%
+ [hops] 2024-09-24 17:12:55.903 | INFO | Epoch 6: train loss 0.3357 dev loss 0.2794 dev tag acc 95.46% dev head acc 88.97% dev deprel acc 93.72%
+ [hops] 2024-09-24 17:12:55.904 | INFO | New best model: head accuracy 88.97% > 87.56%
+ [hops] 2024-09-24 17:13:13.763 | INFO | Epoch 7: train loss 0.2768 dev loss 0.2410 dev tag acc 96.72% dev head acc 90.81% dev deprel acc 94.62%
+ [hops] 2024-09-24 17:13:13.764 | INFO | New best model: head accuracy 90.81% > 88.97%
+ [hops] 2024-09-24 17:13:31.249 | INFO | Epoch 8: train loss 0.2339 dev loss 0.2229 dev tag acc 97.78% dev head acc 91.26% dev deprel acc 95.34%
+ [hops] 2024-09-24 17:13:31.250 | INFO | New best model: head accuracy 91.26% > 90.81%
+ [hops] 2024-09-24 17:13:48.426 | INFO | Epoch 9: train loss 0.1999 dev loss 0.2153 dev tag acc 98.06% dev head acc 91.40% dev deprel acc 95.68%
+ [hops] 2024-09-24 17:13:48.427 | INFO | New best model: head accuracy 91.40% > 91.26%
+ [hops] 2024-09-24 17:14:05.318 | INFO | Epoch 10: train loss 0.1755 dev loss 0.2009 dev tag acc 98.31% dev head acc 92.71% dev deprel acc 96.05%
+ [hops] 2024-09-24 17:14:05.320 | INFO | New best model: head accuracy 92.71% > 91.40%
+ [hops] 2024-09-24 17:14:22.152 | INFO | Epoch 11: train loss 0.1523 dev loss 0.1893 dev tag acc 98.35% dev head acc 93.49% dev deprel acc 96.03%
+ [hops] 2024-09-24 17:14:22.153 | INFO | New best model: head accuracy 93.49% > 92.71%
+ [hops] 2024-09-24 17:14:39.729 | INFO | Epoch 12: train loss 0.1368 dev loss 0.1846 dev tag acc 98.52% dev head acc 93.65% dev deprel acc 96.35%
+ [hops] 2024-09-24 17:14:39.730 | INFO | New best model: head accuracy 93.65% > 93.49%
+ [hops] 2024-09-24 17:14:57.785 | INFO | Epoch 13: train loss 0.1220 dev loss 0.2027 dev tag acc 98.63% dev head acc 93.91% dev deprel acc 96.43%
+ [hops] 2024-09-24 17:14:57.786 | INFO | New best model: head accuracy 93.91% > 93.65%
+ [hops] 2024-09-24 17:15:15.740 | INFO | Epoch 14: train loss 0.1122 dev loss 0.1918 dev tag acc 98.79% dev head acc 94.04% dev deprel acc 96.89%
+ [hops] 2024-09-24 17:15:15.741 | INFO | New best model: head accuracy 94.04% > 93.91%
+ [hops] 2024-09-24 17:15:33.756 | INFO | Epoch 15: train loss 0.1010 dev loss 0.1851 dev tag acc 98.90% dev head acc 93.87% dev deprel acc 97.01%
+ [hops] 2024-09-24 17:15:49.185 | INFO | Epoch 16: train loss 0.0943 dev loss 0.1964 dev tag acc 98.97% dev head acc 94.26% dev deprel acc 97.05%
+ [hops] 2024-09-24 17:15:49.186 | INFO | New best model: head accuracy 94.26% > 94.04%
+ [hops] 2024-09-24 17:16:05.715 | INFO | Epoch 17: train loss 0.0822 dev loss 0.1925 dev tag acc 98.84% dev head acc 95.16% dev deprel acc 97.18%
+ [hops] 2024-09-24 17:16:05.716 | INFO | New best model: head accuracy 95.16% > 94.26%
+ [hops] 2024-09-24 17:16:23.327 | INFO | Epoch 18: train loss 0.0783 dev loss 0.1929 dev tag acc 99.02% dev head acc 94.98% dev deprel acc 97.26%
+ [hops] 2024-09-24 17:16:38.640 | INFO | Epoch 19: train loss 0.0720 dev loss 0.1976 dev tag acc 99.09% dev head acc 95.05% dev deprel acc 97.11%
+ [hops] 2024-09-24 17:16:53.616 | INFO | Epoch 20: train loss 0.0644 dev loss 0.1988 dev tag acc 99.10% dev head acc 95.09% dev deprel acc 97.28%
+ [hops] 2024-09-24 17:17:09.018 | INFO | Epoch 21: train loss 0.0609 dev loss 0.2084 dev tag acc 99.14% dev head acc 95.39% dev deprel acc 97.28%
+ [hops] 2024-09-24 17:17:09.019 | INFO | New best model: head accuracy 95.39% > 95.16%
+ [hops] 2024-09-24 17:17:26.308 | INFO | Epoch 22: train loss 0.0585 dev loss 0.2076 dev tag acc 99.19% dev head acc 95.35% dev deprel acc 97.58%
+ [hops] 2024-09-24 17:17:41.545 | INFO | Epoch 23: train loss 0.0545 dev loss 0.2094 dev tag acc 99.15% dev head acc 95.29% dev deprel acc 97.49%
+ [hops] 2024-09-24 17:17:56.873 | INFO | Epoch 24: train loss 0.0502 dev loss 0.2116 dev tag acc 99.17% dev head acc 95.23% dev deprel acc 97.49%
+ [hops] 2024-09-24 17:18:12.142 | INFO | Epoch 25: train loss 0.0455 dev loss 0.2059 dev tag acc 99.23% dev head acc 95.44% dev deprel acc 97.55%
+ [hops] 2024-09-24 17:18:12.143 | INFO | New best model: head accuracy 95.44% > 95.39%
+ [hops] 2024-09-24 17:18:29.698 | INFO | Epoch 26: train loss 0.0436 dev loss 0.2258 dev tag acc 99.22% dev head acc 95.46% dev deprel acc 97.44%
+ [hops] 2024-09-24 17:18:29.699 | INFO | New best model: head accuracy 95.46% > 95.44%
+ [hops] 2024-09-24 17:18:45.933 | INFO | Epoch 27: train loss 0.0404 dev loss 0.2359 dev tag acc 99.20% dev head acc 95.56% dev deprel acc 97.47%
+ [hops] 2024-09-24 17:18:45.934 | INFO | New best model: head accuracy 95.56% > 95.46%
+ [hops] 2024-09-24 17:19:02.822 | INFO | Epoch 28: train loss 0.0376 dev loss 0.2342 dev tag acc 99.21% dev head acc 95.75% dev deprel acc 97.56%
+ [hops] 2024-09-24 17:19:02.823 | INFO | New best model: head accuracy 95.75% > 95.56%
+ [hops] 2024-09-24 17:19:20.495 | INFO | Epoch 29: train loss 0.0365 dev loss 0.2271 dev tag acc 99.21% dev head acc 95.69% dev deprel acc 97.61%
+ [hops] 2024-09-24 17:19:34.954 | INFO | Epoch 30: train loss 0.0349 dev loss 0.2359 dev tag acc 99.21% dev head acc 95.79% dev deprel acc 97.60%
+ [hops] 2024-09-24 17:19:34.955 | INFO | New best model: head accuracy 95.79% > 95.75%
+ [hops] 2024-09-24 17:19:52.083 | INFO | Epoch 31: train loss 0.0333 dev loss 0.2284 dev tag acc 99.22% dev head acc 95.68% dev deprel acc 97.60%
+ [hops] 2024-09-24 17:20:07.114 | INFO | Epoch 32: train loss 0.0302 dev loss 0.2329 dev tag acc 99.23% dev head acc 95.50% dev deprel acc 97.64%
+ [hops] 2024-09-24 17:20:20.909 | INFO | Epoch 33: train loss 0.0280 dev loss 0.2253 dev tag acc 99.27% dev head acc 95.56% dev deprel acc 97.70%
+ [hops] 2024-09-24 17:20:36.250 | INFO | Epoch 34: train loss 0.0269 dev loss 0.2490 dev tag acc 99.20% dev head acc 95.74% dev deprel acc 97.69%
+ [hops] 2024-09-24 17:20:51.100 | INFO | Epoch 35: train loss 0.0266 dev loss 0.2576 dev tag acc 99.21% dev head acc 95.74% dev deprel acc 97.78%
+ [hops] 2024-09-24 17:21:04.672 | INFO | Epoch 36: train loss 0.0255 dev loss 0.2525 dev tag acc 99.31% dev head acc 95.81% dev deprel acc 97.75%
+ [hops] 2024-09-24 17:21:04.673 | INFO | New best model: head accuracy 95.81% > 95.79%
+ [hops] 2024-09-24 17:21:20.986 | INFO | Epoch 37: train loss 0.0226 dev loss 0.2545 dev tag acc 99.29% dev head acc 95.85% dev deprel acc 97.67%
+ [hops] 2024-09-24 17:21:20.987 | INFO | New best model: head accuracy 95.85% > 95.81%
+ [hops] 2024-09-24 17:21:38.097 | INFO | Epoch 38: train loss 0.0224 dev loss 0.2743 dev tag acc 99.26% dev head acc 95.97% dev deprel acc 97.61%
+ [hops] 2024-09-24 17:21:38.098 | INFO | New best model: head accuracy 95.97% > 95.85%
+ [hops] 2024-09-24 17:21:54.248 | INFO | Epoch 39: train loss 0.0213 dev loss 0.2854 dev tag acc 99.26% dev head acc 95.75% dev deprel acc 97.66%
+ [hops] 2024-09-24 17:22:09.077 | INFO | Epoch 40: train loss 0.0212 dev loss 0.2520 dev tag acc 99.26% dev head acc 95.94% dev deprel acc 97.63%
+ [hops] 2024-09-24 17:22:24.533 | INFO | Epoch 41: train loss 0.0198 dev loss 0.2570 dev tag acc 99.31% dev head acc 96.04% dev deprel acc 97.81%
+ [hops] 2024-09-24 17:22:24.534 | INFO | New best model: head accuracy 96.04% > 95.97%
+ [hops] 2024-09-24 17:22:41.309 | INFO | Epoch 42: train loss 0.0179 dev loss 0.2711 dev tag acc 99.30% dev head acc 95.95% dev deprel acc 97.74%
+ [hops] 2024-09-24 17:22:56.619 | INFO | Epoch 43: train loss 0.0166 dev loss 0.2740 dev tag acc 99.27% dev head acc 96.03% dev deprel acc 97.86%
+ [hops] 2024-09-24 17:23:11.247 | INFO | Epoch 44: train loss 0.0168 dev loss 0.2802 dev tag acc 99.27% dev head acc 96.07% dev deprel acc 97.83%
+ [hops] 2024-09-24 17:23:11.248 | INFO | New best model: head accuracy 96.07% > 96.04%
+ [hops] 2024-09-24 17:23:29.041 | INFO | Epoch 45: train loss 0.0163 dev loss 0.2719 dev tag acc 99.28% dev head acc 96.19% dev deprel acc 97.87%
+ [hops] 2024-09-24 17:23:29.042 | INFO | New best model: head accuracy 96.19% > 96.07%
+ [hops] 2024-09-24 17:23:46.148 | INFO | Epoch 46: train loss 0.0180 dev loss 0.2666 dev tag acc 99.26% dev head acc 96.01% dev deprel acc 97.86%
+ [hops] 2024-09-24 17:24:01.336 | INFO | Epoch 47: train loss 0.0142 dev loss 0.2792 dev tag acc 99.29% dev head acc 96.07% dev deprel acc 97.83%
+ [hops] 2024-09-24 17:24:16.066 | INFO | Epoch 48: train loss 0.0134 dev loss 0.2820 dev tag acc 99.27% dev head acc 96.06% dev deprel acc 97.79%
+ [hops] 2024-09-24 17:24:31.201 | INFO | Epoch 49: train loss 0.0137 dev loss 0.2877 dev tag acc 99.32% dev head acc 96.13% dev deprel acc 97.85%
+ [hops] 2024-09-24 17:24:46.077 | INFO | Epoch 50: train loss 0.0130 dev loss 0.2910 dev tag acc 99.28% dev head acc 96.11% dev deprel acc 97.91%
+ [hops] 2024-09-24 17:25:01.474 | INFO | Epoch 51: train loss 0.0120 dev loss 0.3076 dev tag acc 99.27% dev head acc 96.06% dev deprel acc 97.86%
+ [hops] 2024-09-24 17:25:15.876 | INFO | Epoch 52: train loss 0.0114 dev loss 0.3043 dev tag acc 99.28% dev head acc 96.13% dev deprel acc 97.86%
+ [hops] 2024-09-24 17:25:31.219 | INFO | Epoch 53: train loss 0.0113 dev loss 0.3071 dev tag acc 99.26% dev head acc 96.07% dev deprel acc 97.89%
+ [hops] 2024-09-24 17:25:46.377 | INFO | Epoch 54: train loss 0.0103 dev loss 0.3065 dev tag acc 99.27% dev head acc 96.25% dev deprel acc 97.94%
+ [hops] 2024-09-24 17:25:46.378 | INFO | New best model: head accuracy 96.25% > 96.19%
+ [hops] 2024-09-24 17:26:03.659 | INFO | Epoch 55: train loss 0.0104 dev loss 0.3091 dev tag acc 99.27% dev head acc 96.19% dev deprel acc 97.88%
+ [hops] 2024-09-24 17:26:18.791 | INFO | Epoch 56: train loss 0.0098 dev loss 0.3122 dev tag acc 99.27% dev head acc 96.05% dev deprel acc 97.84%
+ [hops] 2024-09-24 17:26:34.137 | INFO | Epoch 57: train loss 0.0094 dev loss 0.3159 dev tag acc 99.26% dev head acc 96.07% dev deprel acc 97.82%
+ [hops] 2024-09-24 17:26:49.249 | INFO | Epoch 58: train loss 0.0094 dev loss 0.3203 dev tag acc 99.28% dev head acc 96.15% dev deprel acc 97.87%
+ [hops] 2024-09-24 17:27:04.725 | INFO | Epoch 59: train loss 0.0082 dev loss 0.3228 dev tag acc 99.28% dev head acc 96.17% dev deprel acc 97.92%
+ [hops] 2024-09-24 17:27:20.299 | INFO | Epoch 60: train loss 0.0089 dev loss 0.3213 dev tag acc 99.29% dev head acc 96.13% dev deprel acc 97.92%
+ [hops] 2024-09-24 17:27:35.759 | INFO | Epoch 61: train loss 0.0082 dev loss 0.3217 dev tag acc 99.29% dev head acc 96.19% dev deprel acc 97.93%
+ [hops] 2024-09-24 17:27:51.235 | INFO | Epoch 62: train loss 0.0082 dev loss 0.3231 dev tag acc 99.29% dev head acc 96.24% dev deprel acc 97.91%
+ [hops] 2024-09-24 17:28:06.104 | INFO | Epoch 63: train loss 0.0085 dev loss 0.3223 dev tag acc 99.29% dev head acc 96.24% dev deprel acc 97.90%
+ [hops] 2024-09-24 17:28:11.404 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-24 17:28:16.852 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
+ [hops] 2024-09-24 17:28:18.856 | INFO | Metrics for Sequoia-camembertv2_base_p2_17k_last_layer+rand_seed=42
+ ───────────────────────────────
+ Split UPOS UAS LAS
+ ───────────────────────────────
+ Dev 99.27 96.30 95.11
+ Test 99.38 96.12 94.94
+ ───────────────────────────────
+