wissamantoun committed
Commit: c84ff4e
1 Parent(s): 1ab3957

Upload folder using huggingface_hub
- README.md +161 -0
- camembertv2_base_p2_17k_last_layer.yaml +32 -0
- fr_sequoia-ud-dev.parsed.conllu +0 -0
- fr_sequoia-ud-test.parsed.conllu +0 -0
- model/config.json +1 -0
- model/lexers/camembertv2_base_p2_17k_last_layer/config.json +1 -0
- model/lexers/camembertv2_base_p2_17k_last_layer/model/config.json +30 -0
- model/lexers/camembertv2_base_p2_17k_last_layer/model/special_tokens_map.json +51 -0
- model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer.json +0 -0
- model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer_config.json +57 -0
- model/lexers/char_level_embeddings/config.json +1 -0
- model/lexers/fasttext/config.json +1 -0
- model/lexers/fasttext/fasttext_model.bin +3 -0
- model/lexers/word_embeddings/config.json +0 -0
- model/weights.pt +3 -0
- train.log +111 -0
README.md
ADDED
@@ -0,0 +1,161 @@
---
language: fr
license: mit
tags:
- roberta
- token-classification
base_model: almanach/camembertv2-base
datasets:
- Sequoia
metrics:
- las
- upos
model-index:
- name: almanach/camembertv2-base-sequoia
  results:
  - task:
      type: token-classification
      name: Part-of-Speech Tagging
    dataset:
      type: Sequoia
      name: Sequoia
    metrics:
    - name: upos
      type: upos
      value: 0.99383
      verified: false
  - task:
      type: token-classification
      name: Dependency Parsing
    dataset:
      type: Sequoia
      name: Sequoia
    metrics:
    - name: las
      type: las
      value: 0.94942
      verified: false
---

# Model Card for almanach/camembertv2-base-sequoia

almanach/camembertv2-base-sequoia is a RoBERTa model for token classification, trained on the Sequoia dataset for Part-of-Speech Tagging and Dependency Parsing.
The model achieves a UPOS accuracy of 0.99383 and an LAS of 0.94942 on the Sequoia test set.

The model is part of the almanach/camembertv2-base family of fine-tuned models.

## Model Details

### Model Description

- **Developed by:** Wissam Antoun (PhD student at ALMAnaCH, Inria Paris)
- **Model type:** roberta
- **Language(s) (NLP):** French
- **License:** MIT
- **Finetuned from model:** almanach/camembertv2-base

### Model Sources

- **Repository:** https://github.com/WissamAntoun/camemberta
- **Paper:** https://arxiv.org/abs/2411.08868

## Uses

The model can be used for token classification tasks in French, namely Part-of-Speech Tagging and Dependency Parsing.

## Bias, Risks, and Limitations

The model may exhibit biases inherited from its training data, may not generalize well to other datasets or tasks, and is limited by the coverage of the data it was trained on.

## How to Get Started with the Model

You can use the model directly with the [hopsparser](https://github.com/hopsparser/hopsparser) library in server mode (see https://github.com/hopsparser/hopsparser/blob/main/docs/server.md); a hedged example of querying such a server follows below.
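
As an illustration, here is a minimal Python sketch of querying a running hopsparser server. The host, port, endpoint path, and payload fields are assumptions made for this example, not the documented API; check docs/server.md before relying on them.

```python
# Minimal sketch of calling a running hopsparser server. The route
# ("/process"), port, and payload fields are ASSUMPTIONS for illustration;
# consult docs/server.md in the hopsparser repository for the real API.
import requests

resp = requests.post(
    "http://localhost:8000/process",  # assumed host/port/route
    json={"data": "Le chat dort profondément .", "input": "horizontal", "output": "conllu"},
)
resp.raise_for_status()
print(resp.text)  # expected: a CoNLL-U analysis of the input sentence
```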

## Training Details

### Training Procedure

The model was trained with the [hopsparser](https://github.com/hopsparser/hopsparser) library on the Sequoia dataset.

#### Training Hyperparameters

```yml
# Layer dimensions
mlp_input: 1024
mlp_tag_hidden: 16
mlp_arc_hidden: 512
mlp_lab_hidden: 128
# Lexers
lexers:
  - name: word_embeddings
    type: words
    embedding_size: 256
    word_dropout: 0.5
  - name: char_level_embeddings
    type: chars_rnn
    embedding_size: 64
    lstm_output_size: 128
  - name: fasttext
    type: fasttext
  - name: camembertv2_base_p2_17k_last_layer
    type: bert
    model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
    layers: [11]
    subwords_reduction: "mean"
# Training hyperparameters
encoder_dropout: 0.5
mlp_dropout: 0.5
batch_size: 8
epochs: 64
lr:
  base: 0.00003
  schedule:
    shape: linear
    warmup_steps: 100
```
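
The `bert` lexer above feeds the parser one vector per word. As a rough illustration of what `subwords_reduction: "mean"` does, here is a sketch using `transformers`; it is not hopsparser's own code, and reading `layers: [11]` as the last of the 12 encoder layers is an assumption.

```python
# Illustrative sketch of `subwords_reduction: "mean"`: a word's vector is the
# mean of its subword embeddings from the selected encoder layer. NOT
# hopsparser's implementation; the layer-index convention is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("almanach/camembertv2-base")
model = AutoModel.from_pretrained("almanach/camembertv2-base", output_hidden_states=True)

words = ["Le", "chat", "dort", "profondément", "."]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).hidden_states[-1]  # (1, n_subwords, hidden_size)

word_vectors = []
for i in range(len(words)):
    # word_ids() maps each subword position to its source word (None = special token)
    mask = torch.tensor([wid == i for wid in enc.word_ids()])
    word_vectors.append(hidden[0][mask].mean(dim=0))
print(len(word_vectors), word_vectors[0].shape)  # 5 words, each a 768-d vector
```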
#### Results

**UPOS:** 0.99383
**LAS:** 0.94942
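
These figures match the test row of the evaluation table in train.log. For readers who want to recompute them, here is a simplified sketch that compares a parsed CoNLL-U file such as fr_sequoia-ud-test.parsed.conllu (included in this upload) against the gold treebank. It is an approximation of the official UD evaluation: it skips multiword-token ranges and empty nodes and applies no special punctuation handling.

```python
# Approximate UPOS accuracy and LAS from two aligned CoNLL-U files.
# Simplified relative to the official conll18_ud_eval script.
def read_tokens(path):
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if not cols[0].isdigit():  # skip 1-2 ranges and empty nodes like 3.1
                continue
            rows.append((cols[3], cols[6], cols[7]))  # UPOS, HEAD, DEPREL
    return rows

def score(gold_path, pred_path):
    gold, pred = read_tokens(gold_path), read_tokens(pred_path)
    assert len(gold) == len(pred), "files must tokenize identically"
    upos = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g[1:] == p[1:] for g, p in zip(gold, pred)) / len(gold)
    return upos, las

# Example: score("fr_sequoia-ud-test.conllu", "fr_sequoia-ud-test.parsed.conllu")
```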

## Technical Specifications

### Model Architecture and Objective

A custom RoBERTa-based model for token classification: contextual embeddings from camembertv2-base feed a biaffine dependency parser with a part-of-speech tagging head (sketched below).
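
For intuition, here is a sketch of the biaffine arc scorer this kind of parser uses (Dozat-and-Manning style). Dimensions follow the config above; the random tensors stand in for learned parameters. This is an illustration, not hopsparser's code.

```python
# Sketch of a biaffine arc scorer: scores[i, j] rates word j as the head of
# word i. The bias term mirrors "biased_biaffine": true in model/config.json.
import torch

n, arc_h = 12, 512                 # sentence length, mlp_arc_hidden from the config
head_repr = torch.randn(n, arc_h)  # head-role vectors after the arc MLP
dep_repr = torch.randn(n, arc_h)   # dependent-role vectors after the arc MLP

U = torch.randn(arc_h, arc_h)      # biaffine weight matrix (learned in practice)
b = torch.randn(arc_h)             # head-side bias vector (learned in practice)

scores = dep_repr @ U @ head_repr.T + head_repr @ b  # shape (n, n)
# A maximum spanning tree over `scores` then yields the dependency tree.
```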

## Citation

**BibTeX:**

```bibtex
@misc{antoun2024camembert20smarterfrench,
      title={CamemBERT 2.0: A Smarter French Language Model Aged to Perfection},
      author={Wissam Antoun and Francis Kulumba and Rian Touchent and Éric de la Clergerie and Benoît Sagot and Djamé Seddah},
      year={2024},
      eprint={2411.08868},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.08868},
}

@inproceedings{grobol:hal-03223424,
    title = {Analyse en dépendances du français avec des plongements contextualisés},
    author = {Grobol, Loïc and Crabbé, Benoît},
    url = {https://hal.archives-ouvertes.fr/hal-03223424},
    booktitle = {Actes de la 28ème Conférence sur le Traitement Automatique des Langues Naturelles},
    eventtitle = {TALN-RÉCITAL 2021},
    venue = {Lille, France},
    pdf = {https://hal.archives-ouvertes.fr/hal-03223424/file/HOPS_final.pdf},
    hal_id = {hal-03223424},
    hal_version = {v1},
}
```
camembertv2_base_p2_17k_last_layer.yaml
ADDED
@@ -0,0 +1,32 @@
# Layer dimensions
mlp_input: 1024
mlp_tag_hidden: 16
mlp_arc_hidden: 512
mlp_lab_hidden: 128
# Lexers
lexers:
  - name: word_embeddings
    type: words
    embedding_size: 256
    word_dropout: 0.5
  - name: char_level_embeddings
    type: chars_rnn
    embedding_size: 64
    lstm_output_size: 128
  - name: fasttext
    type: fasttext
  - name: camembertv2_base_p2_17k_last_layer
    type: bert
    model: /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/
    layers: [11]
    subwords_reduction: "mean"
# Training hyperparameters
encoder_dropout: 0.5
mlp_dropout: 0.5
batch_size: 8
epochs: 64
lr:
  base: 0.00003
  schedule:
    shape: linear
    warmup_steps: 100
fr_sequoia-ud-dev.parsed.conllu
ADDED
The diff for this file is too large to render.
See raw diff
fr_sequoia-ud-test.parsed.conllu
ADDED
The diff for this file is too large to render.
See raw diff
model/config.json
ADDED
@@ -0,0 +1 @@
{"mlp_input": 1024, "mlp_tag_hidden": 16, "mlp_arc_hidden": 512, "mlp_lab_hidden": 128, "biased_biaffine": true, "default_batch_size": 8, "encoder_dropout": 0.5, "extra_annotations": {}, "labels": ["acl", "acl:relcl", "advcl", "advcl:cleft", "advmod", "amod", "appos", "aux:caus", "aux:pass", "aux:tense", "case", "cc", "ccomp", "conj", "cop", "csubj", "csubj:pass", "dep", "det", "discourse", "dislocated", "expl:comp", "expl:pass", "expl:subj", "fixed", "flat:foreign", "flat:name", "goeswith", "iobj", "iobj:agent", "mark", "nmod", "nsubj", "nsubj:caus", "nsubj:pass", "nummod", "obj", "obj:agent", "obl:agent", "obl:arg", "obl:mod", "orphan", "parataxis", "punct", "root", "vocative", "xcomp"], "mlp_dropout": 0.5, "tagset": ["ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"], "lexers": {"word_embeddings": "words", "char_level_embeddings": "chars_rnn", "fasttext": "fasttext", "camembertv2_base_p2_17k_last_layer": "bert"}, "multitask_loss": "sum"}
model/lexers/camembertv2_base_p2_17k_last_layer/config.json
ADDED
@@ -0,0 +1 @@
{"layers": [11], "subwords_reduction": "mean", "weight_layers": false}
model/lexers/camembertv2_base_p2_17k_last_layer/model/config.json
ADDED
@@ -0,0 +1,30 @@
{
  "_name_or_path": "/scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "classifier_dropout": null,
  "embedding_size": 768,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 1025,
  "model_name": "camembertv2-base-bf16",
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_biased_input": true,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 32768
}
model/lexers/camembertv2_base_p2_17k_last_layer/model/special_tokens_map.json
ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
model/lexers/camembertv2_base_p2_17k_last_layer/model/tokenizer_config.json
ADDED
@@ -0,0 +1,57 @@
{
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "4": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "eos_token": "[SEP]",
  "errors": "replace",
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "[UNK]"
}
model/lexers/char_level_embeddings/config.json
ADDED
@@ -0,0 +1 @@
{"char_embeddings_dim": 64, "output_dim": 128, "special_tokens": ["<root>"], "charset": ["<pad>", "<special>", " ", "!", "\"", "$", "%", "&", "'", "(", ")", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", "?", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "]", "^", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "\u00a9", "\u00b0", "\u00b1", "\u00bd", "\u00c0", "\u00c9", "\u00ce", "\u00df", "\u00e0", "\u00e1", "\u00e2", "\u00e4", "\u00e7", "\u00e8", "\u00e9", "\u00ea", "\u00eb", "\u00ee", "\u00ef", "\u00f3", "\u00f4", "\u00f6", "\u00f9", "\u00fb"]}
model/lexers/fasttext/config.json
ADDED
@@ -0,0 +1 @@
{"special_tokens": ["<root>"]}
model/lexers/fasttext/fasttext_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a7b6536421c699578261983cca399f5da32aac7a6d5c5aac302ec6ded91e8f52
size 801050258
model/lexers/word_embeddings/config.json
ADDED
The diff for this file is too large to render.
See raw diff
model/weights.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c024e57f6020b583d6ea9018b7e63d9bb07fbc7e5dbf6a8b8a4181433a857ede
size 1749690986
train.log
ADDED
@@ -0,0 +1,111 @@
[hops] 2024-09-24 17:10:50.083 | INFO | Initializing a parser from /workspace/configs/exp_camembertv2/camembertv2_base_p2_17k_last_layer.yaml
[hops] 2024-09-24 17:10:50.136 | INFO | Generating a FastText model from the treebank
[hops] 2024-09-24 17:10:50.154 | INFO | Training fasttext model
[hops] 2024-09-24 17:10:51.461 | WARNING | Some weights of RobertaModel were not initialized from the model checkpoint at /scratch/camembertv2/runs/models/camembertv2-base-bf16/post/ckpt-p2-17000/pt/ and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[hops] 2024-09-24 17:10:57.872 | INFO | Start training on cuda:3
[hops] 2024-09-24 17:10:57.876 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[hops] 2024-09-24 17:11:12.533 | INFO | Epoch 0: train loss 2.8127 dev loss 2.0313 dev tag acc 38.38% dev head acc 26.47% dev deprel acc 52.06%
[hops] 2024-09-24 17:11:12.534 | INFO | New best model: head accuracy 26.47% > 0.00%
[hops] 2024-09-24 17:11:30.203 | INFO | Epoch 1: train loss 1.6494 dev loss 1.1043 dev tag acc 67.79% dev head acc 58.24% dev deprel acc 77.43%
[hops] 2024-09-24 17:11:30.203 | INFO | New best model: head accuracy 58.24% > 26.47%
[hops] 2024-09-24 17:11:47.950 | INFO | Epoch 2: train loss 1.0207 dev loss 0.6941 dev tag acc 79.40% dev head acc 77.31% dev deprel acc 84.67%
[hops] 2024-09-24 17:11:47.951 | INFO | New best model: head accuracy 77.31% > 58.24%
[hops] 2024-09-24 17:12:04.965 | INFO | Epoch 3: train loss 0.7013 dev loss 0.4977 dev tag acc 87.37% dev head acc 83.79% dev deprel acc 89.06%
[hops] 2024-09-24 17:12:04.966 | INFO | New best model: head accuracy 83.79% > 77.31%
[hops] 2024-09-24 17:12:21.325 | INFO | Epoch 4: train loss 0.5227 dev loss 0.3940 dev tag acc 90.34% dev head acc 85.27% dev deprel acc 91.90%
[hops] 2024-09-24 17:12:21.326 | INFO | New best model: head accuracy 85.27% > 83.79%
[hops] 2024-09-24 17:12:38.042 | INFO | Epoch 5: train loss 0.4107 dev loss 0.3263 dev tag acc 93.43% dev head acc 87.56% dev deprel acc 93.04%
[hops] 2024-09-24 17:12:38.043 | INFO | New best model: head accuracy 87.56% > 85.27%
[hops] 2024-09-24 17:12:55.903 | INFO | Epoch 6: train loss 0.3357 dev loss 0.2794 dev tag acc 95.46% dev head acc 88.97% dev deprel acc 93.72%
[hops] 2024-09-24 17:12:55.904 | INFO | New best model: head accuracy 88.97% > 87.56%
[hops] 2024-09-24 17:13:13.763 | INFO | Epoch 7: train loss 0.2768 dev loss 0.2410 dev tag acc 96.72% dev head acc 90.81% dev deprel acc 94.62%
[hops] 2024-09-24 17:13:13.764 | INFO | New best model: head accuracy 90.81% > 88.97%
[hops] 2024-09-24 17:13:31.249 | INFO | Epoch 8: train loss 0.2339 dev loss 0.2229 dev tag acc 97.78% dev head acc 91.26% dev deprel acc 95.34%
[hops] 2024-09-24 17:13:31.250 | INFO | New best model: head accuracy 91.26% > 90.81%
[hops] 2024-09-24 17:13:48.426 | INFO | Epoch 9: train loss 0.1999 dev loss 0.2153 dev tag acc 98.06% dev head acc 91.40% dev deprel acc 95.68%
[hops] 2024-09-24 17:13:48.427 | INFO | New best model: head accuracy 91.40% > 91.26%
[hops] 2024-09-24 17:14:05.318 | INFO | Epoch 10: train loss 0.1755 dev loss 0.2009 dev tag acc 98.31% dev head acc 92.71% dev deprel acc 96.05%
[hops] 2024-09-24 17:14:05.320 | INFO | New best model: head accuracy 92.71% > 91.40%
[hops] 2024-09-24 17:14:22.152 | INFO | Epoch 11: train loss 0.1523 dev loss 0.1893 dev tag acc 98.35% dev head acc 93.49% dev deprel acc 96.03%
[hops] 2024-09-24 17:14:22.153 | INFO | New best model: head accuracy 93.49% > 92.71%
[hops] 2024-09-24 17:14:39.729 | INFO | Epoch 12: train loss 0.1368 dev loss 0.1846 dev tag acc 98.52% dev head acc 93.65% dev deprel acc 96.35%
[hops] 2024-09-24 17:14:39.730 | INFO | New best model: head accuracy 93.65% > 93.49%
[hops] 2024-09-24 17:14:57.785 | INFO | Epoch 13: train loss 0.1220 dev loss 0.2027 dev tag acc 98.63% dev head acc 93.91% dev deprel acc 96.43%
[hops] 2024-09-24 17:14:57.786 | INFO | New best model: head accuracy 93.91% > 93.65%
[hops] 2024-09-24 17:15:15.740 | INFO | Epoch 14: train loss 0.1122 dev loss 0.1918 dev tag acc 98.79% dev head acc 94.04% dev deprel acc 96.89%
[hops] 2024-09-24 17:15:15.741 | INFO | New best model: head accuracy 94.04% > 93.91%
[hops] 2024-09-24 17:15:33.756 | INFO | Epoch 15: train loss 0.1010 dev loss 0.1851 dev tag acc 98.90% dev head acc 93.87% dev deprel acc 97.01%
[hops] 2024-09-24 17:15:49.185 | INFO | Epoch 16: train loss 0.0943 dev loss 0.1964 dev tag acc 98.97% dev head acc 94.26% dev deprel acc 97.05%
[hops] 2024-09-24 17:15:49.186 | INFO | New best model: head accuracy 94.26% > 94.04%
[hops] 2024-09-24 17:16:05.715 | INFO | Epoch 17: train loss 0.0822 dev loss 0.1925 dev tag acc 98.84% dev head acc 95.16% dev deprel acc 97.18%
[hops] 2024-09-24 17:16:05.716 | INFO | New best model: head accuracy 95.16% > 94.26%
[hops] 2024-09-24 17:16:23.327 | INFO | Epoch 18: train loss 0.0783 dev loss 0.1929 dev tag acc 99.02% dev head acc 94.98% dev deprel acc 97.26%
[hops] 2024-09-24 17:16:38.640 | INFO | Epoch 19: train loss 0.0720 dev loss 0.1976 dev tag acc 99.09% dev head acc 95.05% dev deprel acc 97.11%
[hops] 2024-09-24 17:16:53.616 | INFO | Epoch 20: train loss 0.0644 dev loss 0.1988 dev tag acc 99.10% dev head acc 95.09% dev deprel acc 97.28%
[hops] 2024-09-24 17:17:09.018 | INFO | Epoch 21: train loss 0.0609 dev loss 0.2084 dev tag acc 99.14% dev head acc 95.39% dev deprel acc 97.28%
[hops] 2024-09-24 17:17:09.019 | INFO | New best model: head accuracy 95.39% > 95.16%
[hops] 2024-09-24 17:17:26.308 | INFO | Epoch 22: train loss 0.0585 dev loss 0.2076 dev tag acc 99.19% dev head acc 95.35% dev deprel acc 97.58%
[hops] 2024-09-24 17:17:41.545 | INFO | Epoch 23: train loss 0.0545 dev loss 0.2094 dev tag acc 99.15% dev head acc 95.29% dev deprel acc 97.49%
[hops] 2024-09-24 17:17:56.873 | INFO | Epoch 24: train loss 0.0502 dev loss 0.2116 dev tag acc 99.17% dev head acc 95.23% dev deprel acc 97.49%
[hops] 2024-09-24 17:18:12.142 | INFO | Epoch 25: train loss 0.0455 dev loss 0.2059 dev tag acc 99.23% dev head acc 95.44% dev deprel acc 97.55%
[hops] 2024-09-24 17:18:12.143 | INFO | New best model: head accuracy 95.44% > 95.39%
[hops] 2024-09-24 17:18:29.698 | INFO | Epoch 26: train loss 0.0436 dev loss 0.2258 dev tag acc 99.22% dev head acc 95.46% dev deprel acc 97.44%
[hops] 2024-09-24 17:18:29.699 | INFO | New best model: head accuracy 95.46% > 95.44%
[hops] 2024-09-24 17:18:45.933 | INFO | Epoch 27: train loss 0.0404 dev loss 0.2359 dev tag acc 99.20% dev head acc 95.56% dev deprel acc 97.47%
[hops] 2024-09-24 17:18:45.934 | INFO | New best model: head accuracy 95.56% > 95.46%
[hops] 2024-09-24 17:19:02.822 | INFO | Epoch 28: train loss 0.0376 dev loss 0.2342 dev tag acc 99.21% dev head acc 95.75% dev deprel acc 97.56%
[hops] 2024-09-24 17:19:02.823 | INFO | New best model: head accuracy 95.75% > 95.56%
[hops] 2024-09-24 17:19:20.495 | INFO | Epoch 29: train loss 0.0365 dev loss 0.2271 dev tag acc 99.21% dev head acc 95.69% dev deprel acc 97.61%
[hops] 2024-09-24 17:19:34.954 | INFO | Epoch 30: train loss 0.0349 dev loss 0.2359 dev tag acc 99.21% dev head acc 95.79% dev deprel acc 97.60%
[hops] 2024-09-24 17:19:34.955 | INFO | New best model: head accuracy 95.79% > 95.75%
[hops] 2024-09-24 17:19:52.083 | INFO | Epoch 31: train loss 0.0333 dev loss 0.2284 dev tag acc 99.22% dev head acc 95.68% dev deprel acc 97.60%
[hops] 2024-09-24 17:20:07.114 | INFO | Epoch 32: train loss 0.0302 dev loss 0.2329 dev tag acc 99.23% dev head acc 95.50% dev deprel acc 97.64%
[hops] 2024-09-24 17:20:20.909 | INFO | Epoch 33: train loss 0.0280 dev loss 0.2253 dev tag acc 99.27% dev head acc 95.56% dev deprel acc 97.70%
[hops] 2024-09-24 17:20:36.250 | INFO | Epoch 34: train loss 0.0269 dev loss 0.2490 dev tag acc 99.20% dev head acc 95.74% dev deprel acc 97.69%
[hops] 2024-09-24 17:20:51.100 | INFO | Epoch 35: train loss 0.0266 dev loss 0.2576 dev tag acc 99.21% dev head acc 95.74% dev deprel acc 97.78%
[hops] 2024-09-24 17:21:04.672 | INFO | Epoch 36: train loss 0.0255 dev loss 0.2525 dev tag acc 99.31% dev head acc 95.81% dev deprel acc 97.75%
[hops] 2024-09-24 17:21:04.673 | INFO | New best model: head accuracy 95.81% > 95.79%
[hops] 2024-09-24 17:21:20.986 | INFO | Epoch 37: train loss 0.0226 dev loss 0.2545 dev tag acc 99.29% dev head acc 95.85% dev deprel acc 97.67%
[hops] 2024-09-24 17:21:20.987 | INFO | New best model: head accuracy 95.85% > 95.81%
[hops] 2024-09-24 17:21:38.097 | INFO | Epoch 38: train loss 0.0224 dev loss 0.2743 dev tag acc 99.26% dev head acc 95.97% dev deprel acc 97.61%
[hops] 2024-09-24 17:21:38.098 | INFO | New best model: head accuracy 95.97% > 95.85%
[hops] 2024-09-24 17:21:54.248 | INFO | Epoch 39: train loss 0.0213 dev loss 0.2854 dev tag acc 99.26% dev head acc 95.75% dev deprel acc 97.66%
[hops] 2024-09-24 17:22:09.077 | INFO | Epoch 40: train loss 0.0212 dev loss 0.2520 dev tag acc 99.26% dev head acc 95.94% dev deprel acc 97.63%
[hops] 2024-09-24 17:22:24.533 | INFO | Epoch 41: train loss 0.0198 dev loss 0.2570 dev tag acc 99.31% dev head acc 96.04% dev deprel acc 97.81%
[hops] 2024-09-24 17:22:24.534 | INFO | New best model: head accuracy 96.04% > 95.97%
[hops] 2024-09-24 17:22:41.309 | INFO | Epoch 42: train loss 0.0179 dev loss 0.2711 dev tag acc 99.30% dev head acc 95.95% dev deprel acc 97.74%
[hops] 2024-09-24 17:22:56.619 | INFO | Epoch 43: train loss 0.0166 dev loss 0.2740 dev tag acc 99.27% dev head acc 96.03% dev deprel acc 97.86%
[hops] 2024-09-24 17:23:11.247 | INFO | Epoch 44: train loss 0.0168 dev loss 0.2802 dev tag acc 99.27% dev head acc 96.07% dev deprel acc 97.83%
[hops] 2024-09-24 17:23:11.248 | INFO | New best model: head accuracy 96.07% > 96.04%
[hops] 2024-09-24 17:23:29.041 | INFO | Epoch 45: train loss 0.0163 dev loss 0.2719 dev tag acc 99.28% dev head acc 96.19% dev deprel acc 97.87%
[hops] 2024-09-24 17:23:29.042 | INFO | New best model: head accuracy 96.19% > 96.07%
[hops] 2024-09-24 17:23:46.148 | INFO | Epoch 46: train loss 0.0180 dev loss 0.2666 dev tag acc 99.26% dev head acc 96.01% dev deprel acc 97.86%
[hops] 2024-09-24 17:24:01.336 | INFO | Epoch 47: train loss 0.0142 dev loss 0.2792 dev tag acc 99.29% dev head acc 96.07% dev deprel acc 97.83%
[hops] 2024-09-24 17:24:16.066 | INFO | Epoch 48: train loss 0.0134 dev loss 0.2820 dev tag acc 99.27% dev head acc 96.06% dev deprel acc 97.79%
[hops] 2024-09-24 17:24:31.201 | INFO | Epoch 49: train loss 0.0137 dev loss 0.2877 dev tag acc 99.32% dev head acc 96.13% dev deprel acc 97.85%
[hops] 2024-09-24 17:24:46.077 | INFO | Epoch 50: train loss 0.0130 dev loss 0.2910 dev tag acc 99.28% dev head acc 96.11% dev deprel acc 97.91%
[hops] 2024-09-24 17:25:01.474 | INFO | Epoch 51: train loss 0.0120 dev loss 0.3076 dev tag acc 99.27% dev head acc 96.06% dev deprel acc 97.86%
[hops] 2024-09-24 17:25:15.876 | INFO | Epoch 52: train loss 0.0114 dev loss 0.3043 dev tag acc 99.28% dev head acc 96.13% dev deprel acc 97.86%
[hops] 2024-09-24 17:25:31.219 | INFO | Epoch 53: train loss 0.0113 dev loss 0.3071 dev tag acc 99.26% dev head acc 96.07% dev deprel acc 97.89%
[hops] 2024-09-24 17:25:46.377 | INFO | Epoch 54: train loss 0.0103 dev loss 0.3065 dev tag acc 99.27% dev head acc 96.25% dev deprel acc 97.94%
[hops] 2024-09-24 17:25:46.378 | INFO | New best model: head accuracy 96.25% > 96.19%
[hops] 2024-09-24 17:26:03.659 | INFO | Epoch 55: train loss 0.0104 dev loss 0.3091 dev tag acc 99.27% dev head acc 96.19% dev deprel acc 97.88%
[hops] 2024-09-24 17:26:18.791 | INFO | Epoch 56: train loss 0.0098 dev loss 0.3122 dev tag acc 99.27% dev head acc 96.05% dev deprel acc 97.84%
[hops] 2024-09-24 17:26:34.137 | INFO | Epoch 57: train loss 0.0094 dev loss 0.3159 dev tag acc 99.26% dev head acc 96.07% dev deprel acc 97.82%
[hops] 2024-09-24 17:26:49.249 | INFO | Epoch 58: train loss 0.0094 dev loss 0.3203 dev tag acc 99.28% dev head acc 96.15% dev deprel acc 97.87%
[hops] 2024-09-24 17:27:04.725 | INFO | Epoch 59: train loss 0.0082 dev loss 0.3228 dev tag acc 99.28% dev head acc 96.17% dev deprel acc 97.92%
[hops] 2024-09-24 17:27:20.299 | INFO | Epoch 60: train loss 0.0089 dev loss 0.3213 dev tag acc 99.29% dev head acc 96.13% dev deprel acc 97.92%
[hops] 2024-09-24 17:27:35.759 | INFO | Epoch 61: train loss 0.0082 dev loss 0.3217 dev tag acc 99.29% dev head acc 96.19% dev deprel acc 97.93%
[hops] 2024-09-24 17:27:51.235 | INFO | Epoch 62: train loss 0.0082 dev loss 0.3231 dev tag acc 99.29% dev head acc 96.24% dev deprel acc 97.91%
[hops] 2024-09-24 17:28:06.104 | INFO | Epoch 63: train loss 0.0085 dev loss 0.3223 dev tag acc 99.29% dev head acc 96.24% dev deprel acc 97.90%
[hops] 2024-09-24 17:28:11.404 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[hops] 2024-09-24 17:28:16.852 | WARNING | You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[hops] 2024-09-24 17:28:18.856 | INFO | Metrics for Sequoia-camembertv2_base_p2_17k_last_layer+rand_seed=42
───────────────────────────────
Split   UPOS    UAS     LAS
───────────────────────────────
Dev     99.27   96.30   95.11
Test    99.38   96.12   94.94
───────────────────────────────