Commit 3710092 · 1 Parent(s): 2df0e90
Update README.md
README.md CHANGED
@@ -7,10 +7,10 @@ metrics:
|
|
7 |
- f1
|
8 |
- accuracy
|
9 |
model-index:
|
10 |
-
- name: Camembert-
|
11 |
results: []
|
12 |
datasets:
|
13 |
-
- CATIE-AQ/
|
14 |
language:
|
15 |
- fr
|
16 |
widget:
|
@@ -25,8 +25,8 @@ co2_eq_emissions: 35
|
|
25 |
|
26 |
## Model Description
|
27 |
|
28 |
-
We present **Camembert-
|
29 |
-
All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/
|
30 |
This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
|
31 |
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
|
32 |
|
@@ -34,7 +34,7 @@ Our methodology is described in a blog post available in [English](https://blog.
|
|
34 |
|
35 |
## Dataset
|
36 |
|
37 |
-
The dataset used is [frenchNER](https://huggingface.co/datasets/CATIE-AQ/
|
38 |
* PER: personality ;
|
39 |
* LOC: location ;
|
40 |
* ORG: organization ;
|
@@ -81,6 +81,61 @@ The distribution of the entities is as follows:
|
|
81 |
|
82 |
The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
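The card does not say which `evaluate` metric was used. In the frenchNER table of this diff, the Number row sums to the Overall count (88,139 + 78,278 + 35,788 + 1,040,925 = 1,243,130), and the Overall precision, recall and F1 are identical, which is consistent with token-level, micro-averaged scoring. Below is a minimal pure-Python sketch under that assumption. It is not the `evaluate` API, and `per_label_metrics` is a hypothetical name used only for illustration.

```python
from collections import Counter


def per_label_metrics(y_true, y_pred):
    """Token-level precision, recall, F1 and support for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label was wrong
            fn[gold] += 1  # the gold label was missed

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "number": r_den}

    # Micro-average over all tokens: precision, recall and F1 all equal accuracy.
    total = len(y_true)
    report["overall"] = {"accuracy": sum(tp.values()) / total if total else 0.0, "number": total}
    return report


# Toy labels, invented for illustration only.
gold = ["PER", "PER", "O", "LOC", "O", "ORG"]
pred = ["PER", "O", "O", "LOC", "O", "LOC"]
report = per_label_metrics(gold, pred)
```

With one label per token, every false positive for one label is a false negative for another, which is why the micro-averaged precision, recall and F1 collapse to a single accuracy figure.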
|
83 |
|
84 |
### multiconer
|
85 |
|
86 |
<table>
|
@@ -91,7 +146,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
91 |
<th><br>PER</th>
|
92 |
<th><br>LOC</th>
|
93 |
<th><br>ORG</th>
|
94 |
-
<th><br>
|
95 |
<th><br>Overall</th>
|
96 |
</tr>
|
97 |
</thead>
|
@@ -143,7 +198,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
143 |
<th><br>PER</th>
|
144 |
<th><br>LOC</th>
|
145 |
<th><br>ORG</th>
|
146 |
-
<th><br>
|
147 |
<th><br>Overall</th>
|
148 |
</tr>
|
149 |
</thead>
|
@@ -195,7 +250,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
195 |
<th><br>PER</th>
|
196 |
<th><br>LOC</th>
|
197 |
<th><br>ORG</th>
|
198 |
-
<th><br>
|
199 |
<th><br>Overall</th>
|
200 |
</tr>
|
201 |
</thead>
|
@@ -247,7 +302,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
247 |
<th><br>PER</th>
|
248 |
<th><br>LOC</th>
|
249 |
<th><br>ORG</th>
|
250 |
-
<th><br>
|
251 |
<th><br>Overall</th>
|
252 |
</tr>
|
253 |
</thead>
|
@@ -289,59 +344,6 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
289 |
</tbody>
|
290 |
</table>
|
291 |
|
292 |
-
### frenchNER
|
293 |
-
|
294 |
-
<table>
|
295 |
-
<thead>
|
296 |
-
<tr>
|
297 |
-
<th><br>Model</th>
|
298 |
-
<th><br>Metrics</th>
|
299 |
-
<th><br>PER</th>
|
300 |
-
<th><br>LOC</th>
|
301 |
-
<th><br>ORG</th>
|
302 |
-
<th><br>Other</th>
|
303 |
-
<th><br>Overall</th>
|
304 |
-
</tr>
|
305 |
-
</thead>
|
306 |
-
<tbody>
|
307 |
-
<tr>
|
308 |
-
<td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
|
309 |
-
<td><br>Precision</td>
|
310 |
-
<td><br>0,961</td>
|
311 |
-
<td><br>0,935</td>
|
312 |
-
<td><br>0,877</td>
|
313 |
-
<td><br>0,995</td>
|
314 |
-
<td><br>0,986</td>
|
315 |
-
</tr>
|
316 |
-
<tr>
|
317 |
-
<td><br>Recall</td>
|
318 |
-
<td><br>0,972</td>
|
319 |
-
<td><br>0,946</td>
|
320 |
-
<td><br>0,876</td>
|
321 |
-
<td><br>0,994</td>
|
322 |
-
<td><br>0,986</td>
|
323 |
-
</tr>
|
324 |
-
<tr>
|
325 |
-
<td>F1</td>
|
326 |
-
<td><br>0,966</td>
|
327 |
-
<td><br>0,940</td>
|
328 |
-
<td><br>0,876</td>
|
329 |
-
<td><br>0,994</td>
|
330 |
-
<td><br>0,986</td>
|
331 |
-
</tr>
|
332 |
-
<tr>
|
333 |
-
<td></td>
|
334 |
-
<td><br>Number</td>
|
335 |
-
<td><br>88,139</td>
|
336 |
-
<td><br>78,278</td>
|
337 |
-
<td><br>35,788</td>
|
338 |
-
<td><br>1,040,925</td>
|
339 |
-
<td><br>1,243,130</td>
|
340 |
-
</tr>
|
341 |
-
</tbody>
|
342 |
-
</table>
|
343 |
-
|
344 |
-
|
345 |
|
346 |
## Usage
|
347 |
### Code
|
@@ -349,7 +351,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
|
|
349 |
```python
|
350 |
from transformers import pipeline
|
351 |
|
352 |
-
ner = pipeline('question-answering', model='CATIE-AQ/Camembert-
|
353 |
|
354 |
result = ner(
|
355 |
"Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
|
@@ -470,7 +472,7 @@ The following hyperparameters were used during training:
|
|
470 |
|
471 |
## Citations
|
472 |
|
473 |
-
### Camembert-
|
474 |
```
|
475 |
TODO
|
476 |
```
|
@@ -543,7 +545,7 @@ url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
|
|
543 |
author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
|
544 |
|
545 |
|
546 |
-
###
|
547 |
```
|
548 |
TODO
|
549 |
```
|
|
|
7 |
- f1
|
8 |
- accuracy
|
9 |
model-index:
|
10 |
+
- name: Camembert-base-frenchNER_3entities
|
11 |
results: []
|
12 |
datasets:
|
13 |
+
- CATIE-AQ/frenchNER_3entities
|
14 |
language:
|
15 |
- fr
|
16 |
widget:
|
|
|
25 |
|
26 |
## Model Description
|
27 |
|
28 |
+
We present **Camembert-base-frenchNER_3entities**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for French Named Entity Recognition (NER) on five French NER datasets, covering 3 entities (LOC, PER, ORG).
|
29 |
+
All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities).
|
30 |
This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
|
31 |
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
|
32 |
|
|
|
34 |
|
35 |
## Dataset
|
36 |
|
37 |
+
The dataset used is [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities), which contains ~420k sentences labeled in 4 categories:
|
38 |
* PER: personality ;
|
39 |
* LOC: location ;
|
40 |
* ORG: organization ;
|
|
|
81 |
|
82 |
The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
|
83 |
|
84 |
+
### frenchNER_3entities
|
85 |
+
|
86 |
+
<table>
|
87 |
+
<thead>
|
88 |
+
<tr>
|
89 |
+
<th><br>Model</th>
|
90 |
+
<th><br>Metrics</th>
|
91 |
+
<th><br>PER</th>
|
92 |
+
<th><br>LOC</th>
|
93 |
+
<th><br>ORG</th>
|
94 |
+
<th><br>O</th>
|
95 |
+
<th><br>Overall</th>
|
96 |
+
</tr>
|
97 |
+
</thead>
|
98 |
+
<tbody>
|
99 |
+
<tr>
|
100 |
+
<td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
|
101 |
+
<td><br>Precision</td>
|
102 |
+
<td><br>0.961</td>
|
103 |
+
<td><br>0.935</td>
|
104 |
+
<td><br>0.877</td>
|
105 |
+
<td><br>0.995</td>
|
106 |
+
<td><br>0.986</td>
|
107 |
+
</tr>
|
108 |
+
<tr>
|
109 |
+
<td><br>Recall</td>
|
110 |
+
<td><br>0.972</td>
|
111 |
+
<td><br>0.946</td>
|
112 |
+
<td><br>0.876</td>
|
113 |
+
<td><br>0.994</td>
|
114 |
+
<td><br>0.986</td>
|
115 |
+
</tr>
|
116 |
+
<tr>
|
117 |
+
<td>F1</td>
|
118 |
+
<td><br>0.966</td>
|
119 |
+
<td><br>0.940</td>
|
120 |
+
<td><br>0.876</td>
|
121 |
+
<td><br>0.994</td>
|
122 |
+
<td><br>0.986</td>
|
123 |
+
</tr>
|
124 |
+
<tr>
|
125 |
+
<td></td>
|
126 |
+
<td><br>Number</td>
|
127 |
+
<td><br>88,139</td>
|
128 |
+
<td><br>78,278</td>
|
129 |
+
<td><br>35,788</td>
|
130 |
+
<td><br>1,040,925</td>
|
131 |
+
<td><br>1,243,130</td>
|
132 |
+
</tr>
|
133 |
+
</tbody>
|
134 |
+
</table>
|
135 |
+
|
136 |
+
|
137 |
+
In detail:
|
138 |
+
|
139 |
### multiconer
|
140 |
|
141 |
<table>
|
|
|
146 |
<th><br>PER</th>
|
147 |
<th><br>LOC</th>
|
148 |
<th><br>ORG</th>
|
149 |
+
<th><br>O</th>
|
150 |
<th><br>Overall</th>
|
151 |
</tr>
|
152 |
</thead>
|
|
|
198 |
<th><br>PER</th>
|
199 |
<th><br>LOC</th>
|
200 |
<th><br>ORG</th>
|
201 |
+
<th><br>O</th>
|
202 |
<th><br>Overall</th>
|
203 |
</tr>
|
204 |
</thead>
|
|
|
250 |
<th><br>PER</th>
|
251 |
<th><br>LOC</th>
|
252 |
<th><br>ORG</th>
|
253 |
+
<th><br>O</th>
|
254 |
<th><br>Overall</th>
|
255 |
</tr>
|
256 |
</thead>
|
|
|
302 |
<th><br>PER</th>
|
303 |
<th><br>LOC</th>
|
304 |
<th><br>ORG</th>
|
305 |
+
<th><br>O</th>
|
306 |
<th><br>Overall</th>
|
307 |
</tr>
|
308 |
</thead>
|
|
|
344 |
</tbody>
|
345 |
</table>
|
346 |
|
347 |
|
348 |
## Usage
|
349 |
### Code
|
|
|
351 |
```python
|
352 |
from transformers import pipeline
|
353 |
|
354 |
+
ner = pipeline('token-classification', model='CATIE-AQ/Camembert-base-frenchNER_3entities', tokenizer='CATIE-AQ/Camembert-base-frenchNER_3entities', aggregation_strategy='simple')
|
355 |
|
356 |
result = ner(
|
357 |
"Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
|
|
|
472 |
|
473 |
## Citations
|
474 |
|
475 |
+
### Camembert-base-frenchNER_3entities
|
476 |
```
|
477 |
TODO
|
478 |
```
|
|
|
545 |
author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
|
546 |
|
547 |
|
548 |
+
### frenchNER_3entities
|
549 |
```
|
550 |
TODO
|
551 |
```
|
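The usage snippet in this diff stops at the hunk boundary, so the `ner(...)` call is never closed and its output is not shown. Below is a minimal sketch of a complete call, assuming the `token-classification` pipeline task and `aggregation_strategy="simple"` to merge sub-word pieces into whole entities. The repository name is copied from the diff; `build_ner` and `group_entities` are hypothetical helpers, not part of the model card.

```python
from collections import defaultdict

# Repository name as written in this diff.
MODEL_ID = "CATIE-AQ/Camembert-base-frenchNER_3entities"


def build_ner(model_id=MODEL_ID):
    """Build a NER pipeline; the model is downloaded on first use."""
    # Imported lazily so group_entities can be used without transformers installed.
    from transformers import pipeline

    # Assumption: token classification is the right task for this checkpoint.
    return pipeline(
        "token-classification",
        model=model_id,
        tokenizer=model_id,
        aggregation_strategy="simple",
    )


def group_entities(predictions, min_score=0.0):
    """Collect predicted words per entity label from aggregated pipeline output."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Usage (downloads the model, so it is left commented out):
# ner = build_ner()
# print(group_entities(ner("Didier Deschamps a consolidé les acquis de la dernière Coupe du monde.")))
```

With an aggregation strategy set, each prediction is a dict carrying `entity_group`, `score`, `word`, `start` and `end`, so grouping by label only needs the first three keys.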