bourdoiscatie committed · Commit 3710092 · 1 Parent(s): 2df0e90

Update README.md

  1. README.md +67 -65
README.md CHANGED
@@ -7,10 +7,10 @@ metrics:
7
  - f1
8
  - accuracy
9
  model-index:
10
- - name: Camembert-NER-base-frenchNER
11
  results: []
12
  datasets:
13
- - CATIE-AQ/frenchNER
14
  language:
15
  - fr
16
  widget:
@@ -25,8 +25,8 @@ co2_eq_emissions: 35
25
 
26
  ## Model Description
27
 
28
- We present **Camembert-NER-base-frenchNER**, which is a [CamemBERT base](https://huggingface.co/camembert-base) fine-tuned for the Name Entity Recognition task for the French language on five French NER datasets for 3 entities (LOC, PER, ORG).
29
- All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER).
30
  This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
31
  Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
32
 
@@ -34,7 +34,7 @@ Our methodology is described in a blog post available in [English](https://blog.
34
 
35
  ## Dataset
36
 
37
- The dataset used is [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER), which represents ~420k sentences labeled in 4 categories :
38
  * PER: personality ;
39
  * LOC: location ;
40
  * ORG: organization ;
@@ -81,6 +81,61 @@ The distribution of the entities is as follows:
81
 
82
  The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
83
 
84
  ### multiconer
85
 
86
  <table>
@@ -91,7 +146,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
91
  <th><br>PER</th>
92
  <th><br>LOC</th>
93
  <th><br>ORG</th>
94
- <th><br>Other</th>
95
  <th><br>Overall</th>
96
  </tr>
97
  </thead>
@@ -143,7 +198,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
143
  <th><br>PER</th>
144
  <th><br>LOC</th>
145
  <th><br>ORG</th>
146
- <th><br>Other</th>
147
  <th><br>Overall</th>
148
  </tr>
149
  </thead>
@@ -195,7 +250,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
195
  <th><br>PER</th>
196
  <th><br>LOC</th>
197
  <th><br>ORG</th>
198
- <th><br>Other</th>
199
  <th><br>Overall</th>
200
  </tr>
201
  </thead>
@@ -247,7 +302,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
247
  <th><br>PER</th>
248
  <th><br>LOC</th>
249
  <th><br>ORG</th>
250
- <th><br>Other</th>
251
  <th><br>Overall</th>
252
  </tr>
253
  </thead>
@@ -289,59 +344,6 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
289
  </tbody>
290
  </table>
291
 
292
- ### frenchNER
293
-
294
- <table>
295
- <thead>
296
- <tr>
297
- <th><br>Model</th>
298
- <th><br>Metrics</th>
299
- <th><br>PER</th>
300
- <th><br>LOC</th>
301
- <th><br>ORG</th>
302
- <th><br>Other</th>
303
- <th><br>Overall</th>
304
- </tr>
305
- </thead>
306
- <tbody>
307
- <tr>
308
- <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
309
- <td><br>Precision</td>
310
- <td><br>0,961</td>
311
- <td><br>0,935</td>
312
- <td><br>0,877</td>
313
- <td><br>0,995</td>
314
- <td><br>0,986</td>
315
- </tr>
316
- <tr>
317
- <td><br>Recall</td>
318
- <td><br>0,972</td>
319
- <td><br>0,946</td>
320
- <td><br>0,876</td>
321
- <td><br>0,994</td>
322
- <td><br>0,986</td>
323
- </tr>
324
- <tr>
325
- <td>F1</td>
326
- <td><br>0,966</td>
327
- <td><br>0,940</td>
328
- <td><br>0,876</td>
329
- <td><br>0,994</td>
330
- <td><br>0,986</td>
331
- </tr>
332
- <tr>
333
- <td></td>
334
- <td><br>Number</td>
335
- <td><br>88,139</td>
336
- <td><br>78,278</td>
337
- <td><br>35,788</td>
338
- <td><br>1,040,925</td>
339
- <td><br>1,243,130</td>
340
- </tr>
341
- </tbody>
342
- </table>
343
-
344
-
345
 
346
  ## Usage
347
  ### Code
@@ -349,7 +351,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
349
  ```python
350
  from transformers import pipeline
351
 
352
- ner = pipeline('question-answering', model='CATIE-AQ/Camembert-NER-base-frenchNER', tokenizer='CATIE-AQ/Camembert-NER-base-frenchNER', grouped_entities=True)
353
 
354
  result = ner(
355
  "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
@@ -470,7 +472,7 @@ The following hyperparameters were used during training:
470
 
471
  ## Citations
472
 
473
- ### Camembert-NER-frenchNER
474
  ```
475
  TODO
476
  ```
@@ -543,7 +545,7 @@ url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
543
  author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
544
 
545
 
546
- ### frenchNER
547
  ```
548
  TODO
549
  ```
 
7
  - f1
8
  - accuracy
9
  model-index:
10
+ - name: Camembert-base-frenchNER_3entities
11
  results: []
12
  datasets:
13
+ - CATIE-AQ/frenchNER_3entities
14
  language:
15
  - fr
16
  widget:
 
25
 
26
  ## Model Description
27
 
28
+ We present **Camembert-base-frenchNER_3entities**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets, covering 3 entities (LOC, PER, ORG).
29
+ All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER_3entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities).
30
  This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
31
  Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
32
 
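The split sizes quoted above can be checked against the stated total with a few lines of Python. This is only an arithmetic sketch; the dictionary keys are illustrative names, not necessarily the split names used in the published dataset.

```python
# Split sizes as quoted in the model description.
# The key names are illustrative assumptions, not confirmed split names.
splits = {"train": 346_071, "validation": 32_951, "test": 41_242}

# The three splits should add up to the stated total of 420,264 rows.
total = sum(splits.values())

# Share of each split, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)
print(shares)
```

The sum matches the total exactly, so the figure is an exact row count rather than a lower bound.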
 
34
 
35
  ## Dataset
36
 
37
+ The dataset used is [frenchNER_3entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities), which contains ~420k sentences labeled in 4 categories:
38
  * PER: personality ;
39
  * LOC: location ;
40
  * ORG: organization ;
 
81
 
82
  The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
83
 
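The README does not say which metric configuration was used with the evaluate package. Because the tables below report precision, recall, F1 and a count for every label, including `O`, the scores look token-level. Under that assumption, the sketch below shows how such per-label figures can be derived in plain Python. The function `per_label_scores` and the toy labels are hypothetical and are not part of the evaluate API.

```python
from collections import Counter


def per_label_scores(references, predictions):
    """Token-level precision, recall, F1 and support for each label.

    Hypothetical helper for illustration; assumes one label per token
    and two sequences of equal length.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(references, predictions):
        if ref == pred:
            tp[ref] += 1
        else:
            fp[pred] += 1  # predicted label was wrong
            fn[ref] += 1   # true label was missed

    scores = {}
    for label in sorted(set(references) | set(predictions)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "number": actual,  # tokens carrying this label in the reference
        }
    return scores


# Toy example with made-up labels.
refs = ["PER", "PER", "O", "LOC", "O", "ORG"]
preds = ["PER", "O", "O", "LOC", "O", "LOC"]
scores = per_label_scores(refs, preds)
for label, row in scores.items():
    print(label, row)
```

The `number` field plays the same role as the "Number" row in the tables: it counts the reference tokens for each label.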
84
+ ### frenchNER_3entities
85
+
86
+ <table>
87
+ <thead>
88
+ <tr>
89
+ <th><br>Model</th>
90
+ <th><br>Metrics</th>
91
+ <th><br>PER</th>
92
+ <th><br>LOC</th>
93
+ <th><br>ORG</th>
94
+ <th><br>O</th>
95
+ <th><br>Overall</th>
96
+ </tr>
97
+ </thead>
98
+ <tbody>
99
+ <tr>
100
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
101
+ <td><br>Precision</td>
102
+ <td><br>0,961</td>
103
+ <td><br>0,935</td>
104
+ <td><br>0,877</td>
105
+ <td><br>0,995</td>
106
+ <td><br>0,986</td>
107
+ </tr>
108
+ <tr>
109
+ <td><br>Recall</td>
110
+ <td><br>0,972</td>
111
+ <td><br>0,946</td>
112
+ <td><br>0,876</td>
113
+ <td><br>0,994</td>
114
+ <td><br>0,986</td>
115
+ </tr>
116
+ <tr>
117
+ <td>F1</td>
118
+ <td><br>0,966</td>
119
+ <td><br>0,940</td>
120
+ <td><br>0,876</td>
121
+ <td><br>0,994</td>
122
+ <td><br>0,986</td>
123
+ </tr>
124
+ <tr>
125
+ <td></td>
126
+ <td><br>Number</td>
127
+ <td><br>88,139</td>
128
+ <td><br>78,278</td>
129
+ <td><br>35,788</td>
130
+ <td><br>1,040,925</td>
131
+ <td><br>1,243,130</td>
132
+ </tr>
133
+ </tbody>
134
+ </table>
135
+
136
+
137
+ In detail:
138
+
139
  ### multiconer
140
 
141
  <table>
 
146
  <th><br>PER</th>
147
  <th><br>LOC</th>
148
  <th><br>ORG</th>
149
+ <th><br>O</th>
150
  <th><br>Overall</th>
151
  </tr>
152
  </thead>
 
198
  <th><br>PER</th>
199
  <th><br>LOC</th>
200
  <th><br>ORG</th>
201
+ <th><br>O</th>
202
  <th><br>Overall</th>
203
  </tr>
204
  </thead>
 
250
  <th><br>PER</th>
251
  <th><br>LOC</th>
252
  <th><br>ORG</th>
253
+ <th><br>O</th>
254
  <th><br>Overall</th>
255
  </tr>
256
  </thead>
 
302
  <th><br>PER</th>
303
  <th><br>LOC</th>
304
  <th><br>ORG</th>
305
+ <th><br>O</th>
306
  <th><br>Overall</th>
307
  </tr>
308
  </thead>
 
344
  </tbody>
345
  </table>
346
 
347
 
348
  ## Usage
349
  ### Code
 
351
  ```python
352
  from transformers import pipeline
353
 
354
+ ner = pipeline('token-classification', model='CATIE-AQ/Camembert-base-frenchNER_3entities', tokenizer='CATIE-AQ/Camembert-base-frenchNER_3entities', grouped_entities=True)
355
 
356
  result = ner(
357
  "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
 
472
 
473
  ## Citations
474
 
475
+ ### Camembert-frenchNER_3entities
476
  ```
477
  TODO
478
  ```
 
545
  author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
546
 
547
 
548
+ ### frenchNER_3entities
549
  ```
550
  TODO
551
  ```