Commit 2df0e90 by bourdoiscatie · 1 parent: c221b83

Update README.md

  1. README.md +253 -6
README.md CHANGED
@@ -27,7 +27,7 @@ co2_eq_emissions: 35
27
 
28
  We present **Camembert-NER-base-frenchNER**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets for 3 entities (LOC, PER, ORG).
29
  All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER).
30
- This represents a total of over **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**.
31
  Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
32
 
33
 
@@ -82,19 +82,266 @@ The distribution of the entities is as follows:
82
  The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
83
 
84
  ### multiconer
85
- TODO
86
 
87
  ### multinerd
88
- TODO
89
 
90
  ### wikiann
91
- TODO
92
 
93
  ### wikiner
94
- TODO
95
 
96
  ### frenchNER
97
- TODO
98
 
99
  ## Usage
100
  ### Code
 
27
 
28
  We present **Camembert-NER-base-frenchNER**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets for 3 entities (LOC, PER, ORG).
29
  All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER).
30
+ This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
31
  Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
32
 
33
 
 
82
  The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
83
 
84
  ### multiconer
85
+
86
+ <table>
87
+ <thead>
88
+ <tr>
89
+ <th><br>Model</th>
90
+ <th><br>Metrics</th>
91
+ <th><br>PER</th>
92
+ <th><br>LOC</th>
93
+ <th><br>ORG</th>
94
+ <th><br>Other</th>
95
+ <th><br>Overall</th>
96
+ </tr>
97
+ </thead>
98
+ <tbody>
99
+ <tr>
100
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
101
+ <td><br>Precision</td>
102
+ <td><br>0.957</td>
103
+ <td><br>0.894</td>
104
+ <td><br>0.876</td>
105
+ <td><br>0.986</td>
106
+ <td><br>0.972</td>
107
+ </tr>
108
+ <tr>
109
+ <td><br>Recall</td>
110
+ <td><br>0.962</td>
111
+ <td><br>0.880</td>
112
+ <td><br>0.878</td>
113
+ <td><br>0.985</td>
114
+ <td><br>0.972</td>
115
+ </tr>
116
+ <tr>
117
+ <td>F1</td>
118
+ <td><br>0.960</td>
119
+ <td><br>0.887</td>
120
+ <td><br>0.876</td>
121
+ <td><br>0.985</td>
122
+ <td><br>0.972</td>
123
+ </tr>
124
+ <tr>
125
+ <td></td>
126
+ <td><br>Number</td>
127
+ <td><br>2,526</td>
128
+ <td><br>884</td>
129
+ <td><br>830</td>
130
+ <td><br>13,710</td>
131
+ <td><br>17,950</td>
132
+ </tr>
133
+ </tbody>
134
+ </table>
135
 
136
  ### multinerd
137
+
138
+ <table>
139
+ <thead>
140
+ <tr>
141
+ <th><br>Model</th>
142
+ <th><br>Metrics</th>
143
+ <th><br>PER</th>
144
+ <th><br>LOC</th>
145
+ <th><br>ORG</th>
146
+ <th><br>Other</th>
147
+ <th><br>Overall</th>
148
+ </tr>
149
+ </thead>
150
+ <tbody>
151
+ <tr>
152
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
153
+ <td><br>Precision</td>
154
+ <td><br>0.974</td>
155
+ <td><br>0.965</td>
156
+ <td><br>0.910</td>
157
+ <td><br>0.999</td>
158
+ <td><br>0.995</td>
159
+ </tr>
160
+ <tr>
161
+ <td><br>Recall</td>
162
+ <td><br>0.995</td>
163
+ <td><br>0.981</td>
164
+ <td><br>0.968</td>
165
+ <td><br>0.996</td>
166
+ <td><br>0.995</td>
167
+ </tr>
168
+ <tr>
169
+ <td>F1</td>
170
+ <td><br>0.985</td>
171
+ <td><br>0.973</td>
172
+ <td><br>0.938</td>
173
+ <td><br>0.998</td>
174
+ <td><br>0.995</td>
175
+ </tr>
176
+ <tr>
177
+ <td></td>
178
+ <td><br>Number</td>
179
+ <td><br>36,365</td>
180
+ <td><br>27,101</td>
181
+ <td><br>5,411</td>
182
+ <td><br>529,523</td>
183
+ <td><br>598,400</td>
184
+ </tr>
185
+ </tbody>
186
+ </table>
187
 
188
  ### wikiann
189
+
190
+ <table>
191
+ <thead>
192
+ <tr>
193
+ <th><br>Model</th>
194
+ <th><br>Metrics</th>
195
+ <th><br>PER</th>
196
+ <th><br>LOC</th>
197
+ <th><br>ORG</th>
198
+ <th><br>Other</th>
199
+ <th><br>Overall</th>
200
+ </tr>
201
+ </thead>
202
+ <tbody>
203
+ <tr>
204
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
205
+ <td><br>Precision</td>
206
+ <td><br>0.948</td>
207
+ <td><br>0.900</td>
208
+ <td><br>0.893</td>
209
+ <td><br>0.979</td>
210
+ <td><br>0.942</td>
211
+ </tr>
212
+ <tr>
213
+ <td><br>Recall</td>
214
+ <td><br>0.946</td>
215
+ <td><br>0.911</td>
216
+ <td><br>0.878</td>
217
+ <td><br>0.982</td>
218
+ <td><br>0.942</td>
219
+ </tr>
220
+ <tr>
221
+ <td>F1</td>
222
+ <td><br>0.947</td>
223
+ <td><br>0.906</td>
224
+ <td><br>0.886</td>
225
+ <td><br>0.980</td>
226
+ <td><br>0.942</td>
227
+ </tr>
228
+ <tr>
229
+ <td></td>
230
+ <td><br>Number</td>
231
+ <td><br>21,656</td>
232
+ <td><br>19,757</td>
233
+ <td><br>21,592</td>
234
+ <td><br>47,318</td>
235
+ <td><br>110,323</td>
236
+ </tr>
237
+ </tbody>
238
+ </table>
239
 
240
  ### wikiner
241
+
242
+ <table>
243
+ <thead>
244
+ <tr>
245
+ <th><br>Model</th>
246
+ <th><br>Metrics</th>
247
+ <th><br>PER</th>
248
+ <th><br>LOC</th>
249
+ <th><br>ORG</th>
250
+ <th><br>Other</th>
251
+ <th><br>Overall</th>
252
+ </tr>
253
+ </thead>
254
+ <tbody>
255
+ <tr>
256
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
257
+ <td><br>Precision</td>
258
+ <td><br>0.971</td>
258
+ <td><br>0.947</td>
259
+ <td><br>0.866</td>
260
+ <td><br>0.994</td>
261
+ <td><br>0.989</td>
262
+ </tr>
263
+ <tr>
264
+ <td><br>Recall</td>
265
+ <td><br>0.969</td>
266
+ <td><br>0.942</td>
267
+ <td><br>0.891</td>
268
+ <td><br>0.995</td>
269
+ <td><br>0.989</td>
270
+ </tr>
271
+ <tr>
272
+ <td>F1</td>
273
+ <td><br>0.969</td>
274
+ <td><br>0.945</td>
275
+ <td><br>0.878</td>
276
+ <td><br>0.995</td>
277
+ <td><br>0.989</td>
279
+ </tr>
280
+ <tr>
281
+ <td></td>
282
+ <td><br>Number</td>
283
+ <td><br>26,053</td>
284
+ <td><br>29,004</td>
285
+ <td><br>6,253</td>
286
+ <td><br>394,986</td>
287
+ <td><br>456,296</td>
288
+ </tr>
289
+ </tbody>
290
+ </table>
291
 
292
  ### frenchNER
293
+
294
+ <table>
295
+ <thead>
296
+ <tr>
297
+ <th><br>Model</th>
298
+ <th><br>Metrics</th>
299
+ <th><br>PER</th>
300
+ <th><br>LOC</th>
301
+ <th><br>ORG</th>
302
+ <th><br>Other</th>
303
+ <th><br>Overall</th>
304
+ </tr>
305
+ </thead>
306
+ <tbody>
307
+ <tr>
308
+ <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
309
+ <td><br>Precision</td>
310
+ <td><br>0.961</td>
311
+ <td><br>0.935</td>
312
+ <td><br>0.877</td>
313
+ <td><br>0.995</td>
314
+ <td><br>0.986</td>
315
+ </tr>
316
+ <tr>
317
+ <td><br>Recall</td>
318
+ <td><br>0.972</td>
319
+ <td><br>0.946</td>
320
+ <td><br>0.876</td>
321
+ <td><br>0.994</td>
322
+ <td><br>0.986</td>
323
+ </tr>
324
+ <tr>
325
+ <td>F1</td>
326
+ <td><br>0.966</td>
327
+ <td><br>0.940</td>
328
+ <td><br>0.876</td>
329
+ <td><br>0.994</td>
330
+ <td><br>0.986</td>
331
+ </tr>
332
+ <tr>
333
+ <td></td>
334
+ <td><br>Number</td>
335
+ <td><br>88,139</td>
336
+ <td><br>78,278</td>
337
+ <td><br>35,788</td>
338
+ <td><br>1,040,925</td>
339
+ <td><br>1,243,130</td>
340
+ </tr>
341
+ </tbody>
342
+ </table>
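
As a reading aid for the tables above, here is a minimal sketch of how an F1 value relates to the precision and recall reported next to it. It assumes F1 is the usual harmonic mean of the two, which the README does not state explicitly. The helper name `f1_from_precision_recall` is illustrative only and is not part of the `evaluate` package. The input values are the PER and LOC columns of the frenchNER table, written with a decimal point. Because the published figures are rounded to three decimals, recomputed values can differ in the last digit for other columns.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        # Avoid division by zero when the class is never predicted or found.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the frenchNER table above.
frenchner_scores = {
    "PER": (0.961, 0.972),
    "LOC": (0.935, 0.946),
}

for label, (precision, recall) in frenchner_scores.items():
    f1 = f1_from_precision_recall(precision, recall)
    print(f"{label}: F1 = {f1:.3f}")
```

For these two columns the recomputed values agree with the F1 row of the table (0.966 for PER and 0.940 for LOC).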
343
+
344
+
345
 
346
  ## Usage
347
  ### Code