CATIE-AQ/NERmembert-base-3entities


bourdoiscatie committed on Jan 4, 2024

Commit 3710092 (1 parent: 2df0e90)

Update README.md

README.md +67 -65

@@ -7,10 +7,10 @@ metrics:
 - f1
 - accuracy
 model-index:
-- name: Camembert-NER-base-frenchNER
   results: []
 datasets:
-- CATIE-AQ/frenchNER
 language:
 - fr
 widget:
@@ -25,8 +25,8 @@ co2_eq_emissions: 35
 ## Model Description
-We present **Camembert-NER-base-frenchNER**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets covering 3 entities (LOC, PER, ORG).
-All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER).
 This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
 Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
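The split sizes quoted above can be checked with a few lines of Python. This is only a sanity check on the stated figures; the variable names (`splits`, `shares`) are illustrative and do not come from the model card.

```python
# Split sizes as stated in the model description.
splits = {"train": 346_071, "validation": 32_951, "test": 41_242}

# The three splits should add up to the stated total of 420,264 rows.
total = sum(splits.values())
assert total == 420_264

# Share of each split, as a percentage of all rows (rounded to one decimal).
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(total, shares)
```

Roughly four out of five rows go to training, with the rest divided between validation and test.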
@@ -34,7 +34,7 @@ Our methodology is described in a blog post available in [English](https://blog.
 ## Dataset
-The dataset used is [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER), which contains ~420k sentences labeled with 4 categories:
 * PER: person;
 * LOC: location;
 * ORG: organization;
@@ -81,6 +81,61 @@ The distribution of the entities is as follows:
 The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
 ### multiconer
 <table>
@@ -91,7 +146,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
-      <th><br>Other</th>
       <th><br>Overall</th>
     </tr>
 </thead>
@@ -143,7 +198,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
-      <th><br>Other</th>
       <th><br>Overall</th>
     </tr>
 </thead>
@@ -195,7 +250,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
-      <th><br>Other</th>
       <th><br>Overall</th>
     </tr>
 </thead>
@@ -247,7 +302,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
-      <th><br>Other</th>
       <th><br>Overall</th>
     </tr>
 </thead>
@@ -289,59 +344,6 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
 </tbody>
 </table>
-### frenchNER
-<table>
-<thead>
-    <tr>
-      <th><br>Model</th>
-      <th><br>Metrics</th>
-      <th><br>PER</th>
-      <th><br>LOC</th>
-      <th><br>ORG</th>
-      <th><br>Other</th>
-      <th><br>Overall</th>
-    </tr>
-</thead>
-<tbody>
-    <tr>
-        <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
-        <td><br>Precision</td>
-        <td><br>0.961</td>
-        <td><br>0.935</td>
-        <td><br>0.877</td>
-        <td><br>0.995</td>
-        <td><br>0.986</td>
-    </tr>
-    <tr>
-        <td><br>Recall</td>
-        <td><br>0.972</td>
-        <td><br>0.946</td>
-        <td><br>0.876</td>
-        <td><br>0.994</td>
-        <td><br>0.986</td>
-    </tr>
-    <tr>
-        <td>F1</td>
-        <td><br>0.966</td>
-        <td><br>0.940</td>
-        <td><br>0.876</td>
-        <td><br>0.994</td>
-        <td><br>0.986</td>
-    </tr>
-    <tr>
-        <td></td>
-        <td><br>Number</td>
-        <td><br>88,139</td>
-        <td><br>78,278</td>
-        <td><br>35,788</td>
-        <td><br>1,040,925</td>
-        <td><br>1,243,130</td>
-    </tr>
-</tbody>
-</table>
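For readers who want to see how per-label figures like those in the table above can be derived, here is a minimal pure-Python sketch. It rests on two assumptions not confirmed by the README: that scores are computed per token (suggested by the presence of a column for non-entity tokens), and that "Number" is the count of gold tokens per label. The README only states that the `evaluate` package was used; the function `per_label_scores` below is hypothetical and is not part of that package.

```python
from collections import Counter


def per_label_scores(y_true, y_pred):
    """Token-level precision, recall, F1 and support for every label seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the predicted label was wrong
            fn[gold] += 1  # the gold label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]  # tokens predicted as this label
        r_den = tp[label] + fn[label]  # gold tokens with this label
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "number": r_den}
    return scores


# Toy example with made-up labels (not taken from frenchNER).
gold = ["PER", "PER", "O", "LOC", "O", "ORG"]
pred = ["PER", "O", "O", "LOC", "O", "LOC"]
scores = per_label_scores(gold, pred)
for label, s in scores.items():
    print(label, s)
```

Under the second assumption, the per-label "Number" values sum to the overall count, which is consistent with the table: 88,139 + 78,278 + 35,788 + 1,040,925 = 1,243,130.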
 ## Usage
 ### Code
@@ -349,7 +351,7 @@ The evaluation was carried out using the [**evaluate**](https://pypi.org/project
 ```python
 from transformers import pipeline
-ner = pipeline('token-classification', model='CATIE-AQ/Camembert-NER-base-frenchNER', tokenizer='CATIE-AQ/Camembert-NER-base-frenchNER', aggregation_strategy='simple')
 result = ner(
 "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
@@ -470,7 +472,7 @@ The following hyperparameters were used during training:
 ## Citations
-### Camembert-NER-frenchNER
 ```
 TODO
 ```
@@ -543,7 +545,7 @@ url = {https://www.sciencedirect.com/science/article/pii/S0004370212000276},
 author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
-### frenchNER
 ```
 TODO
 ```

 - f1
 - accuracy
 model-index:
+- name: Camembert-base-frenchNER_3entities
   results: []
 datasets:
+- CATIE-AQ/frenchNER_3entities
 language:
 - fr
 widget:
 ## Model Description
+We present **Camembert-base-frenchNER_3entities**, a [CamemBERT base](https://huggingface.co/camembert-base) model fine-tuned for Named Entity Recognition in French on five French NER datasets covering 3 entities (LOC, PER, ORG).
+All these datasets were concatenated and cleaned into a single dataset that we called [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities).
 This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
 Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).
 ## Dataset
+The dataset used is [frenchNER](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities), which contains ~420k sentences labeled with 4 categories:
 * PER: person;
 * LOC: location;
 * ORG: organization;
 The evaluation was carried out using the [**evaluate**](https://pypi.org/project/evaluate/) Python package.
+### frenchNER_3entities
+<table>
+<thead>
+    <tr>
+      <th><br>Model</th>
+      <th><br>Metrics</th>
+      <th><br>PER</th>
+      <th><br>LOC</th>
+      <th><br>ORG</th>
+      <th><br>O</th>
+      <th><br>Overall</th>
+    </tr>
+</thead>
+<tbody>
+    <tr>
+        <td rowspan="3"><br>Camembert-base-frenchNER_3entities</td>
+        <td><br>Precision</td>
+        <td><br>0.961</td>
+        <td><br>0.935</td>
+        <td><br>0.877</td>
+        <td><br>0.995</td>
+        <td><br>0.986</td>
+    </tr>
+    <tr>
+        <td><br>Recall</td>
+        <td><br>0.972</td>
+        <td><br>0.946</td>
+        <td><br>0.876</td>
+        <td><br>0.994</td>
+        <td><br>0.986</td>
+    </tr>
+    <tr>
+        <td>F1</td>
+        <td><br>0.966</td>
+        <td><br>0.940</td>
+        <td><br>0.876</td>
+        <td><br>0.994</td>
+        <td><br>0.986</td>
+    </tr>
+    <tr>
+        <td></td>
+        <td><br>Number</td>
+        <td><br>88,139</td>
+        <td><br>78,278</td>
+        <td><br>35,788</td>
+        <td><br>1,040,925</td>
+        <td><br>1,243,130</td>
+    </tr>
+</tbody>
+</table>
+In detail:
 ### multiconer
 <table>
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
+      <th><br>O</th>
       <th><br>Overall</th>
     </tr>
 </thead>
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
+      <th><br>O</th>
       <th><br>Overall</th>
     </tr>
 </thead>
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
+      <th><br>O</th>
       <th><br>Overall</th>
     </tr>
 </thead>
       <th><br>PER</th>
       <th><br>LOC</th>
       <th><br>ORG</th>
+      <th><br>O</th>
       <th><br>Overall</th>
     </tr>
 </thead>
 </tbody>
 </table>
 ## Usage
 ### Code
 ```python
 from transformers import pipeline
+ner = pipeline('token-classification', model='CATIE-AQ/Camembert-base-frenchNER_3entities', tokenizer='CATIE-AQ/Camembert-base-frenchNER_3entities', aggregation_strategy='simple')
 result = ner(
 "Assurés de disputer l'Euro 2024 en Allemagne l'été prochain (du 14 juin au 14 juillet) depuis leur victoire aux Pays-Bas, les Bleus ont fait le nécessaire pour avoir des certitudes. Avec six victoires en six matchs officiels et un seul but encaissé, Didier Deschamps a consolidé les acquis de la dernière Coupe du monde. Les joueurs clés sont connus : Kylian Mbappé, Aurélien Tchouameni, Antoine Griezmann, Ibrahima Konaté ou encore Mike Maignan."
 ## Citations
+### Camembert-frenchNER_3entities
 ```
 TODO
 ```
 author = {Joel Nothman and Nicky Ringland and Will Radford and Tara Murphy and James R. Curran}}
+### frenchNER_3entities
 ```
 TODO
 ```