Quinten Datalab committed
Commit 5e80cf1
1 Parent(s): 191d0ac

Update README.md

NER results for QUAERO dataset updated

Files changed (1)
  1. README.md +49 -4

README.md CHANGED
@@ -30,9 +30,17 @@ widget:
 AliBERT is a pre-trained language model for French biomedical text. It is trained with a masked language modeling objective, like RoBERTa.
 
 Here are the main contributions of our work:
- A French biomedical language model, a language-specific and domain-specific PLM, which can be used to represent French biomedical text for different downstream tasks.
- A normalization of a Unigram sub-word tokenization of French biomedical textual input, which improves our vocabulary and the overall performance of the trained models.
- It is a foundation model that achieved state-of-the-art results on French biomedical text.
+ <ul>
+ <li>
+ A French biomedical language model, a language-specific and domain-specific PLM, which can be used to represent French biomedical text for different downstream tasks.
+ </li>
+ <li>
+ A normalization of a Unigram sub-word tokenization of French biomedical textual input, which improves our vocabulary and the overall performance of the trained models.
+ </li>
+ <li>
+ It is a foundation model that achieved state-of-the-art results on French biomedical text.
+ </li>
+ </ul>
 
 The paper can be found here: https://aclanthology.org/2023.bionlp-1.19/
 
@@ -147,6 +155,43 @@ The model is evaluated on two (CAS and QUAERO) publicly available French biomedical
 </tr>
 </tbody>
 </table>
- *Table 2: NER performances on CAS*
+ Table 2: NER performances on CAS dataset
+
+ #### QUAERO dataset
+
+ <table class="tg">
+ <thead>
+ <tr>
+ <th>Models</th>
+ <th class="tg-0lax" colspan="3">CamemBERT</th>
+ <th class="tg-0lax" colspan="3">AliBERT</th>
+ <th class="tg-0lax" colspan="3">DrBERT</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Entity </td> <td> P </td> <td> R </td> <td> F1 </td> <td> P </td> <td> R </td> <td> F1 </td> <td> P </td> <td> R </td> <td> F1 </td>
+ </tr>
+ <tr>
+ <td>Anatomy </td> <td> 0.649 </td> <td> 0.641 </td> <td> 0.645 </td> <td> 0.795 </td> <td> 0.811 </td> <td> 0.803 </td> <td> 0.799 </td> <td> 0.801 </td> <td> 0.800 </td>
+ </tr>
+ <tr>
+ <td>Chemical </td> <td> 0.844 </td> <td> 0.847 </td> <td> 0.846 </td> <td> 0.878 </td> <td> 0.893 </td> <td> 0.885 </td> <td> 0.898 </td> <td> 0.818 </td> <td> 0.856 </td>
+ </tr>
+ <tr>
+ <td>Device </td> <td> 0.000 </td> <td> 0.000 </td> <td> 0.000 </td> <td> 0.506 </td> <td> 0.356 </td> <td> 0.418 </td> <td> 0.549 </td> <td> 0.338 </td> <td> 0.419 </td>
+ </tr>
+ <tr>
+ <td>Disorder </td> <td> 0.772 </td> <td> 0.818 </td> <td> 0.794 </td> <td> 0.857 </td> <td> 0.843 </td> <td> 0.850 </td> <td> 0.883 </td> <td> 0.809 </td> <td> 0.845 </td>
+ </tr>
+ <tr>
+ <td>Procedure </td> <td> 0.880 </td> <td> 0.894 </td> <td> 0.887 </td> <td> 0.969 </td> <td> 0.967 </td> <td> 0.968 </td> <td> 0.944 </td> <td> 0.976 </td> <td> 0.960 </td>
+ </tr>
+ <tr>
+ <td>Macro Avg </td> <td> 0.655 </td> <td> 0.656 </td> <td> 0.655 </td> <td> 0.807 </td> <td> 0.783 </td> <td> 0.793 </td> <td> 0.818 </td> <td> 0.755 </td> <td> 0.782 </td>
+ </tr>
+ </tbody>
+ </table>
+ Table 3: NER performances on QUAERO dataset
 
 ## AliBERT: A Pre-trained Language Model for French Biomedical Text
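Tables 2 and 3 report entity-level precision (P), recall (R), and F1 per entity type plus a macro average. The commit does not include the evaluation script, so the following is a minimal scoring sketch assuming BIO-tagged sequences and the `seqeval` library; both the tagging scheme and the library choice are assumptions, not the authors' confirmed tooling.

```python
# Minimal sketch of entity-level NER scoring, assuming BIO tags and seqeval.
# seqeval is an assumption here; the commit does not state how Tables 2-3
# were produced.
from seqeval.metrics import classification_report, f1_score

# Toy gold/predicted tag sequences using entity types from Table 3
# (illustrative only, not data from the QUAERO corpus).
y_true = [
    ["B-Disorder", "I-Disorder", "O", "B-Chemical"],
    ["O", "B-Anatomy", "B-Procedure", "O"],
]
y_pred = [
    ["B-Disorder", "I-Disorder", "O", "B-Chemical"],
    ["O", "B-Chemical", "B-Procedure", "O"],
]

# Per-entity precision/recall/F1, mirroring the per-row layout of the tables.
print(classification_report(y_true, y_pred))

# The "Macro Avg" row corresponds to macro-averaged scores.
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
```

Entity-level scoring of this kind counts a prediction as correct only when both the span and the entity type match, which is consistent with the strict P/R/F1 figures reported above.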