---
license: cc-by-nc-nd-4.0
datasets:
- taln-ls2n/Adminset
language:
- fr
library_name: transformers
tags:
- camembert
- BERT
- Administrative documents
---

# AdminBERT 4GB: A Small French Language Model Adapted to Administrative Documents

[AdminBERT-4GB](example) is a French language model adapted to the administrative domain using a large corpus of 10 million French administrative texts. It is a derivative of the CamemBERT model, which is based on the RoBERTa architecture. AdminBERT-4GB was trained with the Whole Word Masking (WWM) objective, using a 30% masking rate, for 2 epochs on 8 V100 GPUs. The training data is a sample of [Adminset](https://huggingface.co/datasets/taln-ls2n/Adminset).
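
Below is a minimal usage sketch with the Transformers fill-mask pipeline. The repository ID `taln-ls2n/AdminBERT-4GB` is an assumption (the card does not state the exact model ID); replace it with the actual one if it differs.

```python
from transformers import pipeline

# Minimal sketch: the repo ID below is assumed from the dataset namespace
# and may differ from the actual model ID on the Hub.
fill_mask = pipeline("fill-mask", model="taln-ls2n/AdminBERT-4GB")

# AdminBERT-4GB is a masked language model, so it predicts the token behind
# the <mask> placeholder (CamemBERT/RoBERTa-style mask token).
predictions = fill_mask("Le conseil municipal a approuvé le budget <mask> de la commune.")
for pred in predictions:
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```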


## Evaluation

Since, to date, no evaluation corpus composed of French administrative documents was available, we decided to create our own for the NER (Named Entity Recognition) task.

### Model Performance

| Model                | Precision (%) | Recall (%) | F1 (%) |
|----------------------|---------------|------------|--------|
| Wikineural-NER FT    | 77.49         | 75.40      | 75.70  |
| NERmemBERT-Large FT  | 77.43         | 78.38      | 77.13  |
| CamemBERT FT         | 77.62         | 79.59      | 77.26  |
| NERmemBERT-Base FT   | 77.99         | 79.59      | 78.34  |
| AdminBERT-NER 4GB    | 78.47         | 80.35      | 79.26  |
| AdminBERT-NER 16GB   | 78.79         | 82.07      | 80.11  |

To evaluate each model, we performed five runs and averaged the results on the test set of [Adminset-NER](https://huggingface.co/datasets/taln-ls2n/Adminset-NER).
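
As a sketch of this evaluation setup, the snippet below runs a fine-tuned NER checkpoint on the Adminset-NER test split with the token-classification pipeline. The checkpoint ID is a placeholder, since the card does not give the exact repository name of the AdminBERT-NER models.

```python
from datasets import load_dataset
from transformers import pipeline

# Load the test split used for evaluation.
test_set = load_dataset("taln-ls2n/Adminset-NER", split="test")

# Placeholder checkpoint ID: the exact repo name of the fine-tuned
# AdminBERT-NER models is not stated in this card.
ner = pipeline(
    "token-classification",
    model="taln-ls2n/AdminBERT-NER",
    aggregation_strategy="simple",
)

# Tag a sample administrative sentence.
print(ner("La commune de Nantes a délibéré sur le budget primitif le 12 mars 2021."))
```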