Commit 0363e8e by HueyNemud
1 parent: 9e621d7

Better README.md

Files changed (1)
  1. README.md +20 -22
README.md CHANGED
@@ -1,33 +1,30 @@
- ---
- tags:
- - generated_from_trainer
- model-index:
- - name: model_pretrained
-   results: []
- ---
+ # CamemBERT pretrained on French trade directories from the XIXth century
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
+ This model is part of the material for the paper
+ > Abadie, N., Carlinet, E., Chazalon, J., Duménieu, B. (2022). A
+ > Benchmark of Named Entity Recognition Approaches in Historical
+ > Documents Application to 19th Century French Directories. In: Uchida,
+ > S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022.
+ > Lecture Notes in Computer Science, vol 13237. Springer, Cham.
+ > https://doi.org/10.1007/978-3-031-06555-2_30
 
- # model_pretrained
+ The source code to train this model is available in the paper's [GitHub repository](https://github.com/soduco/paper-ner-bench-das22) as a Jupyter notebook, `src/ner/10-camembert_pretraining.ipynb`.
 
- This model is a fine-tuned version of [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.5619
 
  ## Model description
-
- More information needed
+ This model further pre-trains [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) on a set of ~845k entries from XIXth-century Paris trade directories, extracted with OCR.
+ Trade directory entries are short, strongly structured texts that give the name, activity and location of a person or business, e.g.:
+ ```
+ Peynaud, R. de la Vieille Bouclerie, 18. Richard, Joullain et comp., (commission- —Phéâtre Français. naire, (entrepôt), au port de la Rapée-
+ ```
 
  ## Intended uses & limitations
+ This model is intended for reproducing the NER evaluation published in the DAS 2022 paper.
+ Several derived models trained for NER on trade directories are available on HuggingFace, each trained on a different dataset:
+ - [das22-10-camembert_pretrained_finetuned_ref](): trained for NER on ~6000 manually corrected directory entries.
+ - [das22-10-camembert_pretrained_finetuned_pero](): trained for NER on ~6000 directory entries extracted with PERO-OCR.
+ - [das22-10-camembert_pretrained_finetuned_tess](): trained for NER on ~6000 directory entries extracted with Tesseract.
 
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
 
  ### Training hyperparameters
 
@@ -55,3 +52,4 @@ The following hyperparameters were used during training:
  - Pytorch 1.10.1+cu102
  - Datasets 1.17.0
  - Tokenizers 0.10.3
+