This is a Flair sequence tagger trained on a corpus of 127 Spanish-language case reports from the European Court of Human Rights (ECHR), using pre-trained embeddings from the flair/ner-multi model.
This corpus was built and annotated for anonymization as part of the work presented in the Master's thesis "Anonymization of case reports from the ECHR in Spanish and French: exploration of two alternative annotation approaches".
The annotation was carried out by projecting the annotations of the test set of the English corpus built by Pilán et al. (2022).
The tagger predicts 8 tags: DATETIME, CODE, PER, DEM, MISC, ORG, LOC, QUANTITY.
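As an illustration of how predictions with these tags might be used for anonymization, here is a minimal sketch that replaces labelled character spans with placeholders. The helper names (`span_of`, `mask_entities`), the example sentence and the hand-written spans are all hypothetical and not taken from the thesis or its repository. With Flair, such spans would typically come from `SequenceTagger.load(...)`, `tagger.predict(sentence)` and `sentence.get_spans("ner")`; the model identifier to load is not stated here, so it is left out.

```python
from typing import List, Tuple

# The eight labels listed in this model card.
LABELS = {"DATETIME", "CODE", "PER", "DEM", "MISC", "ORG", "LOC", "QUANTITY"}

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def span_of(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: build a span for the first occurrence of `phrase`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each labelled character span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label}")
        if start < cursor:
            continue  # skip spans that overlap one already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up sentence and spans, standing in for real tagger output.
example = "El demandante, Juan Pérez, nació en 1970 en Madrid."
spans = [
    span_of(example, "Juan Pérez", "PER"),
    span_of(example, "1970", "DATETIME"),
    span_of(example, "Madrid", "LOC"),
]
masked = mask_entities(example, spans)
print(masked)
```

Masking every predicted span is only one possible policy; deciding which entities actually need to be hidden is a separate step that this sketch does not attempt.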
The corpus and the code used for training this sequence tagger are available on GitHub: https://github.com/mariasierro/automatic-anonymization-ECHR-French-Spanish.
References
Pilán, I., Lison, P., Øvrelid, L., Papadopoulou, A., Sánchez, D. & Batet, M. (2022). The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization. Computational Linguistics, 48(4), pp. 1053–1101. Cambridge, MA: MIT Press. doi: 10.1162/coli_a_00458.