---
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
language: "pt"
widget:
- text: "FISIOTERAPIA TRAUMATO - MANHÃ Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos."
---

## Portuguese Name Identification

The paper [NoHarm-Anony - De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3) presents Flair-based models for Portuguese, initialized with [Flair BBP](https://github.com/jneto04/ner-pt) and trained on clinical notes in which names are tagged.

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("noharm-ai/anony")

# make example sentence
sentence = Sentence("FISIOTERAPIA TRAUMATO - MANHÃ Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:

```
Span [5,6]: "Henrique Dias" [− Labels: NOME (0.9735)]
Span [31,32]: "Marcelo Arocha" [− Labels: NOME (0.9803)]
```

So the entities "*Henrique Dias*" and "*Marcelo Arocha*" are both found in the sentence, each labeled **NOME** (name).
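
Since the model targets de-identification, a natural next step is to replace the detected names with a placeholder. The following is a minimal sketch, not the authors' method: `mask_spans` and the `[NOME]` placeholder are hypothetical, and the character offsets are computed here with plain string search so the example runs without downloading the model. With Flair, the offsets would presumably come from each predicted span instead (recent versions are assumed to expose them as `start_position` and `end_position`).

```python
def mask_spans(text, spans, placeholder="[NOME]"):
    """Replace each (start, end) character range in `text` with `placeholder`.

    Hypothetical helper for illustration; assumes the ranges do not overlap.
    """
    # Work from right to left so earlier offsets stay valid after each edit.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


note = "Paciente Henrique Dias liberado pelo Dr Marcelo Arocha."

# Stand-in for model predictions: locate the two names by string search.
# With Flair, these ranges would come from sentence.get_spans('ner').
names = ["Henrique Dias", "Marcelo Arocha"]
spans = [(note.index(name), note.index(name) + len(name)) for name in names]

masked = mask_spans(note, spans)
print(masked)
```

Replacing from the last span to the first avoids recomputing offsets after each substitution, which matters because the placeholder and the original name usually differ in length.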

## More Information

Refer to the original paper, [De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3), for additional details and performance results.

## Acknowledgements

We thank Dr. Ana Helena D. P. S. Ulbrich, who provided the clinical notes dataset from the hospital, for her valuable cooperation. We also thank the volunteers of the Institute of Artificial Intelligence in Healthcare, Celso Pereira and Ana Lúcia Dias, for annotating the dataset.

## Citation

```
@inproceedings{santos2021identification,
  title={De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier},
  author={Santos, Joaquim and dos Santos, Henrique DP and Tabalipa, F{\'a}bio and Vieira, Renata},
  booktitle={Brazilian Conference on Intelligent Systems},
  pages={33--41},
  year={2021},
  organization={Springer}
}
```