File size: 2,877 Bytes
0e62701
 
5f3fdae
 
 
 
 
 
 
0e62701
8436f29
5f3fdae
 
 
 
6484741
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
dd88173
 
 
 
 
 
 
6484741
 
 
5f3fdae
 
 
 
 
 
 
8436f29
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---

license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
language: "pt"
widget:
- text: "FISIOTERAPIA TRAUMATO - MANHÃ  Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos."
---


## Portuguese Name Identification

The [NoHarm-Anony - De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3) paper contains Flair-based models for Portuguese Language, initialized with [Flair BBP](https://github.com/jneto04/ner-pt) & trained on clinical notes with names tagged. 

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python

from flair.data import Sentence

from flair.models import SequenceTagger

# load tagger

tagger = SequenceTagger.load("noharm-ai/anony")

# make example sentence

sentence = Sentence("FISIOTERAPIA TRAUMATO - MANHÃ  Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos.")

# predict NER tags

tagger.predict(sentence)

# print sentence

print(sentence)

# print predicted NER spans

print('The following NER tags are found:')

# iterate over entities and print

for entity in sentence.get_spans('ner'):

    print(entity)

```

This yields the following output:
```

Span [5,6]: "Henrique Dias"   [− Labels: NOME (0.9735)]

Span [31,32]: "Marcelo Arocha"   [− Labels: NOME (0.9803)]

```

So, the entities "*Henrique Dias*" (labeled as a **nome**) and "*Marcelo Arocha*" (labeled as a **nome**) are found in the sentence. 



## More Information

Refer to the original paper, [De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3) for additional details and performance.

## Acknowledgements

We thank Dr. Ana Helena D. P. S. Ulbrich, who provided the clinical notes dataset from the hospital, for her valuable cooperation. We also thank the volunteers of the Institute of Artificial Intelligence in Healthcare Celso Pereira and Ana Lúcia Dias, for the dataset annotation.

## Citation

```

@inproceedings{santos2021identification,

  title={De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier},

  author={Santos, Joaquim and dos Santos, Henrique DP and Tabalipa, F{\'a}bio and Vieira, Renata},

  booktitle={Brazilian Conference on Intelligent Systems},

  pages={33--41},

  year={2021},

  organization={Springer}

}

```