metadata
license: mit
tags:
  - flair
  - token-classification
  - sequence-tagger-model
language: pt
widget:
  - text: >-
      FISIOTERAPIA TRAUMATO - MANHÃ  Henrique Dias, 38 anos. Exercícios
      metabólicos de extremidades inferiores. Realizo mobilização patelar e leve
      mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha.
      Oriento cuidados e posicionamentos.

Portuguese Name Identification

This model accompanies the NoHarm-Anony paper, "De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier". It is a Flair-based sequence tagger for Portuguese, initialized with Flair BBP and trained on clinical notes in which person names were tagged.

Demo: How to use in Flair

Requires: Flair (pip install flair)

from flair.data import Sentence
from flair.models import SequenceTagger
# load tagger
tagger = SequenceTagger.load("noharm-ai/anony")
# make example sentence
sentence = Sentence("FISIOTERAPIA TRAUMATO - MANHÃ  Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos.")
# predict NER tags
tagger.predict(sentence)
# print sentence
print(sentence)
# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

This yields the following output:

Span [5,6]: "Henrique Dias"   [− Labels: NOME (0.9735)]
Span [31,32]: "Marcelo Arocha"   [− Labels: NOME (0.9803)]

So, the entities "Henrique Dias" and "Marcelo Arocha" are found in the sentence, both labeled NOME (Portuguese for "name").
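Once the names are located, de-identifying a note comes down to replacing each detected span with a placeholder. The following is a minimal sketch of that step, not the authors' pipeline: the `redact` helper and the `[NOME]` placeholder are hypothetical, and the character offsets are computed by hand here to stand in for the tagger's predictions. With Flair, the offsets would come from the predicted spans, whose attribute names depend on the installed Flair version.

```python
from typing import List, Tuple


def redact(text: str, spans: List[Tuple[int, int]], placeholder: str = "[NOME]") -> str:
    """Replace each (start, end) character range in text with a placeholder."""
    # Process spans from right to left so that earlier offsets stay valid
    # after each replacement changes the length of the string.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


note = "Henrique Dias, 38 anos. Liberado pelo Dr Marcelo Arocha."

# Hypothetical offsets standing in for the tagger's predicted name spans.
names = ["Henrique Dias", "Marcelo Arocha"]
spans = [(note.find(name), note.find(name) + len(name)) for name in names]

redacted = redact(note, spans)
print(redacted)
```

Replacing from the end of the text backwards avoids having to recompute offsets after each substitution.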

More Information

Refer to the original paper, "De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier", for additional details and performance results.

Acknowledgements

We thank Dr. Ana Helena D. P. S. Ulbrich, who provided the clinical notes dataset from the hospital, for her valuable cooperation. We also thank the volunteers of the Institute of Artificial Intelligence in Healthcare, Celso Pereira and Ana Lúcia Dias, for annotating the dataset.

Citation

@inproceedings{santos2021identification,
  title={De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier},
  author={Santos, Joaquim and dos Santos, Henrique DP and Tabalipa, F{\'a}bio and Vieira, Renata},
  booktitle={Brazilian Conference on Intelligent Systems},
  pages={33--41},
  year={2021},
  organization={Springer}
}