	---
	license: mit
	tags:
	- flair
	- token-classification
	- sequence-tagger-model
	language: "pt"
	widget:
	- text: "FISIOTERAPIA TRAUMATO - MANHÃ Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos."
	---

	## Portuguese Name Identification

	The paper [NoHarm-Anony - De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3) presents Flair-based models for Portuguese, initialized with [Flair BBP](https://github.com/jneto04/ner-pt) and trained on clinical notes in which person names are tagged.

	### Demo: How to use in Flair

	Requires: [Flair](https://github.com/flairNLP/flair/) (`pip install flair`)

	```python
	from flair.data import Sentence
	from flair.models import SequenceTagger
	# load tagger
	tagger = SequenceTagger.load("noharm-ai/anony")
	# make example sentence
	sentence = Sentence("FISIOTERAPIA TRAUMATO - MANHÃ Henrique Dias, 38 anos. Exercícios metabólicos de extremidades inferiores. Realizo mobilização patelar e leve mobilização de flexão de joelho conforme liberado pelo Dr Marcelo Arocha. Oriento cuidados e posicionamentos.")
	# predict NER tags
	tagger.predict(sentence)
	# print sentence
	print(sentence)
	# print predicted NER spans
	print('The following NER tags are found:')
	# iterate over entities and print
	for entity in sentence.get_spans('ner'):
	    print(entity)
	```

	The predicted spans are printed as follows (the exact format depends on the Flair version):
	```
	Span [5,6]: "Henrique Dias" [− Labels: NOME (0.9735)]
	Span [31,32]: "Marcelo Arocha" [− Labels: NOME (0.9803)]
	```

	So the entities "Henrique Dias" and "Marcelo Arocha" are both found in the sentence and labeled `NOME` (Portuguese for "name").
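
Because the model targets de-identification, a natural next step is to mask the detected names in the original text. The snippet below is a minimal sketch of that step, not the authors' pipeline: `mask_spans` and the `[NOME]` placeholder are hypothetical names introduced here, and the character offsets are computed by string search for illustration. With Flair, the offsets would instead come from the predicted spans (in recent versions, presumably via `span.start_position` and `span.end_position`; attribute names have varied between releases).

```python
from typing import List, Tuple


def mask_spans(text: str, spans: List[Tuple[int, int]], placeholder: str = "[NOME]") -> str:
    """Replace each (start, end) character range in `text` with `placeholder`.

    Hypothetical helper for illustration; it assumes the ranges do not overlap.
    """
    # Process ranges from right to left so that earlier offsets remain valid
    # after each replacement changes the length of the string.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


text = "Henrique Dias, 38 anos. Liberado pelo Dr Marcelo Arocha."

# Stand-in for the tagger output: locate the two names by string search.
# In practice these offsets would be read from the spans predicted by the model.
names = ["Henrique Dias", "Marcelo Arocha"]
spans = [(text.index(name), text.index(name) + len(name)) for name in names]

masked = mask_spans(text, spans)
print(masked)
```

Replacing from the end of the string backwards avoids having to recompute offsets after each substitution.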



	## More Information

	Refer to the original paper, [De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier](https://link.springer.com/chapter/10.1007/978-3-030-91699-2_3), for additional details and performance results.

	## Acknowledgements

	We thank Dr. Ana Helena D. P. S. Ulbrich, who provided the clinical notes dataset from the hospital, for her valuable cooperation. We also thank the volunteers of the Institute of Artificial Intelligence in Healthcare, Celso Pereira and Ana Lúcia Dias, for annotating the dataset.

	## Citation

	```
	@inproceedings{santos2021identification,
	title={De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier},
	author={Santos, Joaquim and dos Santos, Henrique DP and Tabalipa, F{\'a}bio and Vieira, Renata},
	booktitle={Brazilian Conference on Intelligent Systems},
	pages={33--41},
	year={2021},
	organization={Springer}
	}
	```