---
license: mit

inference:
  parameters:
    aggregation_strategy: "average"

language:
  - pt
pipeline_tag: token-classification
tags:
  - medialbertina-ptpt
  - deberta
  - portuguese
  - european portuguese
  - medical
  - clinical
  - healthcare
  - NER
  - Named Entity Recognition
  - IE
  - Information Extraction
widget:
  - text: Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os sinais vitais do utente, incluindo a pressão arterial, com leitura de 120/87 mmHg, a frequência cardíaca, de 80 batimentos por minuto, e SpO2 a 98%, foram monitorizados. Após a cirurgia o utente apresentava  dor intensa no local e inchaço no tornozelo, mas os resultados dos exames de radiografia revelaram uma recuperação satisfatória.
    example_title: Example 1
  - text: Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.
    example_title: Example 2
  - text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
    example_title: Example 3
  - text: Após as sessões de fisioterapia o paciente apresenta recuperação de mobilidade.
    example_title: Example 4
  - text: O paciente está em Quimioterapia com uma dosagem específica de Cisplatina para o tratamento do cancro do pulmão.
    example_title: Example 5
  - text: Monitorização da  Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
    example_title: Example 6
  - text: A ressonância magnética da utente revelou uma ruptura no menisco lateral do joelho.
    example_title: Example 7
  - text: A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
    example_title: Example 8
---

# MediAlbertina
The first publicly available medical language models trained with real European Portuguese data.

MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.

Like their predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m/blob/main/LICENSE).



# Model Description

MediAlbertina PT-PT 900M NER was created through domain adaptation of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
- Diagnostico
- Sintoma
- Medicamento
- Dosagem
- ProcedimentoMedico
- SinalVital
- Resultado
- Progresso

MediAlbertina PT-PT 900M NER outperformed the same adaptation applied to a non-medical Portuguese language model, which demonstrates the effectiveness of this domain adaptation and its potential for medical AI in Portugal.

| Model | NER single-model (F1-score) | NER multi-models (F1-score) | Assertion Status (F1-score) |
|-------------------------|:----------------:|:----------------:|:----------------:|
| albertina-900m-portuguese-ptpt-encoder | 0.813 | 0.811 | 0.687 |
| **medialbertina_pt-pt_900m** | **0.832** | **0.848** | **0.755** |
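
The table reports F1-scores, but this card does not state how they were computed. As a rough sketch only, assuming exact-match entity-level scoring (a prediction counts only if its start, end, and label all match a gold span), F1 could be computed as below. The function name `entity_f1` and the sample spans are hypothetical and are not taken from the project's evaluation code or data.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level F1 over (start, end, label) tuples.

    Assumption: a predicted span is correct only if all three fields match.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans for illustration; these are not real annotations.
gold_spans = [(16, 24, "Medicamento"), (28, 48, "Dosagem")]
predicted_spans = [(16, 24, "Medicamento"), (28, 33, "Dosagem")]

score = entity_f1(gold_spans, predicted_spans)
print(f"F1 = {score:.3f}")
```

Under this assumption a span with the right label but the wrong boundaries counts as both a false positive and a false negative, so exact-match scoring is stricter than token-level scoring.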

## Data

MediAlbertina PT-PT 900M NER was fine-tuned on more than 10k hand-annotated entities from more than a thousand fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).


## How to use

```python
from transformers import pipeline

# Load the NER pipeline; 'average' aggregates sub-word token predictions into word-level entity spans.
ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')
sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)

# Print each entity label with the matching text span from the sentence.
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
```
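
With an aggregation strategy set, the pipeline returns a list of dictionaries with `entity_group`, `score`, `start`, and `end` keys. A minimal sketch of post-processing that output, grouping spans by label and dropping low-confidence ones, might look like the following. The helper names `group_entities` and `make_span`, the `min_score` threshold, and the sample spans are assumptions made for illustration. The spans are hand-written placeholders, not real model output, so the block runs without downloading the model.

```python
from collections import defaultdict


def group_entities(sentence, entities, min_score=0.0):
    """Group aggregated NER spans by label, keeping the original surface text."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # skip spans below the confidence threshold
        grouped[entity["entity_group"]].append(sentence[entity["start"]:entity["end"]])
    return dict(grouped)


sentence = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."


def make_span(text, label, score):
    """Build a pipeline-style dict for a substring of `sentence` (illustration only)."""
    start = sentence.index(text)
    return {"entity_group": label, "score": score, "start": start, "end": start + len(text)}


# Hand-written placeholder spans and scores, NOT actual predictions from the model.
example_entities = [
    make_span("aspirina", "Medicamento", 0.99),
    make_span("500mg a cada 4 horas", "Dosagem", 0.95),
    make_span("3 dias", "Dosagem", 0.40),
]

result = group_entities(sentence, example_entities, min_score=0.5)
print(result)
```

In real use, `example_entities` would be replaced by the list returned from `ner_pipeline(sentence)`, and a suitable `min_score` would need to be chosen on validation data.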

## Citation

MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a full description, see the corresponding publication:

```latex
In publishing process. Reference will be added soon.
```
Please use the above canonical reference when using or citing this model.