---
language: nl
license: mit
pipeline_tag: text-classification
inference: false
---

# A-PROOF Binary Sentence Classification

## Description

A fine-tuned binary text classification model that determines whether a sentence is relevant for WHO-ICF category classification.

Since 95% of the sentences in clinical notes are not relevant for ICF classification, it makes sense to filter out the irrelevant sentences before applying any further classification. With this binary classifier, the processing of large volumes of data can be optimised, as only about 5% of the sentences need to be classified for the level of functioning.
For further classification of the relevant sentences, you can use the multilabel classifier https://huggingface.co/CLTL/icf-domains and any of the relevant regression classifiers to obtain a level score.
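
As a sketch of this filtering step, the snippet below keeps only the sentences that the binary classifier labels as relevant and hands those on for further ICF classification. It assumes that the 'pos' label shown in the example output under "How to use" marks relevant sentences; the example sentences are only illustrative, and the downstream call is left as a placeholder.

```
# Sketch: filter clinical-note sentences with the binary classifier before
# any further ICF classification. Only sentences predicted as 'pos'
# (relevant) need to be passed on to the multilabel and level classifiers.
from transformers import pipeline

binary_clf = pipeline('text-classification', model='CLTL/binary_icf_classifier')

sentences = [
    'De patient is erg moe',          # "The patient is very tired"
    'Familie is op bezoek geweest',   # "Family came to visit"
]

predictions = binary_clf(sentences)
relevant = [s for s, p in zip(sentences, predictions) if p['label'] == 'pos']

# Pass `relevant` on to the multilabel classifier (CLTL/icf-domains) and the
# level-regression models, following the instructions on their model cards.
print(relevant)
```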

Relevant sentences are likely to express the patient's functioning in one of the following 9 ICF categories:

ICF code | Domain | name in repo
---|---|---
b440 | Respiration functions | ADM
b140 | Attention functions | ATT
d840-d859 | Work and employment | BER
b1300 | Energy level | ENR
d550 | Eating | ETN
d450 | Walking | FAC
b455 | Exercise tolerance functions | INS
b530 | Weight maintenance functions | MBW
b152 | Emotional functions | STM

## Intended use and limitations
- The model was fine-tuned (trained, validated and tested) on medical records from the Amsterdam UMC (the two academic medical centers of Amsterdam). It might perform differently on text from a different hospital or from non-hospital sources (e.g. GP records).
- The model only distinguishes whether a sentence is relevant with respect to the 9 ICF categories listed above.

## How to use
To generate predictions with the model, use the [Transformers](https://huggingface.co/docs/transformers) library:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline('text-classification', model='CLTL/binary_icf_classifier')
result = pipe('De patient is erg moe')  # "The patient is very tired"
print(result)
# [{'label': 'pos', 'score': 0.9977788329124451}]
```

```
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('CLTL/binary_icf_classifier')
model = AutoModelForSequenceClassification.from_pretrained('CLTL/binary_icf_classifier')
```
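
When loading the model directly, you can produce the same kind of prediction with a short manual inference step. Below is a minimal sketch (assuming PyTorch as the backend); it reads the label names from the model config rather than hard-coding them, since only the 'pos' label appears in the example output above.

```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('CLTL/binary_icf_classifier')
model = AutoModelForSequenceClassification.from_pretrained('CLTL/binary_icf_classifier')
model.eval()

# Tokenize a single sentence and run it through the model
inputs = tokenizer('De patient is erg moe', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# Turn logits into probabilities and look up the predicted label name
probs = torch.softmax(logits, dim=-1)[0]
label_id = int(probs.argmax())
print(model.config.id2label[label_id], round(float(probs[label_id]), 4))
```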

## Training data
- The training data consists of clinical notes from medical records (in Dutch) of the Amsterdam UMC. Due to privacy constraints, the data cannot be released.
- The annotation guidelines used for the project can be found [here](https://github.com/cltl/a-proof-zonmw/tree/main/resources/annotation_guidelines).

## Evaluation results
The evaluation is done at sentence level (the classification unit): 0.97 precision, 0.96 recall, 0.97 F1.

## Contact
Piek Vossen, piek.vossen@vu.nl

## References
Cecilia Kuan, 2023, *Generative Approach of Data Augmentation for Pre-Trained Clinical NLP System*, MA Thesis, Vrije Universiteit Amsterdam.
https://github.com/cltl-students/Cecilia_Kuan_data_augmentation