tags:
  - spacy
  - arxiv:2408.06930
  - medical
language:
  - nl
license: cc-by-sa-4.0
model-index:
  - name: Echocardiogram_SpanCategorizer_diastolic_dysfunction
    results:
      - task:
          type: token-classification
        dataset:
          type: test
          name: internal test set
        metrics:
          - name: Weighted f1
            type: f1
            value: 0.875
            verified: false
          - name: Weighted precision
            type: precision
            value: 0.902
            verified: false
          - name: Weighted recall
            type: recall
            value: 0.849
            verified: false
pipeline_tag: token-classification
metrics:
  - f1
  - precision
  - recall

Description

This model is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from Electronic Health Records. The publication associated with the span classification task can be found at https://arxiv.org/abs/2408.06930. The config file for training the model can be found at https://github.com/umcu/echolabeler.

Minimum working example

# Install the packaged pipeline (the leading "!" is notebook syntax; drop it in a regular shell)
!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-diastolic-dysfunction/resolve/main/nl_Echocardiogram_SpanCategorizer_diastolic_dysfunction-any-py3-none-any.whl
import spacy

# Load the installed pipeline and run it on an example report
nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_diastolic_dysfunction")
prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")
# Predicted spans are stored under the 'sc' key; their scores are kept in the span group's attrs
for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
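
The loop above prints every predicted span. As a minimal sketch of a post-processing step, the helper below keeps only spans above a score threshold, assuming the spans have first been copied into plain `(text, label, score)` tuples. The name `filter_spans`, the `SpanRecord` alias, the 0.5 threshold, and the example values are all hypothetical and are not part of the model or of spaCy.

```python
from typing import List, Tuple

# Hypothetical record: (span text, label, score); not a spaCy type.
SpanRecord = Tuple[str, str, float]


def filter_spans(records: List[SpanRecord], min_score: float = 0.5) -> List[SpanRecord]:
    """Keep records scoring at least min_score, highest score first."""
    kept = [record for record in records if record[2] >= min_score]
    return sorted(kept, key=lambda record: record[2], reverse=True)


# Made-up values for illustration only; this is not real model output.
example = [
    ("diastolische dysfunctie graad 1A tot 2", "lv_dias_func_mild", 0.91),
    ("diastolische dysfunctie", "lv_dias_func_moderate", 0.34),
]
print(filter_spans(example))
```

A suitable threshold depends on the application and would need to be chosen on held-out data.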

Label Scheme

Label scheme (4 labels for 1 component)

| Component | Labels |
| --- | --- |
| spancat | lv_dias_func_normal, lv_dias_func_mild, lv_dias_func_severe, lv_dias_func_moderate |
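
A report can contain several labelled spans, so a downstream step may want a single label per report. The sketch below shows one possible rule: take the most severe label found. The severity ordering is an assumption inferred from the label names alone, and both the rule and the name `most_severe_label` are hypothetical; this is not necessarily how the associated publication aggregates spans.

```python
from typing import Iterable, Optional

# Assumed ordering from least to most severe, inferred from the label names only.
SEVERITY_ORDER = [
    "lv_dias_func_normal",
    "lv_dias_func_mild",
    "lv_dias_func_moderate",
    "lv_dias_func_severe",
]


def most_severe_label(labels: Iterable[str]) -> Optional[str]:
    """Return the most severe known label, or None if no known label is present."""
    known = [label for label in labels if label in SEVERITY_ORDER]
    if not known:
        return None
    return max(known, key=SEVERITY_ORDER.index)


print(most_severe_label(["lv_dias_func_mild", "lv_dias_func_moderate"]))
```

With a processed document, the labels could be passed in as `span.label_ for span in prediction.spans['sc']`.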

Intended use

The model was developed for span classification on Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for Dutch medical NLP tasks.

Data

The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before starting the training procedure.

| Feature | Description |
| --- | --- |
| Name | Echocardiogram_SpanCategorizer_diastolic_dysfunction |
| Version | 1.0.0 |
| spaCy | >=3.7.4,<3.8.0 |
| Default Pipeline | tok2vec, spancat |
| Components | tok2vec, spancat |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | n/a |
| License | cc-by-sa-4.0 |
| Author | Bauke Arends |
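
The spaCy requirement listed above is `>=3.7.4,<3.8.0`. As a rough sketch, an installed version string can be checked against that range before loading the model. The helpers `parse_version` and `is_supported` are hypothetical names, and the parser is deliberately simple: it only reads the leading digits of the first three version components.

```python
from itertools import takewhile
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '3.7.5' into (3, 7, 5), ignoring non-digit suffixes."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '3.7' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(version: str) -> bool:
    """Check the version against the range >=3.7.4,<3.8.0."""
    return (3, 7, 4) <= parse_version(version) < (3, 8, 0)


print(is_supported("3.7.5"))
```

In practice the argument would be `spacy.__version__`.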

Contact

If you have problems with this model, please open an issue in our GitHub repository: https://github.com/umcu/echolabeler/issues

Usage

If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930

References

Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024): Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification, arXiv, https://arxiv.org/abs/2408.06930