	---
	tags:
	- spacy
	- arxiv:2408.06930
	- medical
	language:
	- nl
	license: cc-by-sa-4.0
	model-index:
	- name: Echocardiogram_SpanCategorizer_lv_dil
	  results:
	  - task:
	      type: token-classification
	    dataset:
	      type: test
	      name: "internal test set"
	    metrics:
	    - name: "Weighted f1"
	      type: f1
	      value: 0.836
	      verified: false
	    - name: "Weighted precision"
	      type: precision
	      value: 0.850
	      verified: false
	    - name: "Weighted recall"
	      type: recall
	      value: 0.823
	      verified: false

	pipeline_tag: token-classification
	metrics:
	- f1
	- precision
	- recall
	---

	# Description
	This is a spaCy SpanCategorizer model trained from scratch on Dutch echocardiogram reports sourced from electronic health records. The paper describing the span classification task is available at https://arxiv.org/abs/2408.06930. The config file used to train the model is available at https://github.com/umcu/echolabeler.

	# Minimum working example
	```python
	!pip install https://huggingface.co/baukearends/Echocardiogram-SpanCategorizer-lv-dil/resolve/main/nl_Echocardiogram_SpanCategorizer_lv_dil-any-py3-none-any.whl
	```
	```python
	import spacy
	nlp = spacy.load("nl_Echocardiogram_SpanCategorizer_lv_dil")
	```
	```python
	prediction = nlp("Op dit echo geen duidelijke WMA te zien, goede systolische L.V. functie, normale dimensies LV, wel L.V.H., diastolische dysfunctie graad 1A tot 2. Geringe aortastenose en - matige -insufficientie. Geringe M.I.")
	# Each predicted span is paired with its score from the span group's attrs
	for span, score in zip(prediction.spans['sc'], prediction.spans['sc'].attrs['scores']):
	    print(f"Span: {span}, label: {span.label_}, score: {score[0]:.3f}")
	```
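
The loop above prints every predicted span. A minimal sketch of post-processing those predictions, assuming each span is first copied into a plain `(text, label, score)` tuple. The helper name `filter_spans` and the `0.5` cut-off are illustrative assumptions, not part of the model or of spaCy:

```python
from typing import List, Tuple

# (span text, label, score) copied out of a spaCy Doc; plain tuples keep the
# helper independent of spaCy itself.
SpanRecord = Tuple[str, str, float]


def filter_spans(records: List[SpanRecord], min_score: float = 0.5) -> List[SpanRecord]:
    """Keep records scoring at least `min_score`, highest score first.

    `min_score` is a hypothetical cut-off chosen for illustration only.
    """
    kept = [r for r in records if r[2] >= min_score]
    return sorted(kept, key=lambda r: r[2], reverse=True)


# Made-up records for illustration; real ones would be built from the doc, e.g.
#   spans = prediction.spans["sc"]
#   records = [(s.text, s.label_, float(sc[0]))
#              for s, sc in zip(spans, spans.attrs["scores"])]
example = [
    ("normale dimensies LV", "lv_dil_normal", 0.91),
    ("LV", "lv_dil_present", 0.12),
]
for text, label, score in filter_spans(example):
    print(f"{text!r}: {label} ({score:.2f})")
```

Working on plain tuples rather than `Span` objects keeps the filtering logic easy to test without loading the model.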

	# Label Scheme

	<details>

	<summary>View label scheme (5 labels for 1 component)</summary>

	| Component | Labels |
	| --- | --- |
	| `spancat` | `lv_dil_normal`, `lv_dil_mild`, `lv_dil_moderate`, `lv_dil_present`, `lv_dil_severe` |

	</details>
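
All five labels share the prefix `lv_dil_` followed by a grade. A small sketch of splitting a label into those two parts and tallying grades across predicted spans. The function names are hypothetical, and reading `lv_dil` as left-ventricular dilation is an assumption based on the model name:

```python
from collections import Counter
from typing import Iterable, Tuple

# Labels exactly as listed in the label scheme.
LABELS = (
    "lv_dil_normal", "lv_dil_mild", "lv_dil_moderate",
    "lv_dil_present", "lv_dil_severe",
)


def split_label(label: str) -> Tuple[str, str]:
    """Split 'lv_dil_mild' into ('lv_dil', 'mild') at the last underscore."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    characteristic, grade = label.rsplit("_", 1)
    return characteristic, grade


def count_grades(labels: Iterable[str]) -> Counter:
    """Count how often each grade occurs among the given span labels."""
    return Counter(split_label(label)[1] for label in labels)


print(split_label("lv_dil_moderate"))
print(count_grades(["lv_dil_normal", "lv_dil_normal", "lv_dil_mild"]))
```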


	# Intended use
	The model was developed for span classification on Dutch clinical text. Because it is a domain-specific model trained on medical data, it is intended for Dutch medical NLP tasks.

	# Data
	The model was trained on approximately 4,000 manually annotated echocardiogram reports from the University Medical Centre Utrecht. The training data was anonymized before starting the training procedure.

	| Feature | Description |
	| --- | --- |
	| Name | `Echocardiogram_SpanCategorizer_lv_dil` |
	| Version | `1.0.0` |
	| spaCy | `>=3.7.4,<3.8.0` |
	| Default Pipeline | `tok2vec`, `spancat` |
	| Components | `tok2vec`, `spancat` |
	| License | `cc-by-sa-4.0` |
	| Author | Bauke Arends |
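
The model is pinned to spaCy `>=3.7.4,<3.8.0`. A minimal sketch of checking an installed version string against that range before loading the model. `is_supported_spacy` is a hypothetical helper, and it assumes a plain `major.minor.patch` version string (pre-release suffixes are not handled):

```python
from typing import Tuple

# Bounds taken from the documented spaCy requirement: >=3.7.4,<3.8.0
MIN_VERSION = (3, 7, 4)  # inclusive
MAX_VERSION = (3, 8, 0)  # exclusive


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '3.7.5' into (3, 7, 5); assumes only dot-separated integers."""
    return tuple(int(part) for part in version.split("."))


def is_supported_spacy(version: str) -> bool:
    """Return True if `version` lies in the half-open range [3.7.4, 3.8.0)."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION


# In practice the argument would be spacy.__version__.
for candidate in ("3.7.4", "3.7.5", "3.8.0"):
    print(candidate, is_supported_spacy(candidate))
```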

	# Contact
	If you run into problems with this model, please open an issue on our GitHub repository: https://github.com/umcu/echolabeler/issues

	# Usage
	If you use the model in your work, please cite the following reference: https://doi.org/10.48550/arXiv.2408.06930

	# References
	Paper: Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es (2024). Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification. arXiv: https://arxiv.org/abs/2408.06930