---
license: cc-by-nc-4.0
library_name: transformers
tags:
- medical
---
# HuBERT-ECG: A Self-Supervised Foundation Model for Broad and Scalable Cardiac Application

Original code is available at https://github.com/Edoar-do/HuBERT-ECG

License: CC BY-NC 4.0


## Abstract
Deep learning models have shown remarkable performance in electrocardiogram (ECG) analysis, but their success has been constrained by the limited availability and size of ECG datasets, resulting in systems that are more task specialists than versatile generalists. In this work, we introduce HuBERT-ECG, a foundation ECG model pre-trained in a self-supervised manner on a large and diverse dataset of 9.1 million 12-lead ECGs encompassing 164 cardiovascular conditions. By simply adding an output layer, HuBERT-ECG can be fine-tuned for a wide array of downstream tasks, from diagnosing diseases to predicting future cardiovascular events. Across diverse real-world scenarios, HuBERT-ECG achieves AUROCs from 84.3% in low-data settings to 99% in large-scale setups. When trained to detect 164 overlapping conditions simultaneously, our model delivers AUROCs above 90% and 95% for 140 and 94 diseases, respectively. HuBERT-ECG also predicts death events within a 2-year follow-up with an AUROC of 93.4%. We release models and code.

## Models
This repository contains:
- SMALL/BASE/LARGE HuBERT-ECG models ready to be fine-tuned on any downstream dataset or used as feature extractors
- SMALL/BASE/LARGE HuBERT-ECG models fine-tuned on Cardio-Learning, providing a more disease-oriented baseline to further fine-tune.

Cardio-Learning is the name we gave to the union of several 12-lead ECG datasets, including PTB, PTB-XL, CPSC, CPSC-Extra, Georgia, Chapman, Ningbo, SPH, CODE, SaMi-Trop, and Hefei.
This dataset, comprising 2.4 million ECGs from millions of patients in 4 countries, encompasses 164 different heart-related conditions for which the ECG is either the primary or a supportive diagnostic tool, or is used to estimate the risk of future adverse cardiovascular events.

## Usage

**Input signals must be 5-second 12-lead ECGs sampled at 100 Hz. The 12 leads are concatenated into a single sequence.**
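
For reference, 5 seconds at 100 Hz gives 500 samples per lead, so a concatenated 12-lead input is a 6000-sample vector. Below is a minimal sketch of how such an input could be assembled; the batch-first shape `(1, 6000)` and the variable names are illustrative assumptions, not part of the official API.

```python
import torch

# A 12-lead, 5-second ECG sampled at 100 Hz: 12 leads x 500 samples each.
# Replace this random tensor with your actual signal of shape (leads, samples).
ecg_12_lead = torch.randn(12, 500)

# Concatenate the leads one after another into a single 6000-sample sequence,
# then add a batch dimension -> shape (1, 6000). The batch-first layout is an
# assumption; check the original repository for the exact expected shape.
ecg_input = ecg_12_lead.reshape(1, -1)  # (1, 6000)
```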

```python
import torch
from hubert_ecg import HuBERTECG, HuBERTECGConfig

path = "path/to/your/hubert-ecg-model.pt"
checkpoint = torch.load(path, map_location='cpu')
config = checkpoint['model_config']
hubert_ecg = HuBERTECG(config)
hubert_ecg.load_state_dict(checkpoint['model_state_dict']) # pre-trained model ready to be fine-tuned or used as feature extractor
```
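
As a hedged illustration only: if `HuBERTECG` follows the Hugging Face HuBERT-style interface (a forward pass that accepts the input signal and exposes `last_hidden_state`), feature extraction could look like the sketch below. Verify the exact call signature against the original repository.

```python
hubert_ecg.eval()
with torch.no_grad():
    # ecg_input: concatenated 12-lead signal of shape (batch, 6000), as above.
    outputs = hubert_ecg(ecg_input)        # assumed HuBERT-style forward pass
    features = outputs.last_hidden_state   # assumed shape: (batch, frames, hidden_size)
```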

```python
import torch
from hubert_ecg import HuBERTECG, HuBERTECGConfig
from hubert_ecg_classification import HuBERTForECGClassification

path = "path/to/your/finetuned-hubert-ecg-model.pt"
checkpoint = torch.load(path, map_location='cpu')
config = checkpoint['model_config']
hubert_ecg = HuBERTECG(config)
hubert_ecg = HuBERTForECGClassification(hubert_ecg)
hubert_ecg.load_state_dict(checkpoint['model_state_dict']) # fine-tuned model ready to be used or further fine-tuned
```
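
Similarly, a hedged sketch of inference with the fine-tuned classifier, assuming its forward pass returns per-condition logits and that the task is multi-label (both assumptions to verify against `hubert_ecg_classification.py`):

```python
hubert_ecg.eval()
with torch.no_grad():
    logits = hubert_ecg(ecg_input)   # assumed to return per-condition logits
    probs = torch.sigmoid(logits)    # assumed multi-label probabilities over the trained conditions
```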

## Easier usage
(for pre-trained models only)
```python
from transformers import AutoModel
size = 'small' # any size from small, base, large
hubert_ecg = AutoModel.from_pretrained(f"Edoardo-BS/hubert-ecg-{size}", trust_remote_code=True)
```
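
A model loaded through `AutoModel` with `trust_remote_code=True` should be usable as a feature extractor in the same way as the manually loaded checkpoints above.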

## 📚 Citation
If you use our models or find our work useful, please consider citing us:
```
doi: https://doi.org/10.1101/2024.11.14.24317328
```