Upload README.md
README.md
CHANGED
@@ -1,4 +1,7 @@
|
|
1 |
---
|
2 |
library_name: span-marker
|
3 |
tags:
|
4 |
- span-marker
|
@@ -6,35 +9,133 @@ tags:
|
|
6 |
- ner
|
7 |
- named-entity-recognition
|
8 |
- generated_from_span_marker_trainer
|
9 |
metrics:
|
10 |
- precision
|
11 |
- recall
|
12 |
- f1
|
13 |
-
widget:
|
14 |
pipeline_tag: token-classification
|
15 |
---
|
16 |
|
17 |
-
# SpanMarker
|
18 |
|
19 |
-
This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition.
|
20 |
|
21 |
## Model Details
|
22 |
|
23 |
### Model Description
|
24 |
|
25 |
- **Model Type:** SpanMarker
|
26 |
-
|
27 |
- **Maximum Sequence Length:** 256 tokens
|
28 |
- **Maximum Entity Length:** 8 words
|
29 |
-
|
30 |
-
|
31 |
-
|
32 |
|
33 |
### Model Sources
|
34 |
|
35 |
- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
|
36 |
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
|
37 |
|
38 |
## Uses
|
39 |
|
40 |
### Direct Use for Inference
|
@@ -43,9 +144,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that ca
|
|
43 |
from span_marker import SpanMarkerModel
|
44 |
|
45 |
# Download from the 🤗 Hub
|
46 |
-
model = SpanMarkerModel.from_pretrained("
|
47 |
# Run inference
|
48 |
-
entities = model.predict("
|
49 |
```
|
50 |
|
51 |
### Downstream Use
|
@@ -57,7 +158,7 @@ You can finetune this model on your own dataset.
|
|
57 |
from span_marker import SpanMarkerModel, Trainer
|
58 |
|
59 |
# Download from the 🤗 Hub
|
60 |
-
model = SpanMarkerModel.from_pretrained("
|
61 |
|
62 |
# Specify a Dataset with "tokens" and "ner_tags" columns
|
63 |
dataset = load_dataset("conll2003") # For example CoNLL2003
|
@@ -69,7 +170,7 @@ trainer = Trainer(
|
|
69 |
eval_dataset=dataset["validation"],
|
70 |
)
|
71 |
trainer.train()
|
72 |
-
trainer.save_model("
|
73 |
```
|
74 |
</details>
|
75 |
|
@@ -93,6 +194,31 @@ trainer.save_model("span_marker_model_id-finetuned")
|
|
93 |
|
94 |
## Training Details
|
95 |
|
96 |
### Framework Versions
|
97 |
|
98 |
- Python: 3.9.16
|
|
|
1 |
---
|
2 |
+
language:
|
3 |
+
- en
|
4 |
+
license: cc-by-4.0
|
5 |
library_name: span-marker
|
6 |
tags:
|
7 |
- span-marker
|
|
|
9 |
- ner
|
10 |
- named-entity-recognition
|
11 |
- generated_from_span_marker_trainer
|
12 |
+
datasets:
|
13 |
+
- EMBO/SourceData
|
14 |
metrics:
|
15 |
- precision
|
16 |
- recall
|
17 |
- f1
|
18 |
+
widget:
|
19 |
+
- text: Comparison of ENCC-derived neurospheres treated with intestinal extract
|
20 |
+
from hypoganglionosis rats, hypoganglionosis treated with Fecal microbiota transplantation
|
21 |
+
(FMT) sham rat. Comparison of neuronal markers. (J) Immunofluorescence stain
|
22 |
+
number of PGP9.5+. Nuclei were stained blue with DAPI; Triangles indicate
|
23 |
+
PGP9.5+.
|
24 |
+
- text: 'Histochemical (H & E) immunostaining (red) show T (CD3+) neutrophil
|
25 |
+
(Ly6b+) infiltration in skin of mice in (A). Scale bar, 100 μm. (of CD3
|
26 |
+
Ly6b immunostaining from CsA treated mice represent seperate analyses performed
|
27 |
+
on serial thin sections.) of epidermal thickness, T (CD3+) neutrophil (Ly6b+)
|
28 |
+
infiltration (red) in skin thin sections from (C), (n = 6). Data
|
29 |
+
information: Data represent mean ± SD. * P < 0.05, * * P < 0.01 by two
|
30 |
+
-Mann-Whitney; two independent experiments.'
|
31 |
+
- text: 'C African green monkey kidney epithelial (Vero) were transfected with NC,
|
32 |
+
siMLKL, or miR-324-5p for 48 h. qPCR for expression of MLKL. Data information:
|
33 |
+
data are represented as means ± SD of three biological replicates. Statistical
|
34 |
+
analyses were performed using unpaired Student '' s t -. experiments were performed
|
35 |
+
at least three times, representative data are shown.'
|
36 |
+
- text: (F) Binding between FTCD p47 between p47 p97 is necessary for mitochondria
|
37 |
+
aggregation mediated by FTCDwt-HA-MAO. HeLa Tet-off inducibly expressing
|
38 |
+
FTCDwt-HA-MAO were transfected with mammalian expression constructs of
|
39 |
+
siRNA-insensitive Flag-tagged p47wt / mutants at same time as treatment of p47
|
40 |
+
siRNA, cultured for 24 hrs. were further cultured in DOX-free medium for 48 hrs
|
41 |
+
for induction of FTCD-HA-MAO. After fixation, were visualized with a monoclonal
|
42 |
+
antibody to mitochondria polyclonal antibodies to HA Flag. Panels a-l display
|
43 |
+
representative. Scale bar = 10 μm. (G) Binding between FTCD p97 is necessary
|
44 |
+
for mitochondria aggregation mediated by FTCDwt-HA-MAO. HeLa Tet-off inducibly
|
45 |
+
expressing FTCDwt-HA-MAO were transfected with mammalian expression construct
|
46 |
+
of siRNA-insensitive Flag-tagged p97wt / mutant at same time as treatment
|
47 |
+
with p97 siRNA. following procedures were same as in (F). Panels a-i display
|
48 |
+
representative. Scale bar = 10 μm. (H) results of of (F) (G). Results
|
49 |
+
are shown as mean ± SD of five sets of independent experiments, with 100 counted
|
50 |
+
in each group in each independent experiment. Asterisks indicate a significant
|
51 |
+
difference at P < 0.01 compared with siRNA treatment alone ('none') compared
|
52 |
+
with mutant expression (Bonferroni method).
|
53 |
+
- text: (b) Parkin is recruited selectively to depolarized mitochondria directs
|
54 |
+
mitophagy. HeLa transfected with HA-Parkin were treated with CCCP for indicated
|
55 |
+
times. Mitochondria were stained by anti-TOM20 (pseudo coloured; blue) a
|
56 |
+
ΔΨm dependent MitoTracker (red). Parkin was stained with anti-HA (green).
|
57 |
+
Without treatment, mitochondria are intact stained by both mitochondrial
|
58 |
+
markers, whereas Parkin is equally distributed in cytoplasm. After 2 h of CCCP
|
59 |
+
treatment, mitochondria are depolarized as shown by loss of MitoTracker. Parkin
|
60 |
+
completely translocates to mitochondria clustering at perinuclear regions. After
|
61 |
+
24h of CCCP treatment, massive loss of mitochondria is observed as shown by
|
62 |
+
disappearance of mitochondrial marker. Only Parkin-positive show mitochondrial
|
63 |
+
clustering clearance, in contrast to adjacent untransfected. Scale bars, 10
|
64 |
+
μm.
|
65 |
pipeline_tag: token-classification
|
66 |
+
base_model: bert-base-uncased
|
67 |
+
model-index:
|
68 |
+
- name: SpanMarker with bert-base-uncased on SourceData
|
69 |
+
results:
|
70 |
+
- task:
|
71 |
+
type: token-classification
|
72 |
+
name: Named Entity Recognition
|
73 |
+
dataset:
|
74 |
+
name: SourceData
|
75 |
+
type: EMBO/SourceData
|
76 |
+
split: test
|
77 |
+
metrics:
|
78 |
+
- type: f1
|
79 |
+
value: 0.8336481983993405
|
80 |
+
name: F1
|
81 |
+
- type: precision
|
82 |
+
value: 0.8345368269032392
|
83 |
+
name: Precision
|
84 |
+
- type: recall
|
85 |
+
value: 0.8327614603348888
|
86 |
+
name: Recall
|
87 |
---
|
88 |
|
89 |
+
# SpanMarker with bert-base-uncased on SourceData
|
90 |
|
91 |
+
This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [SourceData](https://huggingface.co/datasets/EMBO/SourceData) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-uncased](https://huggingface.co/bert-base-uncased) as the underlying encoder.
|
92 |
|
93 |
## Model Details
|
94 |
|
95 |
### Model Description
|
96 |
|
97 |
- **Model Type:** SpanMarker
|
98 |
+
- **Encoder:** [bert-base-uncased](https://huggingface.co/bert-base-uncased)
|
99 |
- **Maximum Sequence Length:** 256 tokens
|
100 |
- **Maximum Entity Length:** 8 words
|
101 |
+
- **Training Dataset:** [SourceData](https://huggingface.co/datasets/EMBO/SourceData)
|
102 |
+
- **Language:** en
|
103 |
+
- **License:** cc-by-4.0
|
104 |
|
105 |
### Model Sources
|
106 |
|
107 |
- **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
|
108 |
- **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
|
109 |
|
110 |
+
### Model Labels
|
111 |
+
| Label | Examples |
|
112 |
+
|:---------------|:--------------------------------------------------------|
|
113 |
+
| CELL_LINE | "293T", "WM266.4 451Lu", "501mel" |
|
114 |
+
| CELL_TYPE | "BMDMs", "protoplasts", "epithelial" |
|
115 |
+
| DISEASE | "melanoma", "lung metastasis", "breast prostate cancer" |
|
116 |
+
| EXP_ASSAY | "interactions", "Yeast two-hybrid", "BiFC" |
|
117 |
+
| GENEPROD | "CPL1", "FREE1 CPL1", "FREE1" |
|
118 |
+
| ORGANISM | "Arabidopsis", "yeast", "seedlings" |
|
119 |
+
| SMALL_MOLECULE | "polyacrylamide", "CHX", "SDS polyacrylamide" |
|
120 |
+
| SUBCELLULAR | "proteasome", "D-bodies", "plasma" |
|
121 |
+
| TISSUE | "Colon", "roots", "serum" |
|
122 |
+
|
123 |
+
## Evaluation
|
124 |
+
|
125 |
+
### Metrics
|
126 |
+
| Label | Precision | Recall | F1 |
|
127 |
+
|:---------------|:----------|:-------|:-------|
|
128 |
+
| **all** | 0.8345 | 0.8328 | 0.8336 |
|
129 |
+
| CELL_LINE | 0.9060 | 0.8866 | 0.8962 |
|
130 |
+
| CELL_TYPE | 0.7365 | 0.7746 | 0.7551 |
|
131 |
+
| DISEASE | 0.6204 | 0.6531 | 0.6363 |
|
132 |
+
| EXP_ASSAY | 0.7224 | 0.7096 | 0.7160 |
|
133 |
+
| GENEPROD | 0.8944 | 0.8960 | 0.8952 |
|
134 |
+
| ORGANISM | 0.8752 | 0.8902 | 0.8826 |
|
135 |
+
| SMALL_MOLECULE | 0.8304 | 0.8223 | 0.8263 |
|
136 |
+
| SUBCELLULAR | 0.7859 | 0.7699 | 0.7778 |
|
137 |
+
| TISSUE | 0.8134 | 0.8056 | 0.8094 |
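
The overall F1 in the table is the harmonic mean of precision and recall. A minimal sketch that re-derives it from the unrounded test-set values listed in the model-index metadata; `f1_score` is an illustrative helper name, not part of SpanMarker:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded test-set values from the model-index metadata of this card.
precision = 0.8345368269032392
recall = 0.8327614603348888

overall_f1 = f1_score(precision, recall)
print(f"{overall_f1:.4f}")  # 0.8336
```

The per-label rows can be checked the same way, although those values are rounded to four decimals, so the last digit may differ slightly.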
|
138 |
+
|
139 |
## Uses
|
140 |
|
141 |
### Direct Use for Inference
|
|
|
144 |
from span_marker import SpanMarkerModel
|
145 |
|
146 |
# Download from the 🤗 Hub
|
147 |
+
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-uncased-sourcedata")
|
148 |
# Run inference
|
149 |
+
entities = model.predict("Comparison of ENCC-derived neurospheres treated with intestinal extract from hypoganglionosis rats, hypoganglionosis treated with Fecal microbiota transplantation (FMT) sham rat. Comparison of neuronal markers. (J) Immunofluorescence stain number of PGP9.5+. Nuclei were stained blue with DAPI; Triangles indicate PGP9.5+.")
|
150 |
```
|
151 |
|
152 |
### Downstream Use
|
|
|
158 |
from span_marker import SpanMarkerModel, Trainer
|
159 |
|
160 |
# Download from the 🤗 Hub
|
161 |
+
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-uncased-sourcedata")
|
162 |
|
163 |
# Specify a Dataset with "tokens" and "ner_tags" columns
|
164 |
dataset = load_dataset("conll2003") # For example CoNLL2003
|
|
|
170 |
eval_dataset=dataset["validation"],
|
171 |
)
|
172 |
trainer.train()
|
173 |
+
trainer.save_model("tomaarsen/span-marker-bert-base-uncased-sourcedata-finetuned")
|
174 |
```
|
175 |
</details>
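
To work with the result of `model.predict` from the inference snippet, the sketch below groups predicted spans by label. It assumes each prediction is a dict with `span`, `label` and `score` keys, which is how SpanMarker typically returns entities. The sample predictions and the `group_by_label` helper are made up for illustration and are not real model output.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_label(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect entity spans per label, skipping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written stand-in for `model.predict(...)`; not real model output.
sample_entities = [
    {"span": "DAPI", "label": "SMALL_MOLECULE", "score": 0.97},
    {"span": "PGP9.5", "label": "GENEPROD", "score": 0.93},
    {"span": "rats", "label": "ORGANISM", "score": 0.41},
]

print(group_by_label(sample_entities))
```

The `min_score` threshold of 0.5 is an arbitrary default for the sketch; a suitable cut-off depends on how precision and recall should be traded off for a given label.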
|
176 |
|
|
|
194 |
|
195 |
## Training Details
|
196 |
|
197 |
+
### Training Set Metrics
|
198 |
+
| Training set | Min | Median | Max |
|
199 |
+
|:----------------------|:----|:--------|:-----|
|
200 |
+
| Sentence length | 4 | 71.0253 | 2609 |
|
201 |
+
| Entities per sentence | 0 | 8.3186 | 162 |
|
202 |
+
|
203 |
+
### Training Hyperparameters
|
204 |
+
- learning_rate: 5e-05
|
205 |
+
- train_batch_size: 32
|
206 |
+
- eval_batch_size: 32
|
207 |
+
- seed: 42
|
208 |
+
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
209 |
+
- lr_scheduler_type: linear
|
210 |
+
- lr_scheduler_warmup_ratio: 0.1
|
211 |
+
- num_epochs: 3
|
212 |
+
|
213 |
+
### Training Results
|
214 |
+
| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|
215 |
+
|:------:|:-----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
|
216 |
+
| 0.5237 | 3000 | 0.0162 | 0.7972 | 0.8162 | 0.8065 | 0.9520 |
|
217 |
+
| 1.0473 | 6000 | 0.0155 | 0.8188 | 0.8251 | 0.8219 | 0.9560 |
|
218 |
+
| 1.5710 | 9000 | 0.0155 | 0.8213 | 0.8324 | 0.8268 | 0.9563 |
|
219 |
+
| 2.0946 | 12000 | 0.0163 | 0.8315 | 0.8347 | 0.8331 | 0.9581 |
|
220 |
+
| 2.6183 | 15000 | 0.0167 | 0.8303 | 0.8378 | 0.8340 | 0.9582 |
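
The hyperparameters above describe a linear learning-rate schedule with a 10% warmup. The sketch below reproduces that shape in plain Python: a linear ramp to the peak rate, then a linear decay to zero, which is the rule the `transformers` linear scheduler follows. The total step count is an estimate derived from the results table (step 3000 at epoch 0.5237, so roughly 5,730 steps per epoch over 3 epochs); it is an assumption, not a logged value, and `linear_warmup_lr` is an illustrative name.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, base_lr: float = 5e-5,
                     warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed total: steps per epoch estimated as 3000 / 0.5237, times 3 epochs.
TOTAL_STEPS = round(3000 / 0.5237 * 3)

for step in (0, 3000, 9000, 15000):
    print(step, f"{linear_warmup_lr(step, TOTAL_STEPS):.2e}")
```

Under these assumptions the warmup ends well before the first evaluation at step 3000, so every row of the results table falls in the decay phase.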
|
221 |
+
|
222 |
### Framework Versions
|
223 |
|
224 |
- Python: 3.9.16
|