First model version
- README.md +94 -3
- loss.tsv +101 -0
- pytorch_model.bin +3 -0
- training.log +0 -0
README.md
CHANGED
@@ -1,3 +1,94 @@
---
tags:
- flair
- hunflair
- token-classification
- sequence-tagger-model
language: en
widget:
- text: "It contains a functional GCGGCGGCG Egr-1-binding site"
---

## HunFlair model for Transcription Factor Binding Site (TFBS)

[HunFlair](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md) (biomedical flair) model for TFBS entities.

Predicts 1 tag:

| **tag** | **meaning** |
|---------|-------------|
| Tfbs | DNA region bound by a transcription factor |

---

### Demo: How to use in Flair

Requires:

- **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# for biomedical-specific tokenization:
# from flair.tokenization import SciSpacyTokenizer

# load the tagger
tagger = SequenceTagger.load("regel-corpus/hunflair-tfbs")

text = "We found that Egr-1 specifically binds to the PTEN 5' untranslated region, which contains a functional GCGGCGGCG Egr-1-binding site."

# make an example sentence
sentence = Sentence(text)

# for biomedical-specific tokenization:
# sentence = Sentence(text, use_tokenizer=SciSpacyTokenizer())

# predict NER tags
tagger.predict(sentence)

# print the sentence with predicted tags
print(sentence)

# print the predicted NER spans
print('The following NER tags are found:')
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:

```
Span [19,20,21]: "GCGGCGGCG Egr-1-binding site" [− Labels: Tfbs (0.9631)]
```

So, the entity "*GCGGCGGCG Egr-1-binding site*" is found in the sentence.

Alternatively, download all models locally and use the `MultiTagger` class.

```python
from flair.models import MultiTagger

model_paths = [
    './models/hunflair-promoter/pytorch_model.bin',
    './models/hunflair-enhancer/pytorch_model.bin',
    './models/hunflair-tfbs/pytorch_model.bin',
]

tagger = MultiTagger.load(model_paths)

tagger.predict(sentence)
```

---

### Cite

Please cite the following paper when using this model.

```
TODO
```
loss.tsv
ADDED
@@ -0,0 +1,101 @@
EPOCH	TIMESTAMP	BAD_EPOCHS	LEARNING_RATE	TRAIN_LOSS
1	11:10:11	0	0.1000	0.14710011079536267
2	11:10:38	0	0.1000	0.052713516247065106
3	11:11:08	0	0.1000	0.04170313093876623
4	11:11:37	0	0.1000	0.03502531913431957
5	11:12:05	0	0.1000	0.033235160847185675
6	11:12:36	0	0.1000	0.02798608590571261
7	11:13:05	1	0.1000	0.029335829712453517
8	11:13:35	0	0.1000	0.027881322689149987
9	11:14:04	0	0.1000	0.018479608349992186
10	11:14:32	1	0.1000	0.02018662912365272
11	11:15:00	0	0.1000	0.017519669068933433
12	11:15:29	0	0.1000	0.017486945031318618
13	11:15:59	0	0.1000	0.016083299853363285
14	11:16:29	0	0.1000	0.013231104631704026
15	11:17:00	1	0.1000	0.014683758352898918
16	11:17:30	0	0.1000	0.012625955455249794
17	11:18:01	0	0.1000	0.012044876115490816
18	11:18:29	0	0.1000	0.01028024372570282
19	11:18:58	1	0.1000	0.010979479036436342
20	11:19:28	2	0.1000	0.011138790193971255
21	11:19:57	3	0.1000	0.010698116323899108
22	11:20:26	0	0.1000	0.009946761713055951
23	11:20:57	0	0.1000	0.009936871515714868
24	11:21:29	0	0.1000	0.009308915260079325
25	11:21:59	0	0.1000	0.008796980498815181
26	11:22:28	0	0.1000	0.008343673869569414
27	11:22:57	0	0.1000	0.0073019944492553265
28	11:23:25	0	0.1000	0.007136161202930401
29	11:23:53	1	0.1000	0.007951276040687372
30	11:24:23	0	0.1000	0.007115239192906809
31	11:24:53	0	0.1000	0.004631937972905484
32	11:25:23	1	0.1000	0.005918597149124509
33	11:25:52	2	0.1000	0.006554636867261261
34	11:26:23	3	0.1000	0.006275285747952336
35	11:26:53	4	0.1000	0.007606807318433906
36	11:27:23	0	0.0500	0.003300808944614297
37	11:27:53	1	0.0500	0.0035301452838223347
38	11:28:24	2	0.0500	0.00373693162841379
39	11:28:54	0	0.0500	0.0024590705298489033
40	11:29:24	1	0.0500	0.002555141605089226
41	11:29:54	2	0.0500	0.003140753947428485
42	11:30:24	0	0.0500	0.0022883215824823166
43	11:30:53	1	0.0500	0.0027966012533954332
44	11:31:22	0	0.0500	0.0020993295440013752
45	11:31:51	1	0.0500	0.002740449417759927
46	11:32:20	0	0.0500	0.002064300411742808
47	11:32:51	1	0.0500	0.0021202283643759357
48	11:33:22	0	0.0500	0.0019746542345450225
49	11:33:52	1	0.0500	0.0024491152056052724
50	11:34:23	2	0.0500	0.0021125159065796405
51	11:34:53	0	0.0500	0.0014394055756139811
52	11:35:23	1	0.0500	0.0015080880726509233
53	11:35:53	2	0.0500	0.0022499665559356547
54	11:36:24	3	0.0500	0.0018459306494895864
55	11:36:54	4	0.0500	0.0016427611877301478
56	11:37:26	1	0.0250	0.0015329277878862626
57	11:37:55	0	0.0250	0.001336870801220413
58	11:38:25	0	0.0250	0.0008539672446812107
59	11:38:56	1	0.0250	0.0010773914105697013
60	11:39:25	2	0.0250	0.0011901499907322449
61	11:39:55	3	0.0250	0.0008571102830465054
62	11:40:25	4	0.0250	0.0010969846457892516
63	11:40:54	0	0.0125	0.0007889307362559138
64	11:41:22	0	0.0125	0.0007578928750069597
65	11:41:51	1	0.0125	0.0013097398813870695
66	11:42:20	0	0.0125	0.0006851142786664053
67	11:42:48	1	0.0125	0.0008955751240574359
68	11:43:16	2	0.0125	0.0010339341863628265
69	11:43:46	3	0.0125	0.0009957815745854383
70	11:44:14	4	0.0125	0.0008350513470559088
71	11:44:44	1	0.0063	0.0007752945301019462
72	11:45:12	2	0.0063	0.0008474854508743485
73	11:45:42	0	0.0063	0.0005586575015901546
74	11:46:13	0	0.0063	0.0005568093736224873
75	11:46:42	1	0.0063	0.0007954383216220868
76	11:47:11	2	0.0063	0.0007537361897772951
77	11:47:40	3	0.0063	0.0006539163257362983
78	11:48:09	4	0.0063	0.0007293145411753851
79	11:48:37	1	0.0031	0.0007481573379829859
80	11:49:06	2	0.0031	0.0007069956713799059
81	11:49:35	3	0.0031	0.0006705211829076631
82	11:50:04	4	0.0031	0.0013279289754432272
83	11:50:33	1	0.0016	0.0011473284043027811
84	11:51:02	0	0.0016	0.0005220935390625203
85	11:51:30	1	0.0016	0.0010433767265242975
86	11:52:00	2	0.0016	0.0010116299189231366
87	11:52:29	3	0.0016	0.000820558251712016
88	11:52:58	4	0.0016	0.000658769017019137
89	11:53:27	1	0.0008	0.0006596607302553888
90	11:53:56	2	0.0008	0.0009832041381581406
91	11:54:26	3	0.0008	0.0007839900746833205
92	11:54:56	4	0.0008	0.000751059051643154
93	11:55:25	1	0.0004	0.0008921829651493708
94	11:55:55	2	0.0004	0.0007218841383679459
95	11:56:26	0	0.0004	0.000456431968685245
96	11:56:56	1	0.0004	0.0009873492483496796
97	11:57:25	2	0.0004	0.0006661835511418295
98	11:57:54	3	0.0004	0.0009475974411766899
99	11:58:24	4	0.0004	0.0009786815036461967
100	11:58:53	1	0.0002	0.0007292146594788754
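The LEARNING_RATE column reflects Flair's anneal-on-plateau schedule: the rate appears to be halved whenever BAD_EPOCHS reaches the patience threshold (here, after four bad epochs in a row). As a quick sanity check on a log like this, here is a minimal standard-library sketch that finds the epoch with the lowest training loss. It assumes tab-separated columns, as the `.tsv` extension suggests; the inline `SAMPLE` holds a few rows copied from the table above, and a real run would instead read the file with `open("loss.tsv")`.

```python
import csv
import io

# A few rows copied from loss.tsv above (tab-separated).
SAMPLE = (
    "EPOCH\tTIMESTAMP\tBAD_EPOCHS\tLEARNING_RATE\tTRAIN_LOSS\n"
    "1\t11:10:11\t0\t0.1000\t0.14710011079536267\n"
    "36\t11:27:23\t0\t0.0500\t0.003300808944614297\n"
    "95\t11:56:26\t0\t0.0004\t0.000456431968685245\n"
    "100\t11:58:53\t1\t0.0002\t0.0007292146594788754\n"
)

def best_epoch(tsv_text):
    """Return (epoch, train_loss) for the row with the lowest TRAIN_LOSS."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best = min(rows, key=lambda row: float(row["TRAIN_LOSS"]))
    return int(best["EPOCH"]), float(best["TRAIN_LOSS"])

print(best_epoch(SAMPLE))  # epoch 95 has the lowest loss in this sample
```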
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9a335f97fa927c240e7fe5eaa035e383a28ece7c9f3d474e283d41ba1e52d6f6
size 1104819835
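`pytorch_model.bin` is stored as a Git LFS pointer: the repository tracks only the object id (the file's SHA-256 digest) and its byte size, while the weights themselves live in LFS storage. A minimal sketch for checking a downloaded copy against the pointer, using only the standard library (`lfs_oid` is a hypothetical helper name, not part of any LFS tooling):

```python
import hashlib

def lfs_oid(path, chunk_size=1 << 20):
    """Stream a file in chunks and return the sha256 hex digest Git LFS records as its oid."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# For the pointer above, lfs_oid("pytorch_model.bin") should equal
# "9a335f97fa927c240e7fe5eaa035e383a28ece7c9f3d474e283d41ba1e52d6f6"
```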
training.log
ADDED
The diff for this file is too large to render.