wbi-sg committed on
Commit
19cd30a
1 Parent(s): 127c410

First model version

Files changed (4)
  1. README.md +94 -3
  2. loss.tsv +101 -0
  3. pytorch_model.bin +3 -0
  4. training.log +0 -0
README.md CHANGED
@@ -1,3 +1,94 @@
- ---
- license: mit
- ---
+ ---
+ tags:
+ - flair
+ - hunflair
+ - token-classification
+ - sequence-tagger-model
+ language: en
+ widget:
+ - text: "It contains a functional GCGGCGGCG Egr-1-binding site"
+ ---
+
+ ## HunFlair model for Transcription Factor Binding Site (TFBS)
+
+ [HunFlair](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md) (biomedical flair) model for the TFBS entity type.
+
+ Predicts 1 tag:
+
+ | **tag** | **meaning**                                |
+ |---------|--------------------------------------------|
+ | Tfbs    | DNA region bound by a transcription factor |
+
+ ---
+
+ ### Demo: How to use in Flair
+
+ Requires:
+ - **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
+
+ ```python
+ from flair.data import Sentence
+ from flair.models import SequenceTagger
+ # for biomedical-specific tokenization:
+ # from flair.tokenization import SciSpacyTokenizer
+
+ # load tagger
+ tagger = SequenceTagger.load("regel-corpus/hunflair-tfbs")
+
+ text = "We found that Egr-1 specifically binds to the PTEN 5' untranslated region, which contains a functional GCGGCGGCG Egr-1-binding site."
+
+ # make example sentence
+ sentence = Sentence(text)
+
+ # for biomedical-specific tokenization:
+ # sentence = Sentence(text, use_tokenizer=SciSpacyTokenizer())
+
+ # predict NER tags
+ tagger.predict(sentence)
+
+ # print sentence
+ print(sentence)
+
+ # print predicted NER spans
+ print('The following NER tags are found:')
+ # iterate over entities and print
+ for entity in sentence.get_spans('ner'):
+     print(entity)
+ ```
+
+ This yields the following output:
+ ```
+ Span [19,20,21]: "GCGGCGGCG Egr-1-binding site" [− Labels: Tfbs (0.9631)]
+ ```
+
+ So, the entity "*GCGGCGGCG Egr-1-binding site*" is found in the sentence.
+
+ Alternatively, download all models locally and use the `MultiTagger` class, passing it the list of model paths.
+
+ ```python
+ from flair.models import MultiTagger
+
+ model_paths = [
+     './models/hunflair-promoter/pytorch_model.bin',
+     './models/hunflair-enhancer/pytorch_model.bin',
+     './models/hunflair-tfbs/pytorch_model.bin',
+ ]
+
+ tagger = MultiTagger.load(model_paths)
+
+ tagger.predict(sentence)
+ ```
+
+ ---
+
+ ### Cite
+
+ Please cite the following paper when using this model.
+
+ ```
+ TODO
+ ```
loss.tsv ADDED
@@ -0,0 +1,101 @@
+ EPOCH	TIMESTAMP	BAD_EPOCHS	LEARNING_RATE	TRAIN_LOSS
+ 1	11:10:11	0	0.1000	0.14710011079536267
+ 2	11:10:38	0	0.1000	0.052713516247065106
+ 3	11:11:08	0	0.1000	0.04170313093876623
+ 4	11:11:37	0	0.1000	0.03502531913431957
+ 5	11:12:05	0	0.1000	0.033235160847185675
+ 6	11:12:36	0	0.1000	0.02798608590571261
+ 7	11:13:05	1	0.1000	0.029335829712453517
+ 8	11:13:35	0	0.1000	0.027881322689149987
+ 9	11:14:04	0	0.1000	0.018479608349992186
+ 10	11:14:32	1	0.1000	0.02018662912365272
+ 11	11:15:00	0	0.1000	0.017519669068933433
+ 12	11:15:29	0	0.1000	0.017486945031318618
+ 13	11:15:59	0	0.1000	0.016083299853363285
+ 14	11:16:29	0	0.1000	0.013231104631704026
+ 15	11:17:00	1	0.1000	0.014683758352898918
+ 16	11:17:30	0	0.1000	0.012625955455249794
+ 17	11:18:01	0	0.1000	0.012044876115490816
+ 18	11:18:29	0	0.1000	0.01028024372570282
+ 19	11:18:58	1	0.1000	0.010979479036436342
+ 20	11:19:28	2	0.1000	0.011138790193971255
+ 21	11:19:57	3	0.1000	0.010698116323899108
+ 22	11:20:26	0	0.1000	0.009946761713055951
+ 23	11:20:57	0	0.1000	0.009936871515714868
+ 24	11:21:29	0	0.1000	0.009308915260079325
+ 25	11:21:59	0	0.1000	0.008796980498815181
+ 26	11:22:28	0	0.1000	0.008343673869569414
+ 27	11:22:57	0	0.1000	0.0073019944492553265
+ 28	11:23:25	0	0.1000	0.007136161202930401
+ 29	11:23:53	1	0.1000	0.007951276040687372
+ 30	11:24:23	0	0.1000	0.007115239192906809
+ 31	11:24:53	0	0.1000	0.004631937972905484
+ 32	11:25:23	1	0.1000	0.005918597149124509
+ 33	11:25:52	2	0.1000	0.006554636867261261
+ 34	11:26:23	3	0.1000	0.006275285747952336
+ 35	11:26:53	4	0.1000	0.007606807318433906
+ 36	11:27:23	0	0.0500	0.003300808944614297
+ 37	11:27:53	1	0.0500	0.0035301452838223347
+ 38	11:28:24	2	0.0500	0.00373693162841379
+ 39	11:28:54	0	0.0500	0.0024590705298489033
+ 40	11:29:24	1	0.0500	0.002555141605089226
+ 41	11:29:54	2	0.0500	0.003140753947428485
+ 42	11:30:24	0	0.0500	0.0022883215824823166
+ 43	11:30:53	1	0.0500	0.0027966012533954332
+ 44	11:31:22	0	0.0500	0.0020993295440013752
+ 45	11:31:51	1	0.0500	0.002740449417759927
+ 46	11:32:20	0	0.0500	0.002064300411742808
+ 47	11:32:51	1	0.0500	0.0021202283643759357
+ 48	11:33:22	0	0.0500	0.0019746542345450225
+ 49	11:33:52	1	0.0500	0.0024491152056052724
+ 50	11:34:23	2	0.0500	0.0021125159065796405
+ 51	11:34:53	0	0.0500	0.0014394055756139811
+ 52	11:35:23	1	0.0500	0.0015080880726509233
+ 53	11:35:53	2	0.0500	0.0022499665559356547
+ 54	11:36:24	3	0.0500	0.0018459306494895864
+ 55	11:36:54	4	0.0500	0.0016427611877301478
+ 56	11:37:26	1	0.0250	0.0015329277878862626
+ 57	11:37:55	0	0.0250	0.001336870801220413
+ 58	11:38:25	0	0.0250	0.0008539672446812107
+ 59	11:38:56	1	0.0250	0.0010773914105697013
+ 60	11:39:25	2	0.0250	0.0011901499907322449
+ 61	11:39:55	3	0.0250	0.0008571102830465054
+ 62	11:40:25	4	0.0250	0.0010969846457892516
+ 63	11:40:54	0	0.0125	0.0007889307362559138
+ 64	11:41:22	0	0.0125	0.0007578928750069597
+ 65	11:41:51	1	0.0125	0.0013097398813870695
+ 66	11:42:20	0	0.0125	0.0006851142786664053
+ 67	11:42:48	1	0.0125	0.0008955751240574359
+ 68	11:43:16	2	0.0125	0.0010339341863628265
+ 69	11:43:46	3	0.0125	0.0009957815745854383
+ 70	11:44:14	4	0.0125	0.0008350513470559088
+ 71	11:44:44	1	0.0063	0.0007752945301019462
+ 72	11:45:12	2	0.0063	0.0008474854508743485
+ 73	11:45:42	0	0.0063	0.0005586575015901546
+ 74	11:46:13	0	0.0063	0.0005568093736224873
+ 75	11:46:42	1	0.0063	0.0007954383216220868
+ 76	11:47:11	2	0.0063	0.0007537361897772951
+ 77	11:47:40	3	0.0063	0.0006539163257362983
+ 78	11:48:09	4	0.0063	0.0007293145411753851
+ 79	11:48:37	1	0.0031	0.0007481573379829859
+ 80	11:49:06	2	0.0031	0.0007069956713799059
+ 81	11:49:35	3	0.0031	0.0006705211829076631
+ 82	11:50:04	4	0.0031	0.0013279289754432272
+ 83	11:50:33	1	0.0016	0.0011473284043027811
+ 84	11:51:02	0	0.0016	0.0005220935390625203
+ 85	11:51:30	1	0.0016	0.0010433767265242975
+ 86	11:52:00	2	0.0016	0.0010116299189231366
+ 87	11:52:29	3	0.0016	0.000820558251712016
+ 88	11:52:58	4	0.0016	0.000658769017019137
+ 89	11:53:27	1	0.0008	0.0006596607302553888
+ 90	11:53:56	2	0.0008	0.0009832041381581406
+ 91	11:54:26	3	0.0008	0.0007839900746833205
+ 92	11:54:56	4	0.0008	0.000751059051643154
+ 93	11:55:25	1	0.0004	0.0008921829651493708
+ 94	11:55:55	2	0.0004	0.0007218841383679459
+ 95	11:56:26	0	0.0004	0.000456431968685245
+ 96	11:56:56	1	0.0004	0.0009873492483496796
+ 97	11:57:25	2	0.0004	0.0006661835511418295
+ 98	11:57:54	3	0.0004	0.0009475974411766899
+ 99	11:58:24	4	0.0004	0.0009786815036461967
+ 100	11:58:53	1	0.0002	0.0007292146594788754
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a335f97fa927c240e7fe5eaa035e383a28ece7c9f3d474e283d41ba1e52d6f6
+ size 1104819835
training.log ADDED
The diff for this file is too large to render.