File size: 2,428 Bytes
c70bdf4
 
 
 
 
 
 
 
b0ef69d
c70bdf4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ef6f429
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c70bdf4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
---
tags:
- flair
- hunflair
- token-classification
- sequence-tagger-model
language: en
widget:
- text: "Two putative extended promoters consensus sequences (p1 and p2)."
---

## HunFlair model for PROMOTER  

[HunFlair](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR.md) (biomedical flair) for promoter entity.


Predicts 1 tag:

| **tag**                        | **meaning** |
|---------------------------------|-----------|
| Promoter         | DNA promoter region | 

---

### Cite

Please cite the following paper when using this model.

```
@article{garda2022regel,
  title={RegEl corpus: identifying DNA regulatory elements in the scientific literature},
  author={Garda, Samuele and Lenihan-Geels, Freyda and Proft, Sebastian and Hochmuth, Stefanie and Sch{\"u}lke, Markus and Seelow, Dominik and Leser, Ulf},
  journal={Database},
  volume={2022},
  year={2022},
  publisher={Oxford Academic}
}
```
---

### Demo: How to use in Flair

Requires: 
- **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger
# for biomedical-specific tokenization:
# from flair.tokenization import SciSpacyTokenizer

# load tagger
tagger = SequenceTagger.load("regel-corpus/hunflair-promoter")

text = "The upstream region of the glnA gene contained two putative extended promoter consensus sequences (p1 and p2)."

# make example sentence
sentence = Sentence(text)

# for biomedical-specific tokenization:
# sentence = Sentence(text, use_tokenizer=SciSpacyTokenizer())

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

```

This yields the following output:

```
Span [16]: "p1"   [− Labels: Promoter (0.9878)]
Span [18]: "p2"   [− Labels: Promoter (0.9216)]
```

So, the entities "*p1*" and "*p2*" (labeled as a **promoter**)  are found in the sentence.

Alternatively download all models locally and use the `MultiTagger` class.

```python
from flair.models import MultiTagger

tagger = [
'./models/hunflair-promoter/pytorch_model.bin',
'./models/hunflair-enhancer/pytorch_model.bin',
'./models/hunflair-tfbs/pytorch_model.bin',
]

tagger = MultiTagger.load(['./models/hunflair-'])

tagger.predict(sentence)
```