initial model commit
- README.md +164 -0
- loss.tsv +151 -0
- pytorch_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,164 @@
---
tags:
- flair
- token-classification
- sequence-tagger-model
language: en
datasets:
- ontonotes
inference: false
---

## English NER in Flair (Ontonotes fast model)

This is the fast version of the 18-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **89.3** (Ontonotes)

Predicts 18 tags:

| **tag** | **meaning** |
|---------------------------------|-----------|
| CARDINAL | cardinal value |
| DATE | date value |
| EVENT | event name |
| FAC | facility name (e.g. building, airport, bridge) |
| GPE | geo-political entity |
| LANGUAGE | language name |
| LAW | law name |
| LOC | location name |
| MONEY | monetary value |
| NORP | affiliation (nationality, religious or political group) |
| ORDINAL | ordinal value |
| ORG | organization name |
| PERCENT | percent value |
| PERSON | person name |
| PRODUCT | product name |
| QUANTITY | quantity value |
| TIME | time value |
| WORK_OF_ART | name of work of art |

Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-fast")

# make example sentence
sentence = Sentence("On September 1st George Washington won 1 dollar.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output for the predicted spans:
```
Span [2,3]: "September 1st" [− Labels: DATE (0.8824)]
Span [4,5]: "George Washington" [− Labels: PERSON (0.9604)]
Span [7,8]: "1 dollar" [− Labels: MONEY (0.9837)]
```

So, the entities "*September 1st*" (labeled as a **date**), "*George Washington*" (labeled as a **person**) and "*1 dollar*" (labeled as **money**) are found in the sentence "*On September 1st George Washington won 1 dollar*".
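
If the spans are needed as structured records rather than printed text, the lines shown above can be parsed with a regular expression. This is only a sketch: the helper `parse_span_line` and the pattern are assumptions derived from the three sample lines, not part of Flair's API, and the printed format may differ between Flair versions.

```python
import re

# Pattern derived from the sample output above (an assumption, not a Flair API).
# The "−" before "Labels" is the Unicode minus sign that appears in that output.
SPAN_LINE = re.compile(
    r'Span \[(?P<tokens>[\d,]+)\]: "(?P<text>.*)"\s+'
    r'\[− Labels: (?P<label>\w+) \((?P<score>[\d.]+)\)\]'
)


def parse_span_line(line):
    """Hypothetical helper: turn one printed span line into a dict."""
    match = SPAN_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized span line: {line!r}")
    return {
        "token_ids": [int(i) for i in match["tokens"].split(",")],
        "text": match["text"],
        "label": match["label"],
        "score": float(match["score"]),
    }


printed_output = """\
Span [2,3]: "September 1st" [− Labels: DATE (0.8824)]
Span [4,5]: "George Washington" [− Labels: PERSON (0.9604)]
Span [7,8]: "1 dollar" [− Labels: MONEY (0.9837)]
"""

spans = [parse_span_line(line) for line in printed_output.splitlines()]
for span in spans:
    print(span["label"], span["text"], span["score"])
```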


---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings

# 1. load the corpus (Ontonotes does not ship with Flair; you need to download it and reformat it into a column format yourself)
corpus: Corpus = ColumnCorpus(
    "resources/tasks/onto-ner",
    column_format={0: "text", 1: "pos", 2: "upos", 3: "ner"},
    tag_to_bioes="ner",
)

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize each embedding we use
embedding_types = [

    # FastText embeddings trained on Common Crawl
    WordEmbeddings('en-crawl'),

    # contextual string embeddings, forward
    FlairEmbeddings('news-forward-fast'),

    # contextual string embeddings, backward
    FlairEmbeddings('news-backward-fast'),
]

# embedding stack consists of Flair and FastText embeddings
embeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type=tag_type)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# 7. run training
trainer.train('resources/taggers/ner-english-ontonotes-fast',
              train_with_dev=True,
              max_epochs=150)
```
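
The `tag_to_bioes="ner"` argument in step 1 asks for the NER column to be converted to the BIOES scheme (Begin, Inside, End, Single, Outside). To illustrate what such a conversion does, here is a minimal standalone sketch. It assumes the column file carries well-formed BIO tags; `bio_to_bioes` is a hypothetical helper written for this example and is not Flair's implementation. With 18 entity types, BIOES gives 4 × 18 + 1 = 73 distinct tags, not counting any special tags a library may add.

```python
def bio_to_bioes(tags):
    """Hypothetical helper: convert a well-formed BIO tag sequence to BIOES."""
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The span continues only if the next tag is an I- tag with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            converted.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            converted.append(f"I-{label}" if continues else f"E-{label}")
    return converted


# Tokens: On September 1st George Washington won 1 dollar .
bio = ["O", "B-DATE", "I-DATE", "B-PERSON", "I-PERSON", "O", "B-MONEY", "I-MONEY", "O"]
bioes = bio_to_bioes(bio)
print(bioes)
```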



---

### Cite

Please cite the following paper when using this model.

```
@inproceedings{akbik2018coling,
  title={Contextual String Embeddings for Sequence Labeling},
  author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
  booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
  pages = {1638--1649},
  year = {2018}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
loss.tsv
ADDED
@@ -0,0 +1,151 @@
EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS
0 23:09:41 0 0.1000 3.2296528358391994
1 23:22:27 0 0.1000 1.5920132506231093
2 23:35:15 0 0.1000 1.3207480492007058
3 23:48:00 0 0.1000 1.1772499416796667
4 00:00:46 0 0.1000 1.075935570046587
5 00:13:32 0 0.1000 1.0152623265981675
6 00:26:17 0 0.1000 0.9606320605637892
7 00:39:04 0 0.1000 0.9167048768839746
8 00:51:51 0 0.1000 0.8835731125327776
9 01:04:36 0 0.1000 0.8626177742233816
10 01:17:23 0 0.1000 0.8316960627290437
11 01:30:11 0 0.1000 0.8111448318440959
12 01:42:58 0 0.1000 0.7882771724912355
13 01:55:47 0 0.1000 0.7772803506704996
14 02:08:34 0 0.1000 0.7563603024718897
15 02:21:27 0 0.1000 0.7446274626255035
16 02:34:27 0 0.1000 0.7325120459637552
17 02:47:28 0 0.1000 0.7164706003328539
18 03:00:18 0 0.1000 0.6953795908475822
19 03:13:06 1 0.1000 0.6954403392886216
20 03:26:06 0 0.1000 0.6850152359368666
21 03:38:58 0 0.1000 0.6716467778918878
22 03:52:00 0 0.1000 0.6623472330142867
23 04:04:49 0 0.1000 0.6543980406590227
24 04:17:42 0 0.1000 0.6484055938135903
25 04:30:37 0 0.1000 0.6380644176703579
26 04:43:42 0 0.1000 0.637061537284896
27 04:56:49 0 0.1000 0.6342280050493636
28 05:09:43 0 0.1000 0.6191383106472357
29 05:22:41 0 0.1000 0.613722907181056
30 05:35:39 0 0.1000 0.6094017116303714
31 05:48:35 0 0.1000 0.600858421494376
32 06:01:36 1 0.1000 0.6034416157400833
33 06:14:33 0 0.1000 0.5933149380335268
34 06:27:31 0 0.1000 0.5902228662202943
35 06:40:30 0 0.1000 0.5814154819609983
36 06:53:30 1 0.1000 0.5834101356706529
37 07:06:23 2 0.1000 0.581889728307724
38 07:19:15 0 0.1000 0.5660430806537844
39 07:32:06 1 0.1000 0.5683663231921646
40 07:45:02 0 0.1000 0.558734485265219
41 07:57:57 0 0.1000 0.5557521927581643
42 08:10:54 0 0.1000 0.5528975577568108
43 08:24:03 0 0.1000 0.5484419691956268
44 08:36:59 1 0.1000 0.5568543178405402
45 08:49:58 0 0.1000 0.5446653557610962
46 09:03:11 0 0.1000 0.5384193196161738
47 09:16:10 0 0.1000 0.5350716501699304
48 09:29:27 0 0.1000 0.5284495891883688
49 09:42:25 0 0.1000 0.5265147034627087
50 09:55:24 0 0.1000 0.5207880691256164
51 10:08:23 1 0.1000 0.5229102363901318
52 10:21:34 2 0.1000 0.5247485248997527
53 10:34:47 0 0.1000 0.5197978817910518
54 10:47:47 0 0.1000 0.5088265573809732
55 11:00:42 1 0.1000 0.5092196283081792
56 11:13:34 2 0.1000 0.5101087852019184
57 11:26:17 3 0.1000 0.5114516223376652
58 11:39:13 0 0.1000 0.5055079925453888
59 11:52:01 0 0.1000 0.5028705823815094
60 12:04:48 1 0.1000 0.5077681644907538
61 12:17:31 0 0.1000 0.493943511888666
62 12:30:40 1 0.1000 0.49909354941884304
63 12:43:38 0 0.1000 0.4935010322253659
64 12:56:41 0 0.1000 0.49263371167317876
65 13:09:30 1 0.1000 0.49477802515029906
66 13:22:17 0 0.1000 0.4894873375937624
67 13:35:02 1 0.1000 0.4900508265337854
68 13:47:54 0 0.1000 0.48103408392307895
69 14:00:52 1 0.1000 0.482991996205078
70 14:14:00 2 0.1000 0.48390048535364977
71 14:27:07 3 0.1000 0.4846018293106331
72 14:40:01 0 0.1000 0.480364065518919
73 14:53:03 0 0.1000 0.47812797212375785
74 15:05:56 0 0.1000 0.4719673815369606
75 15:18:49 1 0.1000 0.4757926766265113
76 15:31:41 0 0.1000 0.46993971415285796
77 15:44:32 1 0.1000 0.4722084892920728
78 15:57:20 2 0.1000 0.47019626197106434
79 16:10:14 0 0.1000 0.4698862019406175
80 16:23:07 0 0.1000 0.46922945463994764
81 16:35:56 0 0.1000 0.46842513900320487
82 16:48:45 0 0.1000 0.4596653935369456
83 17:01:35 1 0.1000 0.46220648641293904
84 17:14:30 2 0.1000 0.4606187267460913
85 17:27:26 0 0.1000 0.45330136719177355
86 17:40:23 1 0.1000 0.4552749111404959
87 17:53:27 2 0.1000 0.4595688052559799
88 18:06:15 0 0.1000 0.45305425408876165
89 18:19:11 1 0.1000 0.4585241228904364
90 18:32:06 2 0.1000 0.4604555804212138
91 18:45:08 3 0.1000 0.4554677476860442
92 18:58:04 0 0.1000 0.4489068861839906
93 19:10:59 1 0.1000 0.45116823060332606
94 19:23:45 2 0.1000 0.4489288940407195
95 19:36:36 0 0.1000 0.44275297022653076
96 19:49:23 1 0.1000 0.4452887841103212
97 20:02:12 2 0.1000 0.4453210852955872
98 20:14:58 3 0.1000 0.4464509549905669
99 20:28:04 4 0.1000 0.44596645003782126
100 20:41:13 0 0.0500 0.41892101504330365
101 20:54:19 0 0.0500 0.3984660865253997
102 21:07:12 0 0.0500 0.3909759231783309
103 21:20:22 0 0.0500 0.38897691094088105
104 21:33:12 0 0.0500 0.38891661282980217
105 21:46:02 0 0.0500 0.3788945140141361
106 21:58:47 0 0.0500 0.37884936595300456
107 22:11:42 0 0.0500 0.37052636316924725
108 22:24:46 1 0.0500 0.3740457253186208
109 22:37:55 2 0.0500 0.3722470565224594
110 22:51:02 0 0.0500 0.3700024942125914
111 23:04:09 0 0.0500 0.36512322439337674
112 23:16:50 0 0.0500 0.360866011077503
113 23:29:36 0 0.0500 0.3606146826777818
114 23:42:29 1 0.0500 0.36123125600364975
115 23:55:19 0 0.0500 0.3542564442135253
116 00:08:10 1 0.0500 0.3587884951396933
117 00:20:55 2 0.0500 0.3573749113926348
118 00:33:41 0 0.0500 0.3419303108273812
119 00:46:29 1 0.0500 0.3496313952162581
120 00:59:24 2 0.0500 0.3525260106769373
121 01:12:26 3 0.0500 0.3468611272579094
122 01:25:16 4 0.0500 0.3510722661468218
123 01:38:13 0 0.0250 0.3362890320566465
124 01:51:10 0 0.0250 0.3326504863151964
125 02:03:59 0 0.0250 0.3284786853756545
126 02:16:50 0 0.0250 0.32160441274912854
127 02:29:40 1 0.0250 0.325127320936266
128 02:42:33 0 0.0250 0.31972479293931205
129 02:55:40 0 0.0250 0.3179566932230625
130 03:08:39 1 0.0250 0.3196000659690713
131 03:21:27 2 0.0250 0.32247103282162604
132 03:34:27 0 0.0250 0.31690518119425143
133 03:47:38 0 0.0250 0.3140822275742045
134 04:00:47 1 0.0250 0.31784330996702304
135 04:13:51 2 0.0250 0.3173914384504534
136 04:26:53 0 0.0250 0.3117131764708825
137 04:39:44 0 0.0250 0.30754536822157086
138 04:52:35 0 0.0250 0.3054389862997352
139 05:05:34 1 0.0250 0.3097640820817565
140 05:18:40 2 0.0250 0.3103047859443808
141 05:31:34 3 0.0250 0.3074088339310772
142 05:44:36 0 0.0250 0.30093574178668686
143 05:57:27 1 0.0250 0.30524236956997863
144 06:10:33 0 0.0250 0.29936067997846966
145 06:23:28 1 0.0250 0.3067479654752983
146 06:36:27 2 0.0250 0.30121562754208187
147 06:49:30 0 0.0250 0.2983897975537012
148 07:02:25 1 0.0250 0.30219820415355125
149 07:15:20 2 0.0250 0.30205940004227294
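
The log above shows the learning rate being halved twice (0.1 to 0.05 at epoch 100, then 0.05 to 0.025 at epoch 123), each time directly after `BAD_EPOCHS` reached 4. That pattern is consistent with an anneal-on-plateau schedule using a factor of 0.5, although the scheduler settings are not stated in this commit. A short sketch for locating such drops, assuming the whitespace-separated layout shown above; only a few rows are embedded here, and `read_loss_log` and `find_lr_drops` are hypothetical helper names:

```python
# A few rows copied from the log above; the full file covers epochs 0 to 149.
LOSS_LOG = """\
EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS
98 20:14:58 3 0.1000 0.4464509549905669
99 20:28:04 4 0.1000 0.44596645003782126
100 20:41:13 0 0.0500 0.41892101504330365
121 01:12:26 3 0.0500 0.3468611272579094
122 01:25:16 4 0.0500 0.3510722661468218
123 01:38:13 0 0.0250 0.3362890320566465
"""


def read_loss_log(text):
    """Parse the whitespace-separated log into a list of typed rows."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        rows.append({
            "epoch": int(record["EPOCH"]),
            "timestamp": record["TIMESTAMP"],
            "bad_epochs": int(record["BAD_EPOCHS"]),
            "learning_rate": float(record["LEARNING_RATE"]),
            "train_loss": float(record["TRAIN_LOSS"]),
        })
    return rows


def find_lr_drops(rows):
    """Return (epoch, old_lr, new_lr) wherever the learning rate decreases."""
    drops = []
    for previous, current in zip(rows, rows[1:]):
        if current["learning_rate"] < previous["learning_rate"]:
            drops.append(
                (current["epoch"], previous["learning_rate"], current["learning_rate"])
            )
    return drops


rows = read_loss_log(LOSS_LOG)
for epoch, old_lr, new_lr in find_lr_drops(rows):
    print(f"epoch {epoch}: learning rate {old_lr} -> {new_lr}")
```

Because `str.split()` splits on any whitespace, the same parser should also work if the file on disk uses tab separators.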
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3a08d6bcbe6be469b9be1e0bdedb6b740469ac2f2915418af77ab949b965e4a6
size 1331379415
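
The three lines above are a Git LFS pointer rather than the model weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes (about 1.33 GB). A minimal sketch of reading such a pointer, where `parse_lfs_pointer` is a hypothetical helper name and the input is assumed to contain exactly these `key value` lines:

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3a08d6bcbe6be469b9be1e0bdedb6b740469ac2f2915418af77ab949b965e4a6
size 1331379415
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: split each 'key value' line of a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gb = pointer["size_bytes"] / 1e9
print(pointer["hash_algorithm"], pointer["digest"][:12], f"{size_gb:.2f} GB")
```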