browndw committed on
Commit
369e511
1 Parent(s): 8943bbd

Update spaCy pipeline

Files changed (10)
  1. README.md +16 -16
  2. config.cfg +10 -8
  3. en_docusco_spacy_cd-any-py3-none-any.whl +2 -2
  4. meta.json +108 -89
  5. ner/model +0 -0
  6. ner/moves +1 -1
  7. tagger/cfg +24 -3
  8. tagger/model +0 -0
  9. tok2vec/model +1 -1
  10. vocab/strings.json +0 -0
README.md CHANGED
@@ -14,28 +14,28 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.7896141572
18
  - name: NER Recall
19
  type: recall
20
- value: 0.7757775447
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.7826346995
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
- value: 0.9734866573
31
  ---
32
  English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.
33
 
34
  | Feature | Description |
35
  | --- | --- |
36
  | **Name** | `en_docusco_spacy_cd` |
37
- | **Version** | `1.2` |
38
- | **spaCy** | `>=3.5.0,<3.6.0` |
39
  | **Default Pipeline** | `tok2vec`, `tagger`, `ner` |
40
  | **Components** | `tok2vec`, `tagger`, `ner` |
41
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
@@ -47,12 +47,12 @@ English pipeline for part-of-speech and rhetorical tagging using a smaller 'comm
47
 
48
  <details>
49
 
50
- <summary>View label scheme (270 labels for 2 components)</summary>
51
 
52
  | Component | Labels |
53
  | --- | --- |
54
- | **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN231`, `NN232`, `NN233`, `NN31`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
55
- | **`ner`** | `ActorsAbstractions`, `ActorsFirstPerson`, `ActorsPeople`, `ActorsPublicEntities`, `CitationAuthority`, `CitationControversy`, `CitationHedged`, `CitationNeutral`, `ConfidenceHedged`, `ConfidenceHigh`, `OrganizationNarrative`, `OrganizationReasoning`, `PlanningFuture`, `PlanningStrategy`, `SentimentNegative`, `SentimentPositive`, `SignpostingAcademicWritingMoves`, `SignpostingMetadiscourse`, `StanceEmphatic`, `StanceModerated` |
56
 
57
  </details>
58
 
@@ -60,10 +60,10 @@ English pipeline for part-of-speech and rhetorical tagging using a smaller 'comm
60
 
61
  | Type | Score |
62
  | --- | --- |
63
- | `TAG_ACC` | 97.35 |
64
- | `ENTS_F` | 78.26 |
65
- | `ENTS_P` | 78.96 |
66
- | `ENTS_R` | 77.58 |
67
- | `TOK2VEC_LOSS` | 5937424.94 |
68
- | `TAGGER_LOSS` | 1136040.49 |
69
- | `NER_LOSS` | 3941726.32 |
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.8206658604
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.80740266
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.8139802353
24
  - task:
25
  name: TAG
26
  type: token-classification
27
  metrics:
28
  - name: TAG (XPOS) Accuracy
29
  type: accuracy
30
+ value: 0.9763683149
31
  ---
32
  English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.
33
 
34
  | Feature | Description |
35
  | --- | --- |
36
  | **Name** | `en_docusco_spacy_cd` |
37
+ | **Version** | `1.3` |
38
+ | **spaCy** | `>=3.7.4,<3.8.0` |
39
  | **Default Pipeline** | `tok2vec`, `tagger`, `ner` |
40
  | **Components** | `tok2vec`, `tagger`, `ner` |
41
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
 
47
 
48
  <details>
49
 
50
+ <summary>View label scheme (289 labels for 2 components)</summary>
51
 
52
  | Component | Labels |
53
  | --- | --- |
54
+ | **`tagger`** | `APPGE`, `AT`, `AT1`, `BCL21`, `BCL22`, `CC`, `CCB`, `CS`, `CS21`, `CS22`, `CS31`, `CS32`, `CS33`, `CS41`, `CS42`, `CS43`, `CS44`, `CSA`, `CSN`, `CST`, `CSW`, `CSW31`, `CSW32`, `CSW33`, `DA`, `DA1`, `DA2`, `DAR`, `DAT`, `DB`, `DB2`, `DD`, `DD1`, `DD2`, `DDQ`, `DDQGE`, `DDQGE31`, `DDQGE32`, `DDQGE33`, `DDQV`, `DDQV31`, `DDQV32`, `DDQV33`, `EX`, `FO`, `FU`, `FW`, `GE`, `IF`, `II`, `II21`, `II22`, `II31`, `II32`, `II33`, `II41`, `II42`, `II43`, `II44`, `IO`, `IW`, `JJ`, `JJ21`, `JJ22`, `JJ31`, `JJ32`, `JJ33`, `JJ41`, `JJ42`, `JJ43`, `JJ44`, `JJR`, `JJT`, `JK`, `MC`, `MC1`, `MC121`, `MC122`, `MC2`, `MC221`, `MC222`, `MCMC`, `MD`, `MF`, `ND1`, `NN`, `NN1`, `NN121`, `NN122`, `NN131`, `NN132`, `NN133`, `NN141`, `NN142`, `NN143`, `NN144`, `NN2`, `NN21`, `NN22`, `NN221`, `NN222`, `NN31`, `NN32`, `NN33`, `NNA`, `NNB`, `NNL1`, `NNL2`, `NNO`, `NNO2`, `NNT1`, `NNT131`, `NNT132`, `NNT133`, `NNT2`, `NNU`, `NNU1`, `NNU2`, `NNU21`, `NNU22`, `NP`, `NP1`, `NP2`, `NPD1`, `NPD2`, `NPM1`, `NPM2`, `PN`, `PN1`, `PN121`, `PN122`, `PN21`, `PN22`, `PNQO`, `PNQS`, `PNQS31`, `PNQS32`, `PNQS33`, `PNQV`, `PNQV31`, `PNQV32`, `PNQV33`, `PNX1`, `PPGE`, `PPH1`, `PPHO1`, `PPHO2`, `PPHS1`, `PPHS2`, `PPIO1`, `PPIO2`, `PPIS1`, `PPIS2`, `PPX1`, `PPX121`, `PPX122`, `PPX2`, `PPX221`, `PPX222`, `PPY`, `RA`, `RA21`, `RA22`, `REX`, `REX21`, `REX22`, `REX41`, `REX42`, `REX43`, `REX44`, `RG`, `RG21`, `RG22`, `RG41`, `RG42`, `RG43`, `RG44`, `RGQ`, `RGQV`, `RGQV31`, `RGQV32`, `RGQV33`, `RGR`, `RGT`, `RL`, `RL21`, `RL22`, `RL31`, `RL32`, `RL33`, `RP`, `RPK`, `RR`, `RR21`, `RR22`, `RR31`, `RR32`, `RR33`, `RR41`, `RR42`, `RR43`, `RR44`, `RR51`, `RR52`, `RR53`, `RR54`, `RR55`, `RRQ`, `RRQV`, `RRQV31`, `RRQV32`, `RRQV33`, `RRR`, `RRT`, `RT`, `RT21`, `RT22`, `RT31`, `RT32`, `RT33`, `RT41`, `RT42`, `RT43`, `RT44`, `TO`, `UH`, `UH21`, `UH22`, `UH31`, `UH32`, `UH33`, `VB0`, `VBDR`, `VBDZ`, `VBG`, `VBI`, `VBM`, `VBN`, `VBR`, `VBZ`, `VD0`, `VDD`, `VDG`, `VDI`, `VDN`, `VDZ`, `VH0`, `VHD`, `VHG`, `VHI`, `VHN`, `VHZ`, `VM`, `VM21`, `VM22`, `VMK`, `VV0`, `VVD`, `VVG`, `VVGK`, `VVI`, `VVN`, `VVNK`, `VVZ`, `XX`, `Y`, `ZZ1`, `ZZ2`, `ZZ221`, `ZZ222` |
55
+ | **`ner`** | `ActorsAbstractions`, `ActorsFirstPerson`, `ActorsPeople`, `ActorsPublicEntities`, `CitationAuthority`, `CitationControversy`, `CitationNeutral`, `ConfidenceHedged`, `ConfidenceHigh`, `OrganizationNarrative`, `OrganizationReasoning`, `PlanningFuture`, `PlanningStrategy`, `SentimentNegative`, `SentimentPositive`, `SignpostingAcademicWritingMoves`, `SignpostingMetadiscourse`, `StanceEmphatic`, `StanceModerated` |
56
 
57
  </details>
58
 
 
60
 
61
  | Type | Score |
62
  | --- | --- |
63
+ | `TAG_ACC` | 97.64 |
64
+ | `ENTS_F` | 81.40 |
65
+ | `ENTS_P` | 82.07 |
66
+ | `ENTS_R` | 80.74 |
67
+ | `TOK2VEC_LOSS` | 150973939.97 |
68
+ | `TAGGER_LOSS` | 3936874.26 |
69
+ | `NER_LOSS` | 12742855.43 |
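The NER F score in the README metadata should be the harmonic mean of the reported precision and recall. The sketch below checks that for both versions, using only the values shown in this diff; the helper name `f_score` is illustrative and not part of the pipeline.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the README metadata on both sides of this diff.
v12 = {"p": 0.7896141572, "r": 0.7757775447, "f": 0.7826346995}
v13 = {"p": 0.8206658604, "r": 0.80740266, "f": 0.8139802353}

for version, m in (("1.2", v12), ("1.3", v13)):
    recomputed = f_score(m["p"], m["r"])
    # Print the recomputed and the reported F score side by side.
    print(f"v{version}: recomputed={recomputed:.6f} reported={m['f']:.6f}")
```

Both versions should agree with the reported values to within rounding, which suggests the README table and the metadata block were generated from the same evaluation run.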
config.cfg CHANGED
@@ -1,6 +1,6 @@
1
  [paths]
2
- train = ""
3
- dev = ""
4
  vectors = null
5
  init_tok2vec = null
6
 
@@ -17,6 +17,7 @@ before_creation = null
17
  after_creation = null
18
  after_pipeline_creation = null
19
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
 
20
 
21
  [components]
22
 
@@ -43,6 +44,7 @@ upstream = "*"
43
 
44
  [components.tagger]
45
  factory = "tagger"
 
46
  neg_prefix = "!"
47
  overwrite = false
48
  scorer = {"@scorers":"spacy.tagger_scorer.v1"}
@@ -102,10 +104,10 @@ seed = ${system.seed}
102
  gpu_allocator = ${system.gpu_allocator}
103
  dropout = 0.1
104
  accumulate_gradient = 1
105
- patience = 1600
106
- max_epochs = 0
107
- max_steps = 35000
108
- eval_frequency = 250
109
  frozen_components = []
110
  annotating_components = []
111
  before_to_disk = null
@@ -140,8 +142,8 @@ eps = 0.00000001
140
  learn_rate = 0.001
141
 
142
  [training.score_weights]
143
- tag_acc = 0.5
144
- ents_f = 0.5
145
  ents_p = 0.0
146
  ents_r = 0.0
147
  ents_per_type = null
 
1
  [paths]
2
+ train = "spacy_train_07.spacy"
3
+ dev = "spacy_dev_07.spacy"
4
  vectors = null
5
  init_tok2vec = null
6
 
 
17
  after_creation = null
18
  after_pipeline_creation = null
19
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
20
+ vectors = {"@vectors":"spacy.Vectors.v1"}
21
 
22
  [components]
23
 
 
44
 
45
  [components.tagger]
46
  factory = "tagger"
47
+ label_smoothing = 0.05
48
  neg_prefix = "!"
49
  overwrite = false
50
  scorer = {"@scorers":"spacy.tagger_scorer.v1"}
 
104
  gpu_allocator = ${system.gpu_allocator}
105
  dropout = 0.1
106
  accumulate_gradient = 1
107
+ patience = 20000
108
+ max_epochs = -1
109
+ max_steps = 80000
110
+ eval_frequency = 1000
111
  frozen_components = []
112
  annotating_components = []
113
  before_to_disk = null
 
142
  learn_rate = 0.001
143
 
144
  [training.score_weights]
145
+ tag_acc = 0.4
146
+ ents_f = 0.6
147
  ents_p = 0.0
148
  ents_r = 0.0
149
  ents_per_type = null
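To see what the `[training.score_weights]` and schedule changes above imply, here is a small sketch. It assumes that spaCy ranks checkpoints by a weighted sum of the listed metrics and that `patience` is counted in training steps; both assumptions should be checked against the spaCy training documentation. The function name `weighted_score` is hypothetical.

```python
from typing import Mapping, Optional


def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, Optional[float]]) -> float:
    """Weighted sum over the metrics named in `weights`; null weights are skipped."""
    return sum(scores.get(name, 0.0) * w
               for name, w in weights.items() if w is not None)


# Aggregate scores reported for version 1.3 in this commit.
scores_v13 = {"tag_acc": 0.9763683149, "ents_f": 0.8139802353,
              "ents_p": 0.8206658604, "ents_r": 0.80740266}

# Score weights before and after this commit (null becomes None).
old_weights = {"tag_acc": 0.5, "ents_f": 0.5, "ents_p": 0.0,
               "ents_r": 0.0, "ents_per_type": None}
new_weights = {"tag_acc": 0.4, "ents_f": 0.6, "ents_p": 0.0,
               "ents_r": 0.0, "ents_per_type": None}

score_old_w = weighted_score(scores_v13, old_weights)
score_new_w = weighted_score(scores_v13, new_weights)

# Schedule arithmetic: evaluations possible before max_steps is reached,
# and how many evaluations the new patience window spans.
max_evals_old = 35000 // 250
max_evals_new = 80000 // 1000
patience_evals_new = 20000 // 1000

print(f"{score_old_w:.4f} {score_new_w:.4f}")
print(max_evals_old, max_evals_new, patience_evals_new)
```

Under these assumptions the new weights make checkpoint selection lean more on `ents_f`, and the longer schedule allows up to 80 evaluations with a 20-evaluation window for early stopping.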
en_docusco_spacy_cd-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f1fc8685dd2f705596754da88344d1b98cb6283710de43c2cc6f2710c0ce5b8e
3
- size 6956762
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c4ba27b27fa3effb8af587c05fc1d6a1a7a312ced5d884a65cbb048a84e8a93
3
+ size 8394802
meta.json CHANGED
@@ -1,13 +1,14 @@
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy_cd",
4
- "version":"1.2",
5
  "description":"English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.",
6
  "author":"David Brown",
7
  "email":"dwb2@andrew.cmu.edu",
8
  "url":"https://docuscope.github.io",
9
  "license":"MIT",
10
- "spacy_git_version":"Unknown",
 
11
  "vectors":{
12
  "width":0,
13
  "vectors":0,
@@ -55,6 +56,9 @@
55
  "DD2",
56
  "DDQ",
57
  "DDQGE",
 
 
 
58
  "DDQV",
59
  "DDQV31",
60
  "DDQV32",
@@ -83,11 +87,17 @@
83
  "JJ31",
84
  "JJ32",
85
  "JJ33",
 
 
 
 
86
  "JJR",
87
  "JJT",
88
  "JK",
89
  "MC",
90
  "MC1",
 
 
91
  "MC2",
92
  "MC221",
93
  "MC222",
@@ -111,10 +121,8 @@
111
  "NN22",
112
  "NN221",
113
  "NN222",
114
- "NN231",
115
- "NN232",
116
- "NN233",
117
  "NN31",
 
118
  "NN33",
119
  "NNA",
120
  "NNB",
@@ -123,6 +131,9 @@
123
  "NNO",
124
  "NNO2",
125
  "NNT1",
 
 
 
126
  "NNT2",
127
  "NNU",
128
  "NNU1",
@@ -148,6 +159,9 @@
148
  "PNQS32",
149
  "PNQS33",
150
  "PNQV",
 
 
 
151
  "PNX1",
152
  "PPGE",
153
  "PPH1",
@@ -179,6 +193,10 @@
179
  "RG",
180
  "RG21",
181
  "RG22",
 
 
 
 
182
  "RGQ",
183
  "RGQV",
184
  "RGQV31",
@@ -189,6 +207,9 @@
189
  "RL",
190
  "RL21",
191
  "RL22",
 
 
 
192
  "RP",
193
  "RPK",
194
  "RR",
@@ -277,7 +298,6 @@
277
  "ActorsPublicEntities",
278
  "CitationAuthority",
279
  "CitationControversy",
280
- "CitationHedged",
281
  "CitationNeutral",
282
  "ConfidenceHedged",
283
  "ConfidenceHigh",
@@ -307,112 +327,111 @@
307
 
308
  ],
309
  "performance":{
310
- "tag_acc":0.9734866573,
311
- "ents_f":0.7826346995,
312
- "ents_p":0.7896141572,
313
- "ents_r":0.7757775447,
314
  "ents_per_type":{
315
- "ActorsFirstPerson":{
316
- "p":0.8180863993,
317
- "r":0.8443950279,
318
- "f":0.8310325477
319
- },
320
  "ActorsPeople":{
321
- "p":0.855646716,
322
- "r":0.8830180997,
323
- "f":0.8691169573
324
  },
325
- "CitationNeutral":{
326
- "p":0.7527158376,
327
- "r":0.7425267908,
328
- "f":0.7475865985
329
  },
330
- "SentimentNegative":{
331
- "p":0.7090227054,
332
- "r":0.6414706491,
333
- "f":0.6735571909
334
  },
335
- "ActorsAbstractions":{
336
- "p":0.7691966267,
337
- "r":0.8180602299,
338
- "f":0.7928762968
339
  },
340
- "PlanningStrategy":{
341
- "p":0.721360087,
342
- "r":0.6471005497,
343
- "f":0.6822154709
344
  },
345
- "SignpostingAcademicWritingMoves":{
346
- "p":0.6508373967,
347
- "r":0.5615511249,
348
- "f":0.60290652
349
  },
350
- "OrganizationNarrative":{
351
- "p":0.7794650481,
352
- "r":0.7044971007,
353
- "f":0.74008743
354
  },
355
- "PlanningFuture":{
356
- "p":0.7692376361,
357
- "r":0.7372518823,
358
- "f":0.7529051988
359
  },
360
- "ConfidenceHedged":{
361
- "p":0.8042735043,
362
- "r":0.7942268737,
363
- "f":0.7992186173
364
  },
365
- "SentimentPositive":{
366
- "p":0.7252046892,
367
- "r":0.6383048418,
368
- "f":0.678985594
369
  },
370
- "StanceEmphatic":{
371
- "p":0.7869956077,
372
- "r":0.8246021505,
373
- "f":0.8053601059
374
  },
375
- "SignpostingMetadiscourse":{
376
- "p":0.9005463375,
377
- "r":0.8505900903,
378
- "f":0.8748556405
379
  },
380
- "CitationControversy":{
381
- "p":0.7772643253,
382
- "r":0.7109044801,
383
- "f":0.7426048565
384
  },
385
- "ActorsPublicEntities":{
386
- "p":0.8000974738,
387
- "r":0.7844855049,
388
- "f":0.7922145816
389
  },
390
- "OrganizationReasoning":{
391
- "p":0.8085699285,
392
- "r":0.8050261359,
393
- "f":0.8067941408
394
  },
395
- "ConfidenceHigh":{
396
- "p":0.7475272184,
397
- "r":0.7257841647,
398
- "f":0.7364952501
399
  },
400
- "CitationAuthority":{
401
- "p":0.7098366882,
402
- "r":0.6263404826,
403
- "f":0.6654797935
404
  },
405
  "StanceModerated":{
406
- "p":0.7352631579,
407
- "r":0.7569764292,
408
- "f":0.7459618209
 
 
 
 
 
409
  }
410
  },
411
- "tok2vec_loss":59374.2493771126,
412
- "tagger_loss":11360.4048709869,
413
- "ner_loss":39417.2632061559
414
  },
415
- "spacy_version":">=3.5.0,<3.6.0",
416
  "requirements":[
417
 
418
  ]
 
1
  {
2
  "lang":"en",
3
  "name":"docusco_spacy_cd",
4
+ "version":"1.3",
5
  "description":"English pipeline for part-of-speech and rhetorical tagging using a smaller 'common dictionary'.",
6
  "author":"David Brown",
7
  "email":"dwb2@andrew.cmu.edu",
8
  "url":"https://docuscope.github.io",
9
  "license":"MIT",
10
+ "spacy_version":">=3.7.4,<3.8.0",
11
+ "spacy_git_version":"bff8725f4",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
56
  "DD2",
57
  "DDQ",
58
  "DDQGE",
59
+ "DDQGE31",
60
+ "DDQGE32",
61
+ "DDQGE33",
62
  "DDQV",
63
  "DDQV31",
64
  "DDQV32",
 
87
  "JJ31",
88
  "JJ32",
89
  "JJ33",
90
+ "JJ41",
91
+ "JJ42",
92
+ "JJ43",
93
+ "JJ44",
94
  "JJR",
95
  "JJT",
96
  "JK",
97
  "MC",
98
  "MC1",
99
+ "MC121",
100
+ "MC122",
101
  "MC2",
102
  "MC221",
103
  "MC222",
 
121
  "NN22",
122
  "NN221",
123
  "NN222",
 
 
 
124
  "NN31",
125
+ "NN32",
126
  "NN33",
127
  "NNA",
128
  "NNB",
 
131
  "NNO",
132
  "NNO2",
133
  "NNT1",
134
+ "NNT131",
135
+ "NNT132",
136
+ "NNT133",
137
  "NNT2",
138
  "NNU",
139
  "NNU1",
 
159
  "PNQS32",
160
  "PNQS33",
161
  "PNQV",
162
+ "PNQV31",
163
+ "PNQV32",
164
+ "PNQV33",
165
  "PNX1",
166
  "PPGE",
167
  "PPH1",
 
193
  "RG",
194
  "RG21",
195
  "RG22",
196
+ "RG41",
197
+ "RG42",
198
+ "RG43",
199
+ "RG44",
200
  "RGQ",
201
  "RGQV",
202
  "RGQV31",
 
207
  "RL",
208
  "RL21",
209
  "RL22",
210
+ "RL31",
211
+ "RL32",
212
+ "RL33",
213
  "RP",
214
  "RPK",
215
  "RR",
 
298
  "ActorsPublicEntities",
299
  "CitationAuthority",
300
  "CitationControversy",
 
301
  "CitationNeutral",
302
  "ConfidenceHedged",
303
  "ConfidenceHigh",
 
327
 
328
  ],
329
  "performance":{
330
+ "tag_acc":0.9763683149,
331
+ "ents_f":0.8139802353,
332
+ "ents_p":0.8206658604,
333
+ "ents_r":0.80740266,
334
  "ents_per_type":{
 
 
 
 
 
335
  "ActorsPeople":{
336
+ "p":0.8542168374,
337
+ "r":0.8696353974,
338
+ "f":0.8618571637
339
  },
340
+ "ActorsPublicEntities":{
341
+ "p":0.8169841646,
342
+ "r":0.8246103931,
343
+ "f":0.8207795646
344
  },
345
+ "OrganizationReasoning":{
346
+ "p":0.8497536946,
347
+ "r":0.8395089039,
348
+ "f":0.8446002337
349
  },
350
+ "ActorsFirstPerson":{
351
+ "p":0.8645147555,
352
+ "r":0.8759769676,
353
+ "f":0.8702081187
354
  },
355
+ "ConfidenceHedged":{
356
+ "p":0.8414330099,
357
+ "r":0.849020822,
358
+ "f":0.8452098865
359
  },
360
+ "SentimentPositive":{
361
+ "p":0.7541410809,
362
+ "r":0.6988926856,
363
+ "f":0.7254665342
364
  },
365
+ "SignpostingMetadiscourse":{
366
+ "p":0.9222331178,
367
+ "r":0.8799657453,
368
+ "f":0.9006037785
369
  },
370
+ "ActorsAbstractions":{
371
+ "p":0.812620511,
372
+ "r":0.8397741356,
373
+ "f":0.825974217
374
  },
375
+ "CitationAuthority":{
376
+ "p":0.7421895511,
377
+ "r":0.6606683805,
378
+ "f":0.6990603363
379
  },
380
+ "SentimentNegative":{
381
+ "p":0.7569732066,
382
+ "r":0.681115792,
383
+ "f":0.7170438069
384
  },
385
+ "OrganizationNarrative":{
386
+ "p":0.8146691347,
387
+ "r":0.7606297812,
388
+ "f":0.7867225698
389
  },
390
+ "StanceEmphatic":{
391
+ "p":0.8325835219,
392
+ "r":0.8587117676,
393
+ "f":0.8454458216
394
  },
395
+ "ConfidenceHigh":{
396
+ "p":0.793492611,
397
+ "r":0.7964435325,
398
+ "f":0.7949653333
399
  },
400
+ "PlanningFuture":{
401
+ "p":0.8015720524,
402
+ "r":0.7731229292,
403
+ "f":0.7870905037
404
  },
405
+ "SignpostingAcademicWritingMoves":{
406
+ "p":0.6799470549,
407
+ "r":0.6417239225,
408
+ "f":0.6602827763
409
  },
410
+ "PlanningStrategy":{
411
+ "p":0.7405392335,
412
+ "r":0.7067443605,
413
+ "f":0.7232472325
414
  },
415
+ "CitationNeutral":{
416
+ "p":0.8012995179,
417
+ "r":0.7580805076,
418
+ "f":0.7790910944
419
  },
420
  "StanceModerated":{
421
+ "p":0.8127539304,
422
+ "r":0.8244042286,
423
+ "f":0.8185376268
424
+ },
425
+ "CitationControversy":{
426
+ "p":0.7450381679,
427
+ "r":0.7160674982,
428
+ "f":0.7302656192
429
  }
430
  },
431
+ "tok2vec_loss":1509739.3996848087,
432
+ "tagger_loss":39368.7426280975,
433
+ "ner_loss":127428.554314194
434
  },
 
435
  "requirements":[
436
 
437
  ]
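The per-type entries in `ents_per_type` are reordered between versions, which makes the diff hard to read by eye. As a rough aid, the sketch below lines up the `f` values for each label from both sides of the meta.json diff and reports the change; the dictionaries are transcribed from this commit and the variable names are illustrative.

```python
# F scores per NER label, transcribed from the old (1.2) side of meta.json.
f_v12 = {
    "ActorsFirstPerson": 0.8310325477, "ActorsPeople": 0.8691169573,
    "CitationNeutral": 0.7475865985, "SentimentNegative": 0.6735571909,
    "ActorsAbstractions": 0.7928762968, "PlanningStrategy": 0.6822154709,
    "SignpostingAcademicWritingMoves": 0.60290652,
    "OrganizationNarrative": 0.74008743, "PlanningFuture": 0.7529051988,
    "ConfidenceHedged": 0.7992186173, "SentimentPositive": 0.678985594,
    "StanceEmphatic": 0.8053601059, "SignpostingMetadiscourse": 0.8748556405,
    "CitationControversy": 0.7426048565, "ActorsPublicEntities": 0.7922145816,
    "OrganizationReasoning": 0.8067941408, "ConfidenceHigh": 0.7364952501,
    "CitationAuthority": 0.6654797935, "StanceModerated": 0.7459618209,
}

# F scores per NER label, transcribed from the new (1.3) side of meta.json.
f_v13 = {
    "ActorsPeople": 0.8618571637, "ActorsPublicEntities": 0.8207795646,
    "OrganizationReasoning": 0.8446002337, "ActorsFirstPerson": 0.8702081187,
    "ConfidenceHedged": 0.8452098865, "SentimentPositive": 0.7254665342,
    "SignpostingMetadiscourse": 0.9006037785, "ActorsAbstractions": 0.825974217,
    "CitationAuthority": 0.6990603363, "SentimentNegative": 0.7170438069,
    "OrganizationNarrative": 0.7867225698, "StanceEmphatic": 0.8454458216,
    "ConfidenceHigh": 0.7949653333, "PlanningFuture": 0.7870905037,
    "SignpostingAcademicWritingMoves": 0.6602827763,
    "PlanningStrategy": 0.7232472325, "CitationNeutral": 0.7790910944,
    "StanceModerated": 0.8185376268, "CitationControversy": 0.7302656192,
}

# Change in F score per label (new minus old).
deltas = {label: f_v13[label] - f_v12[label] for label in f_v13}
regressed = sorted(label for label, d in deltas.items() if d < 0)
best_gain = max(deltas, key=deltas.get)

# Print labels from largest regression to largest gain.
for label, d in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{label:32s} {d:+.4f}")
```

Sorting by delta makes it easy to spot the labels whose F score dropped despite the overall gain in `ents_f`.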
ner/model CHANGED
Binary files a/ner/model and b/ner/model differ
 
ner/moves CHANGED
@@ -1 +1 @@
1
- ��moves��{"0":{},"1":{"ActorsAbstractions":574624,"SentimentNegative":498816,"ActorsPeople":490889,"SentimentPositive":329200,"OrganizationNarrative":327795,"SignpostingMetadiscourse":287016,"ActorsFirstPerson":242625,"OrganizationReasoning":182969,"StanceEmphatic":148909,"ActorsPublicEntities":141388,"ConfidenceHedged":132889,"ConfidenceHigh":117539,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24999,"CitationAuthority":24695,"CitationControversy":7780,"CitationHedged":3},"2":{"ActorsAbstractions":574624,"SentimentNegative":498816,"ActorsPeople":490889,"SentimentPositive":329200,"OrganizationNarrative":327795,"SignpostingMetadiscourse":287016,"ActorsFirstPerson":242625,"OrganizationReasoning":182969,"StanceEmphatic":148909,"ActorsPublicEntities":141388,"ConfidenceHedged":132889,"ConfidenceHigh":117539,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24999,"CitationAuthority":24695,"CitationControversy":7780,"CitationHedged":3},"3":{"ActorsAbstractions":574624,"SentimentNegative":498816,"ActorsPeople":490889,"SentimentPositive":329200,"OrganizationNarrative":327795,"SignpostingMetadiscourse":287016,"ActorsFirstPerson":242625,"OrganizationReasoning":182969,"StanceEmphatic":148909,"ActorsPublicEntities":141388,"ConfidenceHedged":132889,"ConfidenceHigh":117539,"PlanningFuture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24999,"CitationAuthority":24695,"CitationControversy":7780,"CitationHedged":3},"4":{"ActorsAbstractions":574624,"SentimentNegative":498816,"ActorsPeople":490889,"SentimentPositive":329200,"OrganizationNarrative":327795,"SignpostingMetadiscourse":287016,"ActorsFirstPerson":242625,"OrganizationReasoning":182969,"StanceEmphatic":148909,"ActorsPublicEntities":141388,"ConfidenceHedged":132889,"ConfidenceHigh":117539,"PlanningF
uture":91199,"PlanningStrategy":77436,"SignpostingAcademicWritingMoves":45355,"CitationNeutral":28827,"StanceModerated":24999,"CitationAuthority":24695,"CitationControversy":7780,"CitationHedged":3,"":1},"5":{"":1}}�cfg��neg_key�
 
1
+ ��moves�t{"0":{},"1":{"ActorsPeople":2252459,"ActorsAbstractions":2160829,"SentimentNegative":1838447,"OrganizationNarrative":1220253,"SentimentPositive":1215068,"SignpostingMetadiscourse":982819,"ActorsFirstPerson":942047,"OrganizationReasoning":603068,"StanceEmphatic":540777,"ActorsPublicEntities":488472,"ConfidenceHedged":449697,"ConfidenceHigh":422991,"PlanningFuture":318827,"PlanningStrategy":277732,"SignpostingAcademicWritingMoves":153321,"CitationNeutral":95864,"StanceModerated":85078,"CitationAuthority":80084,"CitationControversy":22589},"2":{"ActorsPeople":2252459,"ActorsAbstractions":2160829,"SentimentNegative":1838447,"OrganizationNarrative":1220253,"SentimentPositive":1215068,"SignpostingMetadiscourse":982819,"ActorsFirstPerson":942047,"OrganizationReasoning":603068,"StanceEmphatic":540777,"ActorsPublicEntities":488472,"ConfidenceHedged":449697,"ConfidenceHigh":422991,"PlanningFuture":318827,"PlanningStrategy":277732,"SignpostingAcademicWritingMoves":153321,"CitationNeutral":95864,"StanceModerated":85078,"CitationAuthority":80084,"CitationControversy":22589},"3":{"ActorsPeople":2252459,"ActorsAbstractions":2160829,"SentimentNegative":1838447,"OrganizationNarrative":1220253,"SentimentPositive":1215068,"SignpostingMetadiscourse":982819,"ActorsFirstPerson":942047,"OrganizationReasoning":603068,"StanceEmphatic":540777,"ActorsPublicEntities":488472,"ConfidenceHedged":449697,"ConfidenceHigh":422991,"PlanningFuture":318827,"PlanningStrategy":277732,"SignpostingAcademicWritingMoves":153321,"CitationNeutral":95864,"StanceModerated":85078,"CitationAuthority":80084,"CitationControversy":22589},"4":{"ActorsPeople":2252459,"ActorsAbstractions":2160829,"SentimentNegative":1838447,"OrganizationNarrative":1220253,"SentimentPositive":1215068,"SignpostingMetadiscourse":982819,"ActorsFirstPerson":942047,"OrganizationReasoning":603068,"StanceEmphatic":540777,"ActorsPublicEntities":488472,"ConfidenceHedged":449697,"ConfidenceHigh":422991,"PlanningFuture":318827,"PlanningSt
rategy":277732,"SignpostingAcademicWritingMoves":153321,"CitationNeutral":95864,"StanceModerated":85078,"CitationAuthority":80084,"CitationControversy":22589,"":1},"5":{"":1}}�cfg��neg_key�
tagger/cfg CHANGED
@@ -1,4 +1,5 @@
1
  {
 
2
  "labels":[
3
  "APPGE",
4
  "AT",
@@ -36,6 +37,9 @@
36
  "DD2",
37
  "DDQ",
38
  "DDQGE",
 
 
 
39
  "DDQV",
40
  "DDQV31",
41
  "DDQV32",
@@ -64,11 +68,17 @@
64
  "JJ31",
65
  "JJ32",
66
  "JJ33",
 
 
 
 
67
  "JJR",
68
  "JJT",
69
  "JK",
70
  "MC",
71
  "MC1",
 
 
72
  "MC2",
73
  "MC221",
74
  "MC222",
@@ -92,10 +102,8 @@
92
  "NN22",
93
  "NN221",
94
  "NN222",
95
- "NN231",
96
- "NN232",
97
- "NN233",
98
  "NN31",
 
99
  "NN33",
100
  "NNA",
101
  "NNB",
@@ -104,6 +112,9 @@
104
  "NNO",
105
  "NNO2",
106
  "NNT1",
 
 
 
107
  "NNT2",
108
  "NNU",
109
  "NNU1",
@@ -129,6 +140,9 @@
129
  "PNQS32",
130
  "PNQS33",
131
  "PNQV",
 
 
 
132
  "PNX1",
133
  "PPGE",
134
  "PPH1",
@@ -160,6 +174,10 @@
160
  "RG",
161
  "RG21",
162
  "RG22",
 
 
 
 
163
  "RGQ",
164
  "RGQV",
165
  "RGQV31",
@@ -170,6 +188,9 @@
170
  "RL",
171
  "RL21",
172
  "RL22",
 
 
 
173
  "RP",
174
  "RPK",
175
  "RR",
 
1
  {
2
+ "label_smoothing":0.05,
3
  "labels":[
4
  "APPGE",
5
  "AT",
 
37
  "DD2",
38
  "DDQ",
39
  "DDQGE",
40
+ "DDQGE31",
41
+ "DDQGE32",
42
+ "DDQGE33",
43
  "DDQV",
44
  "DDQV31",
45
  "DDQV32",
 
68
  "JJ31",
69
  "JJ32",
70
  "JJ33",
71
+ "JJ41",
72
+ "JJ42",
73
+ "JJ43",
74
+ "JJ44",
75
  "JJR",
76
  "JJT",
77
  "JK",
78
  "MC",
79
  "MC1",
80
+ "MC121",
81
+ "MC122",
82
  "MC2",
83
  "MC221",
84
  "MC222",
 
102
  "NN22",
103
  "NN221",
104
  "NN222",
 
 
 
105
  "NN31",
106
+ "NN32",
107
  "NN33",
108
  "NNA",
109
  "NNB",
 
112
  "NNO",
113
  "NNO2",
114
  "NNT1",
115
+ "NNT131",
116
+ "NNT132",
117
+ "NNT133",
118
  "NNT2",
119
  "NNU",
120
  "NNU1",
 
140
  "PNQS32",
141
  "PNQS33",
142
  "PNQV",
143
+ "PNQV31",
144
+ "PNQV32",
145
+ "PNQV33",
146
  "PNX1",
147
  "PPGE",
148
  "PPH1",
 
174
  "RG",
175
  "RG21",
176
  "RG22",
177
+ "RG41",
178
+ "RG42",
179
+ "RG43",
180
+ "RG44",
181
  "RGQ",
182
  "RGQV",
183
  "RGQV31",
 
188
  "RL",
189
  "RL21",
190
  "RL22",
191
+ "RL31",
192
+ "RL32",
193
+ "RL33",
194
  "RP",
195
  "RPK",
196
  "RR",
tagger/model CHANGED
Binary files a/tagger/model and b/tagger/model differ
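The label changes in the tagger/cfg and meta.json diffs can be reconciled with the README summary, which moves from 270 to 289 labels across the two components. A minimal check, with the added and removed labels transcribed from this commit (the variable names are illustrative):

```python
# Tagger labels added in this commit (from the tagger/cfg diff).
tagger_added = [
    "DDQGE31", "DDQGE32", "DDQGE33",
    "JJ41", "JJ42", "JJ43", "JJ44",
    "MC121", "MC122",
    "NN32",
    "NNT131", "NNT132", "NNT133",
    "PNQV31", "PNQV32", "PNQV33",
    "RG41", "RG42", "RG43", "RG44",
    "RL31", "RL32", "RL33",
]

# Labels removed in this commit.
tagger_removed = ["NN231", "NN232", "NN233"]
ner_removed = ["CitationHedged"]

# Total from the old README summary line.
old_total = 270
new_total = (old_total + len(tagger_added)
             - len(tagger_removed) - len(ner_removed))

print(len(tagger_added), new_total)
```

The result should match the "289 labels for 2 components" line in the updated README.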
 
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2bef1c838277d6641b02d24484bfc90fc6cab1da7c6972fdcb9ddd1d37318a30
3
  size 6009091
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:58e0806e259d1699a33eb0804db3d207aea31ea5aba7826c5f32b62076f718c4
3
  size 6009091
vocab/strings.json CHANGED
The diff for this file is too large to render. See raw diff