EC2 Default User committed on
Commit 5398b80
1 Parent(s): f799402

Update spaCy pipeline

.gitattributes CHANGED
@@ -19,3 +19,4 @@
19
  *strings.json filter=lfs diff=lfs merge=lfs -text
20
  vectors filter=lfs diff=lfs merge=lfs -text
21
  model filter=lfs diff=lfs merge=lfs -text
22
+ *key2row filter=lfs diff=lfs merge=lfs -text
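The added `*key2row` line follows the same pattern as the context lines: files that match it are stored through Git LFS (`filter=lfs diff=lfs merge=lfs`), and `-text` turns off line-ending normalization. A gitattributes pattern that contains no slash is matched against a file's basename at any directory depth, so this line sends every file whose name ends in `key2row` to LFS. The Python sketch below approximates that rule with `fnmatch`. It covers only the four patterns visible in this hunk, because lines 1 to 18 of the file are not shown. The function name and file paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk (three context lines plus the added one).
LFS_PATTERNS = ["*strings.json", "vectors", "model", "*key2row"]


def tracked_by_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for slash-free patterns:
    each pattern is compared (case-sensitively) against the basename only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


# Hypothetical paths inside a packaged pipeline, for illustration only.
for p in ["vocab/key2row", "vocab/vectors", "vocab/strings.json", "ner/model", "meta.json"]:
    print(p, tracked_by_lfs(p))
```

Before this commit, the pattern list lacked `*key2row`. A file such as `vocab/key2row` would therefore have been committed as a regular Git blob instead of an LFS pointer.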
LICENSES_SOURCES CHANGED
@@ -878,557 +878,6 @@ Creative Commons may be contacted at creativecommons.org.
878
 
879
 
880
 
881
- # Lemmatization Lists
882
-
883
- * Author: Michal Měchura
884
- * URL: https://github.com/michmech/lemmatization-lists/
885
- * License: ODbL
886
-
887
- ```
888
- ## ODC Open Database License (ODbL)
889
-
890
- ### Preamble
891
-
892
- The Open Database License (ODbL) is a license agreement intended to
893
- allow users to freely share, modify, and use this Database while
894
- maintaining this same freedom for others. Many databases are covered by
895
- copyright, and therefore this document licenses these rights. Some
896
- jurisdictions, mainly in the European Union, have specific rights that
897
- cover databases, and so the ODbL addresses these rights, too. Finally,
898
- the ODbL is also an agreement in contract for users of this Database to
899
- act in certain ways in return for accessing this Database.
900
-
901
- Databases can contain a wide variety of types of content (images,
902
- audiovisual material, and sounds all in the same database, for example),
903
- and so the ODbL only governs the rights over the Database, and not the
904
- contents of the Database individually. Licensors should use the ODbL
905
- together with another license for the contents, if the contents have a
906
- single set of rights that uniformly covers all of the contents. If the
907
- contents have multiple sets of different rights, Licensors should
908
- describe what rights govern what contents together in the individual
909
- record or in some other way that clarifies what rights apply.
910
-
911
- Sometimes the contents of a database, or the database itself, can be
912
- covered by other rights not addressed here (such as private contracts,
913
- trade mark over the name, or privacy rights / data protection rights
914
- over information in the contents), and so you are advised that you may
915
- have to consult other documents or clear other rights before doing
916
- activities not covered by this License.
917
-
918
- ------
919
-
920
- The Licensor (as defined below)
921
-
922
- and
923
-
924
- You (as defined below)
925
-
926
- agree as follows:
927
-
928
- ### 1.0 Definitions of Capitalised Words
929
-
930
- "Collective Database" – Means this Database in unmodified form as part
931
- of a collection of independent databases in themselves that together are
932
- assembled into a collective whole. A work that constitutes a Collective
933
- Database will not be considered a Derivative Database.
934
-
935
- "Convey" – As a verb, means Using the Database, a Derivative Database,
936
- or the Database as part of a Collective Database in any way that enables
937
- a Person to make or receive copies of the Database or a Derivative
938
- Database. Conveying does not include interaction with a user through a
939
- computer network, or creating and Using a Produced Work, where no
940
- transfer of a copy of the Database or a Derivative Database occurs.
941
- "Contents" – The contents of this Database, which includes the
942
- information, independent works, or other material collected into the
943
- Database. For example, the contents of the Database could be factual
944
- data or works such as images, audiovisual material, text, or sounds.
945
-
946
- "Database" – A collection of material (the Contents) arranged in a
947
- systematic or methodical way and individually accessible by electronic
948
- or other means offered under the terms of this License.
949
-
950
- "Database Directive" – Means Directive 96/9/EC of the European
951
- Parliament and of the Council of 11 March 1996 on the legal protection
952
- of databases, as amended or succeeded.
953
-
954
- "Database Right" – Means rights resulting from the Chapter III ("sui
955
- generis") rights in the Database Directive (as amended and as transposed
956
- by member states), which includes the Extraction and Re-utilisation of
957
- the whole or a Substantial part of the Contents, as well as any similar
958
- rights available in the relevant jurisdiction under Section 10.4.
959
-
960
- "Derivative Database" – Means a database based upon the Database, and
961
- includes any translation, adaptation, arrangement, modification, or any
962
- other alteration of the Database or of a Substantial part of the
963
- Contents. This includes, but is not limited to, Extracting or
964
- Re-utilising the whole or a Substantial part of the Contents in a new
965
- Database.
966
-
967
- "Extraction" – Means the permanent or temporary transfer of all or a
968
- Substantial part of the Contents to another medium by any means or in
969
- any form.
970
-
971
- "License" – Means this license agreement and is both a license of rights
972
- such as copyright and Database Rights and an agreement in contract.
973
-
974
- "Licensor" – Means the Person that offers the Database under the terms
975
- of this License.
976
-
977
- "Person" – Means a natural or legal person or a body of persons
978
- corporate or incorporate.
979
-
980
- "Produced Work" – a work (such as an image, audiovisual material, text,
981
- or sounds) resulting from using the whole or a Substantial part of the
982
- Contents (via a search or other query) from this Database, a Derivative
983
- Database, or this Database as part of a Collective Database.
984
-
985
- "Publicly" – means to Persons other than You or under Your control by
986
- either more than 50% ownership or by the power to direct their
987
- activities (such as contracting with an independent consultant).
988
-
989
- "Re-utilisation" – means any form of making available to the public all
990
- or a Substantial part of the Contents by the distribution of copies, by
991
- renting, by online or other forms of transmission.
992
-
993
- "Substantial" – Means substantial in terms of quantity or quality or a
994
- combination of both. The repeated and systematic Extraction or
995
- Re-utilisation of insubstantial parts of the Contents may amount to the
996
- Extraction or Re-utilisation of a Substantial part of the Contents.
997
-
998
- "Use" – As a verb, means doing any act that is restricted by copyright
999
- or Database Rights whether in the original medium or any other; and
1000
- includes without limitation distributing, copying, publicly performing,
1001
- publicly displaying, and preparing derivative works of the Database, as
1002
- well as modifying the Database as may be technically necessary to use it
1003
- in a different mode or format.
1004
-
1005
- "You" – Means a Person exercising rights under this License who has not
1006
- previously violated the terms of this License with respect to the
1007
- Database, or who has received express permission from the Licensor to
1008
- exercise rights under this License despite a previous violation.
1009
-
1010
- Words in the singular include the plural and vice versa.
1011
-
1012
- ### 2.0 What this License covers
1013
-
1014
- 2.1. Legal effect of this document. This License is:
1015
-
1016
- a. A license of applicable copyright and neighbouring rights;
1017
-
1018
- b. A license of the Database Right; and
1019
-
1020
- c. An agreement in contract between You and the Licensor.
1021
-
1022
- 2.2 Legal rights covered. This License covers the legal rights in the
1023
- Database, including:
1024
-
1025
- a. Copyright. Any copyright or neighbouring rights in the Database.
1026
- The copyright licensed includes any individual elements of the
1027
- Database, but does not cover the copyright over the Contents
1028
- independent of this Database. See Section 2.4 for details. Copyright
1029
- law varies between jurisdictions, but is likely to cover: the Database
1030
- model or schema, which is the structure, arrangement, and organisation
1031
- of the Database, and can also include the Database tables and table
1032
- indexes; the data entry and output sheets; and the Field names of
1033
- Contents stored in the Database;
1034
-
1035
- b. Database Rights. Database Rights only extend to the Extraction and
1036
- Re-utilisation of the whole or a Substantial part of the Contents.
1037
- Database Rights can apply even when there is no copyright over the
1038
- Database. Database Rights can also apply when the Contents are removed
1039
- from the Database and are selected and arranged in a way that would
1040
- not infringe any applicable copyright; and
1041
-
1042
- c. Contract. This is an agreement between You and the Licensor for
1043
- access to the Database. In return you agree to certain conditions of
1044
- use on this access as outlined in this License.
1045
-
1046
- 2.3 Rights not covered.
1047
-
1048
- a. This License does not apply to computer programs used in the making
1049
- or operation of the Database;
1050
-
1051
- b. This License does not cover any patents over the Contents or the
1052
- Database; and
1053
-
1054
- c. This License does not cover any trademarks associated with the
1055
- Database.
1056
-
1057
- 2.4 Relationship to Contents in the Database. The individual items of
1058
- the Contents contained in this Database may be covered by other rights,
1059
- including copyright, patent, data protection, privacy, or personality
1060
- rights, and this License does not cover any rights (other than Database
1061
- Rights or in contract) in individual Contents contained in the Database.
1062
- For example, if used on a Database of images (the Contents), this
1063
- License would not apply to copyright over individual images, which could
1064
- have their own separate licenses, or one single license covering all of
1065
- the rights over the images.
1066
-
1067
- ### 3.0 Rights granted
1068
-
1069
- 3.1 Subject to the terms and conditions of this License, the Licensor
1070
- grants to You a worldwide, royalty-free, non-exclusive, terminable (but
1071
- only under Section 9) license to Use the Database for the duration of
1072
- any applicable copyright and Database Rights. These rights explicitly
1073
- include commercial use, and do not exclude any field of endeavour. To
1074
- the extent possible in the relevant jurisdiction, these rights may be
1075
- exercised in all media and formats whether now known or created in the
1076
- future.
1077
-
1078
- The rights granted cover, for example:
1079
-
1080
- a. Extraction and Re-utilisation of the whole or a Substantial part of
1081
- the Contents;
1082
-
1083
- b. Creation of Derivative Databases;
1084
-
1085
- c. Creation of Collective Databases;
1086
-
1087
- d. Creation of temporary or permanent reproductions by any means and
1088
- in any form, in whole or in part, including of any Derivative
1089
- Databases or as a part of Collective Databases; and
1090
-
1091
- e. Distribution, communication, display, lending, making available, or
1092
- performance to the public by any means and in any form, in whole or in
1093
- part, including of any Derivative Database or as a part of Collective
1094
- Databases.
1095
-
1096
- 3.2 Compulsory license schemes. For the avoidance of doubt:
1097
-
1098
- a. Non-waivable compulsory license schemes. In those jurisdictions in
1099
- which the right to collect royalties through any statutory or
1100
- compulsory licensing scheme cannot be waived, the Licensor reserves
1101
- the exclusive right to collect such royalties for any exercise by You
1102
- of the rights granted under this License;
1103
-
1104
- b. Waivable compulsory license schemes. In those jurisdictions in
1105
- which the right to collect royalties through any statutory or
1106
- compulsory licensing scheme can be waived, the Licensor waives the
1107
- exclusive right to collect such royalties for any exercise by You of
1108
- the rights granted under this License; and,
1109
-
1110
- c. Voluntary license schemes. The Licensor waives the right to collect
1111
- royalties, whether individually or, in the event that the Licensor is
1112
- a member of a collecting society that administers voluntary licensing
1113
- schemes, via that society, from any exercise by You of the rights
1114
- granted under this License.
1115
-
1116
- 3.3 The right to release the Database under different terms, or to stop
1117
- distributing or making available the Database, is reserved. Note that
1118
- this Database may be multiple-licensed, and so You may have the choice
1119
- of using alternative licenses for this Database. Subject to Section
1120
- 10.4, all other rights not expressly granted by Licensor are reserved.
1121
-
1122
- ### 4.0 Conditions of Use
1123
-
1124
- 4.1 The rights granted in Section 3 above are expressly made subject to
1125
- Your complying with the following conditions of use. These are important
1126
- conditions of this License, and if You fail to follow them, You will be
1127
- in material breach of its terms.
1128
-
1129
- 4.2 Notices. If You Publicly Convey this Database, any Derivative
1130
- Database, or the Database as part of a Collective Database, then You
1131
- must:
1132
-
1133
- a. Do so only under the terms of this License or another license
1134
- permitted under Section 4.4;
1135
-
1136
- b. Include a copy of this License (or, as applicable, a license
1137
- permitted under Section 4.4) or its Uniform Resource Identifier (URI)
1138
- with the Database or Derivative Database, including both in the
1139
- Database or Derivative Database and in any relevant documentation; and
1140
-
1141
- c. Keep intact any copyright or Database Right notices and notices
1142
- that refer to this License.
1143
-
1144
- d. If it is not possible to put the required notices in a particular
1145
- file due to its structure, then You must include the notices in a
1146
- location (such as a relevant directory) where users would be likely to
1147
- look for it.
1148
-
1149
- 4.3 Notice for using output (Contents). Creating and Using a Produced
1150
- Work does not require the notice in Section 4.2. However, if you
1151
- Publicly Use a Produced Work, You must include a notice associated with
1152
- the Produced Work reasonably calculated to make any Person that uses,
1153
- views, accesses, interacts with, or is otherwise exposed to the Produced
1154
- Work aware that Content was obtained from the Database, Derivative
1155
- Database, or the Database as part of a Collective Database, and that it
1156
- is available under this License.
1157
-
1158
- a. Example notice. The following text will satisfy notice under
1159
- Section 4.3:
1160
-
1161
- Contains information from DATABASE NAME, which is made available
1162
- here under the Open Database License (ODbL).
1163
-
1164
- DATABASE NAME should be replaced with the name of the Database and a
1165
- hyperlink to the URI of the Database. "Open Database License" should
1166
- contain a hyperlink to the URI of the text of this License. If
1167
- hyperlinks are not possible, You should include the plain text of the
1168
- required URI's with the above notice.
1169
-
1170
- 4.4 Share alike.
1171
-
1172
- a. Any Derivative Database that You Publicly Use must be only under
1173
- the terms of:
1174
-
1175
- i. This License;
1176
-
1177
- ii. A later version of this License similar in spirit to this
1178
- License; or
1179
-
1180
- iii. A compatible license.
1181
-
1182
- If You license the Derivative Database under one of the licenses
1183
- mentioned in (iii), You must comply with the terms of that license.
1184
-
1185
- b. For the avoidance of doubt, Extraction or Re-utilisation of the
1186
- whole or a Substantial part of the Contents into a new database is a
1187
- Derivative Database and must comply with Section 4.4.
1188
-
1189
- c. Derivative Databases and Produced Works. A Derivative Database is
1190
- Publicly Used and so must comply with Section 4.4. if a Produced Work
1191
- created from the Derivative Database is Publicly Used.
1192
-
1193
- d. Share Alike and additional Contents. For the avoidance of doubt,
1194
- You must not add Contents to Derivative Databases under Section 4.4 a
1195
- that are incompatible with the rights granted under this License.
1196
-
1197
- e. Compatible licenses. Licensors may authorise a proxy to determine
1198
- compatible licenses under Section 4.4 a iii. If they do so, the
1199
- authorised proxy's public statement of acceptance of a compatible
1200
- license grants You permission to use the compatible license.
1201
-
1202
-
1203
- 4.5 Limits of Share Alike. The requirements of Section 4.4 do not apply
1204
- in the following:
1205
-
1206
- a. For the avoidance of doubt, You are not required to license
1207
- Collective Databases under this License if You incorporate this
1208
- Database or a Derivative Database in the collection, but this License
1209
- still applies to this Database or a Derivative Database as a part of
1210
- the Collective Database;
1211
-
1212
- b. Using this Database, a Derivative Database, or this Database as
1213
- part of a Collective Database to create a Produced Work does not
1214
- create a Derivative Database for purposes of Section 4.4; and
1215
-
1216
- c. Use of a Derivative Database internally within an organisation is
1217
- not to the public and therefore does not fall under the requirements
1218
- of Section 4.4.
1219
-
1220
- 4.6 Access to Derivative Databases. If You Publicly Use a Derivative
1221
- Database or a Produced Work from a Derivative Database, You must also
1222
- offer to recipients of the Derivative Database or Produced Work a copy
1223
- in a machine readable form of:
1224
-
1225
- a. The entire Derivative Database; or
1226
-
1227
- b. A file containing all of the alterations made to the Database or
1228
- the method of making the alterations to the Database (such as an
1229
- algorithm), including any additional Contents, that make up all the
1230
- differences between the Database and the Derivative Database.
1231
-
1232
- The Derivative Database (under a.) or alteration file (under b.) must be
1233
- available at no more than a reasonable production cost for physical
1234
- distributions and free of charge if distributed over the internet.
1235
-
1236
- 4.7 Technological measures and additional terms
1237
-
1238
- a. This License does not allow You to impose (except subject to
1239
- Section 4.7 b.) any terms or any technological measures on the
1240
- Database, a Derivative Database, or the whole or a Substantial part of
1241
- the Contents that alter or restrict the terms of this License, or any
1242
- rights granted under it, or have the effect or intent of restricting
1243
- the ability of any person to exercise those rights.
1244
-
1245
- b. Parallel distribution. You may impose terms or technological
1246
- measures on the Database, a Derivative Database, or the whole or a
1247
- Substantial part of the Contents (a "Restricted Database") in
1248
- contravention of Section 4.74 a. only if You also make a copy of the
1249
- Database or a Derivative Database available to the recipient of the
1250
- Restricted Database:
1251
-
1252
- i. That is available without additional fee;
1253
-
1254
- ii. That is available in a medium that does not alter or restrict
1255
- the terms of this License, or any rights granted under it, or have
1256
- the effect or intent of restricting the ability of any person to
1257
- exercise those rights (an "Unrestricted Database"); and
1258
-
1259
- iii. The Unrestricted Database is at least as accessible to the
1260
- recipient as a practical matter as the Restricted Database.
1261
-
1262
- c. For the avoidance of doubt, You may place this Database or a
1263
- Derivative Database in an authenticated environment, behind a
1264
- password, or within a similar access control scheme provided that You
1265
- do not alter or restrict the terms of this License or any rights
1266
- granted under it or have the effect or intent of restricting the
1267
- ability of any person to exercise those rights.
1268
-
1269
- 4.8 Licensing of others. You may not sublicense the Database. Each time
1270
- You communicate the Database, the whole or Substantial part of the
1271
- Contents, or any Derivative Database to anyone else in any way, the
1272
- Licensor offers to the recipient a license to the Database on the same
1273
- terms and conditions as this License. You are not responsible for
1274
- enforcing compliance by third parties with this License, but You may
1275
- enforce any rights that You have over a Derivative Database. You are
1276
- solely responsible for any modifications of a Derivative Database made
1277
- by You or another Person at Your direction. You may not impose any
1278
- further restrictions on the exercise of the rights granted or affirmed
1279
- under this License.
1280
-
1281
- ### 5.0 Moral rights
1282
-
1283
- 5.1 Moral rights. This section covers moral rights, including any rights
1284
- to be identified as the author of the Database or to object to treatment
1285
- that would otherwise prejudice the author's honour and reputation, or
1286
- any other derogatory treatment:
1287
-
1288
- a. For jurisdictions allowing waiver of moral rights, Licensor waives
1289
- all moral rights that Licensor may have in the Database to the fullest
1290
- extent possible by the law of the relevant jurisdiction under Section
1291
- 10.4;
1292
-
1293
- b. If waiver of moral rights under Section 5.1 a in the relevant
1294
- jurisdiction is not possible, Licensor agrees not to assert any moral
1295
- rights over the Database and waives all claims in moral rights to the
1296
- fullest extent possible by the law of the relevant jurisdiction under
1297
- Section 10.4; and
1298
-
1299
- c. For jurisdictions not allowing waiver or an agreement not to assert
1300
- moral rights under Section 5.1 a and b, the author may retain their
1301
- moral rights over certain aspects of the Database.
1302
-
1303
- Please note that some jurisdictions do not allow for the waiver of moral
1304
- rights, and so moral rights may still subsist over the Database in some
1305
- jurisdictions.
1306
-
1307
- ### 6.0 Fair dealing, Database exceptions, and other rights not affected
1308
-
1309
- 6.1 This License does not affect any rights that You or anyone else may
1310
- independently have under any applicable law to make any use of this
1311
- Database, including without limitation:
1312
-
1313
- a. Exceptions to the Database Right including: Extraction of Contents
1314
- from non-electronic Databases for private purposes, Extraction for
1315
- purposes of illustration for teaching or scientific research, and
1316
- Extraction or Re-utilisation for public security or an administrative
1317
- or judicial procedure.
1318
-
1319
- b. Fair dealing, fair use, or any other legally recognised limitation
1320
- or exception to infringement of copyright or other applicable laws.
1321
-
1322
- 6.2 This License does not affect any rights of lawful users to Extract
1323
- and Re-utilise insubstantial parts of the Contents, evaluated
1324
- quantitatively or qualitatively, for any purposes whatsoever, including
1325
- creating a Derivative Database (subject to other rights over the
1326
- Contents, see Section 2.4). The repeated and systematic Extraction or
1327
- Re-utilisation of insubstantial parts of the Contents may however amount
1328
- to the Extraction or Re-utilisation of a Substantial part of the
1329
- Contents.
1330
-
1331
- ### 7.0 Warranties and Disclaimer
1332
-
1333
- 7.1 The Database is licensed by the Licensor "as is" and without any
1334
- warranty of any kind, either express, implied, or arising by statute,
1335
- custom, course of dealing, or trade usage. Licensor specifically
1336
- disclaims any and all implied warranties or conditions of title,
1337
- non-infringement, accuracy or completeness, the presence or absence of
1338
- errors, fitness for a particular purpose, merchantability, or otherwise.
1339
- Some jurisdictions do not allow the exclusion of implied warranties, so
1340
- this exclusion may not apply to You.
1341
-
1342
- ### 8.0 Limitation of liability
1343
-
1344
- 8.1 Subject to any liability that may not be excluded or limited by law,
1345
- the Licensor is not liable for, and expressly excludes, all liability
1346
- for loss or damage however and whenever caused to anyone by any use
1347
- under this License, whether by You or by anyone else, and whether caused
1348
- by any fault on the part of the Licensor or not. This exclusion of
1349
- liability includes, but is not limited to, any special, incidental,
1350
- consequential, punitive, or exemplary damages such as loss of revenue,
1351
- data, anticipated profits, and lost business. This exclusion applies
1352
- even if the Licensor has been advised of the possibility of such
1353
- damages.
1354
-
1355
- 8.2 If liability may not be excluded by law, it is limited to actual and
1356
- direct financial loss to the extent it is caused by proved negligence on
1357
- the part of the Licensor.
1358
-
1359
- ### 9.0 Termination of Your rights under this License
1360
-
1361
- 9.1 Any breach by You of the terms and conditions of this License
1362
- automatically terminates this License with immediate effect and without
1363
- notice to You. For the avoidance of doubt, Persons who have received the
1364
- Database, the whole or a Substantial part of the Contents, Derivative
1365
- Databases, or the Database as part of a Collective Database from You
1366
- under this License will not have their licenses terminated provided
1367
- their use is in full compliance with this License or a license granted
1368
- under Section 4.8 of this License. Sections 1, 2, 7, 8, 9 and 10 will
1369
- survive any termination of this License.
1370
-
1371
- 9.2 If You are not in breach of the terms of this License, the Licensor
1372
- will not terminate Your rights under it.
1373
-
1374
- 9.3 Unless terminated under Section 9.1, this License is granted to You
1375
- for the duration of applicable rights in the Database.
1376
-
1377
- 9.4 Reinstatement of rights. If you cease any breach of the terms and
1378
- conditions of this License, then your full rights under this License
1379
- will be reinstated:
1380
-
1381
- a. Provisionally and subject to permanent termination until the 60th
1382
- day after cessation of breach;
1383
-
1384
- b. Permanently on the 60th day after cessation of breach unless
1385
- otherwise reasonably notified by the Licensor; or
1386
-
1387
- c. Permanently if reasonably notified by the Licensor of the
1388
- violation, this is the first time You have received notice of
1389
- violation of this License from the Licensor, and You cure the
1390
- violation prior to 30 days after your receipt of the notice.
1391
-
1392
- Persons subject to permanent termination of rights are not eligible to
1393
- be a recipient and receive a license under Section 4.8.
1394
-
1395
- 9.5 Notwithstanding the above, Licensor reserves the right to release
1396
- the Database under different license terms or to stop distributing or
1397
- making available the Database. Releasing the Database under different
1398
- license terms or stopping the distribution of the Database will not
1399
- withdraw this License (or any other license that has been, or is
1400
- required to be, granted under the terms of this License), and this
1401
- License will continue in full force and effect unless terminated as
1402
- stated above.
1403
-
1404
- ### 10.0 General
1405
-
1406
- 10.1 If any provision of this License is held to be invalid or
1407
- unenforceable, that must not affect the validity or enforceability of
1408
- the remainder of the terms and conditions of this License and each
1409
- remaining provision of this License shall be valid and enforced to the
1410
- fullest extent permitted by law.
1411
-
1412
- 10.2 This License is the entire agreement between the parties with
1413
- respect to the rights granted here over the Database. It replaces any
1414
- earlier understandings, agreements or representations with respect to
1415
- the Database.
1416
-
1417
- 10.3 If You are in breach of the terms of this License, You will not be
1418
- entitled to rely on the terms of this License or to complain of any
1419
- breach by the Licensor.
1420
-
1421
- 10.4 Choice of law. This License takes effect in and will be governed by
1422
- the laws of the relevant jurisdiction in which the License terms are
1423
- sought to be enforced. If the standard suite of rights granted under
1424
- applicable copyright law and Database Rights in the relevant
1425
- jurisdiction includes additional rights not granted under this License,
1426
- these additional rights are granted in this License in order to meet the
1427
- terms of this License.```
1428
-
1429
-
1430
-
1431
-
1432
  # Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)
1433
 
1434
  * Author: Explosion
README.md CHANGED
@@ -14,61 +14,76 @@ model-index:
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
- value: 0.8219461698
18
  - name: NER Recall
19
  type: recall
20
- value: 0.8270833333
21
  - name: NER F Score
22
  type: f_score
23
- value: 0.8245067497
24
  - task:
25
  name: POS
26
  type: token-classification
27
  metrics:
28
- - name: POS Accuracy
29
  type: accuracy
30
- value: 0.9650363196
31
  - task:
32
- name: SENTER
33
  type: token-classification
34
  metrics:
35
- - name: SENTER Precision
36
- type: precision
37
- value: 0.9142335766
38
- - name: SENTER Recall
39
- type: recall
40
- value: 0.8882978723
41
- - name: SENTER F Score
42
- type: f_score
43
- value: 0.9010791367
44
  - task:
45
- name: UNLABELED_DEPENDENCIES
46
  type: token-classification
47
  metrics:
48
- - name: Unlabeled Dependencies Accuracy
49
  type: accuracy
50
- value: 0.8225959658
51
  - task:
52
  name: LABELED_DEPENDENCIES
53
  type: token-classification
54
  metrics:
55
- - name: Labeled Dependencies Accuracy
56
- type: accuracy
57
- value: 0.8225959658
58
  ---
59
  ### Details: https://spacy.io/models/da#da_core_news_lg
60
 
61
- Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.
62
 
63
  | Feature | Description |
64
  | --- | --- |
65
  | **Name** | `da_core_news_lg` |
66
- | **Version** | `3.2.0` |
67
- | **spaCy** | `>=3.2.0,<3.3.0` |
68
- | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
69
- | **Components** | `tok2vec`, `morphologizer`, `parser`, `senter`, `attribute_ruler`, `lemmatizer`, `ner` |
70
  | **Vectors** | 500000 keys, 500000 unique vectors (300 dimensions) |
71
- | **Sources** | [UD Danish DDT v2.8](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Lemmatization Lists](https://github.com/michmech/lemmatization-lists/) (Michal Měchura)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
72
  | **License** | `CC BY-SA 4.0` |
73
  | **Author** | [Explosion](https://explosion.ai) |
74
 
@@ -76,13 +91,12 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
76
 
77
  <details>
78
 
79
- <summary>View label scheme (195 labels for 4 components)</summary>
80
 
81
  | Component | Labels |
82
  | --- | --- |
83
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
84
  | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
85
- | **`senter`** | `I`, `S` |
86
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
87
 
88
  </details>
@@ -95,18 +109,18 @@ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, s
95
  | `TOKEN_P` | 99.78 |
96
  | `TOKEN_R` | 99.75 |
97
  | `TOKEN_F` | 99.76 |
98
- | `POS_ACC` | 96.50 |
99
- | `MORPH_ACC` | 95.72 |
100
- | `MORPH_MICRO_P` | 97.22 |
101
- | `MORPH_MICRO_R` | 96.68 |
102
- | `MORPH_MICRO_F` | 96.95 |
103
- | `SENTS_P` | 91.42 |
104
- | `SENTS_R` | 88.83 |
105
- | `SENTS_F` | 90.11 |
106
- | `DEP_UAS` | 82.26 |
107
- | `DEP_LAS` | 78.33 |
108
- | `TAG_ACC` | 96.50 |
109
- | `LEMMA_ACC` | 84.91 |
110
- | `ENTS_P` | 82.19 |
111
- | `ENTS_R` | 82.71 |
112
- | `ENTS_F` | 82.45 |
 
14
  metrics:
15
  - name: NER Precision
16
  type: precision
17
+ value: 0.8183716075
18
  - name: NER Recall
19
  type: recall
20
+ value: 0.8166666667
21
  - name: NER F Score
22
  type: f_score
23
+ value: 0.8175182482
24
+ - task:
25
+ name: TAG
26
+ type: token-classification
27
+ metrics:
28
+ - name: TAG (XPOS) Accuracy
29
+ type: accuracy
30
+ value: 0.9633898305
31
  - task:
32
  name: POS
33
  type: token-classification
34
  metrics:
35
+ - name: POS (UPOS) Accuracy
36
  type: accuracy
37
+ value: 0.9633898305
38
  - task:
39
+ name: MORPH
40
  type: token-classification
41
  metrics:
42
+ - name: Morph (UFeats) Accuracy
43
+ type: accuracy
44
+ value: 0.9568038741
45
  - task:
46
+ name: LEMMA
47
  type: token-classification
48
  metrics:
49
+ - name: Lemma Accuracy
50
  type: accuracy
51
+ value: 0.9516707022
52
+ - task:
53
+ name: UNLABELED_DEPENDENCIES
54
+ type: token-classification
55
+ metrics:
56
+ - name: Unlabeled Attachment Score (UAS)
57
+ type: f_score
58
+ value: 0.8195787003
59
  - task:
60
  name: LABELED_DEPENDENCIES
61
  type: token-classification
62
  metrics:
63
+ - name: Labeled Attachment Score (LAS)
64
+ type: f_score
65
+ value: 0.7807576266
66
+ - task:
67
+ name: SENTS
68
+ type: token-classification
69
+ metrics:
70
+ - name: Sentences F-Score
71
+ type: f_score
72
+ value: 0.9055258467
73
  ---
74
  ### Details: https://spacy.io/models/da#da_core_news_lg
75
 
76
+ Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner, attribute_ruler.
77
 
78
  | Feature | Description |
79
  | --- | --- |
80
  | **Name** | `da_core_news_lg` |
81
+ | **Version** | `3.3.0` |
82
+ | **spaCy** | `>=3.3.0.dev0,<3.4.0` |
83
+ | **Default Pipeline** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `attribute_ruler`, `ner` |
84
+ | **Components** | `tok2vec`, `morphologizer`, `parser`, `lemmatizer`, `senter`, `attribute_ruler`, `ner` |
85
  | **Vectors** | 500000 keys, 500000 unique vectors (300 dimensions) |
86
+ | **Sources** | [UD Danish DDT v2.8](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
87
  | **License** | `CC BY-SA 4.0` |
88
  | **Author** | [Explosion](https://explosion.ai) |
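In the 3.3.0 table, **Components** lists `senter` but **Default Pipeline** does not, so `senter` ships with the package but is not run by default. The parser already provides sentence boundaries. The small stdlib-only sketch below diffs the two lists as copied from the table. `disabled_components` is a hypothetical helper, not part of spaCy. In a real install you would turn the component on with `nlp.enable_pipe("senter")` on the loaded pipeline.

```python
# Lists copied from the 3.3.0 README table above.
COMPONENTS = ["tok2vec", "morphologizer", "parser", "lemmatizer",
              "senter", "attribute_ruler", "ner"]
DEFAULT_PIPELINE = ["tok2vec", "morphologizer", "parser", "lemmatizer",
                    "attribute_ruler", "ner"]


def disabled_components(components, default_pipeline):
    """Hypothetical helper: components that ship but are not run by default."""
    enabled = set(default_pipeline)
    # Preserve the order in which the components are listed in the table.
    return [name for name in components if name not in enabled]


if __name__ == "__main__":
    print(disabled_components(COMPONENTS, DEFAULT_PIPELINE))  # ['senter']
```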
89
 
 
91
 
92
  <details>
93
 
94
+ <summary>View label scheme (193 labels for 3 components)</summary>
95
 
96
  | Component | Labels |
97
  | --- | --- |
98
  | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
99
  | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
 
100
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
101
 
102
  </details>
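Each F score in the metadata is the harmonic mean of the matching precision and recall, F = 2PR / (P + R). The sketch below recomputes two of them. The NER values come from the new model-index block, and the sentence values come from the new `accuracy.json`. Any differences appear only in the last stored digits, because the inputs are rounded to ten decimals.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the 3.3.0 metadata (ents_* and sents_* in accuracy.json).
NER = {"p": 0.8183716075, "r": 0.8166666667, "f": 0.8175182482}
SENTS = {"p": 0.9103942652, "r": 0.9007092199, "f": 0.9055258467}

for name, m in [("NER", NER), ("SENTS", SENTS)]:
    recomputed = f_score(m["p"], m["r"])
    print(f"{name}: reported {m['f']:.6f}, recomputed {recomputed:.6f}")
```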
 
109
  | `TOKEN_P` | 99.78 |
110
  | `TOKEN_R` | 99.75 |
111
  | `TOKEN_F` | 99.76 |
112
+ | `POS_ACC` | 96.34 |
113
+ | `MORPH_ACC` | 95.68 |
114
+ | `MORPH_MICRO_P` | 97.27 |
115
+ | `MORPH_MICRO_R` | 96.56 |
116
+ | `MORPH_MICRO_F` | 96.91 |
117
+ | `SENTS_P` | 91.04 |
118
+ | `SENTS_R` | 90.07 |
119
+ | `SENTS_F` | 90.55 |
120
+ | `DEP_UAS` | 81.96 |
121
+ | `DEP_LAS` | 78.08 |
122
+ | `LEMMA_ACC` | 95.17 |
123
+ | `TAG_ACC` | 96.34 |
124
+ | `ENTS_P` | 81.84 |
125
+ | `ENTS_R` | 81.67 |
126
+ | `ENTS_F` | 81.75 |
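To compare the two evaluation tables directly, the sketch below subtracts the 3.2.0 scores (the `-` lines) from the 3.3.0 scores (the `+` lines), in percentage points. The largest change is `LEMMA_ACC` (84.91 to 95.17), which coincides with the switch to the trainable lemmatizer in the new component list. Most other scores move by about a point or less. Only a subset of the metrics is copied here.

```python
# Scores (%) copied from the README evaluation tables: 3.2.0 ("-") and 3.3.0 ("+").
OLD = {"POS_ACC": 96.50, "MORPH_ACC": 95.72, "SENTS_F": 90.11,
       "DEP_UAS": 82.26, "DEP_LAS": 78.33, "LEMMA_ACC": 84.91, "ENTS_F": 82.45}
NEW = {"POS_ACC": 96.34, "MORPH_ACC": 95.68, "SENTS_F": 90.55,
       "DEP_UAS": 81.96, "DEP_LAS": 78.08, "LEMMA_ACC": 95.17, "ENTS_F": 81.75}


def deltas(old, new):
    """Percentage-point change for every metric present in both tables."""
    # round() hides binary floating-point noise left over from the subtraction.
    return {k: round(new[k] - old[k], 2) for k in old if k in new}


# Print the largest changes first.
for metric, change in sorted(deltas(OLD, NEW).items(), key=lambda kv: -abs(kv[1])):
    print(f"{metric:10s} {change:+.2f}")
```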
accuracy.json CHANGED
@@ -3,81 +3,81 @@
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
- "pos_acc": 0.9650363196,
7
- "morph_acc": 0.9571912833,
8
- "morph_micro_p": 0.9722335025,
9
- "morph_micro_r": 0.9667861289,
10
- "morph_micro_f": 0.969502164,
11
  "morph_per_feat": {
12
  "Mood": {
13
- "p": 0.9781160799,
14
- "r": 0.9799809342,
15
- "f": 0.979047619
16
  },
17
  "Tense": {
18
- "p": 0.9765506808,
19
- "r": 0.9721385542,
20
- "f": 0.9743396226
21
  },
22
  "VerbForm": {
23
- "p": 0.9697156984,
24
- "r": 0.9602203182,
25
- "f": 0.9649446494
26
  },
27
  "Voice": {
28
- "p": 0.9797752809,
29
- "r": 0.9775784753,
30
- "f": 0.9786756453
31
  },
32
  "Definite": {
33
- "p": 0.9666401906,
34
- "r": 0.9616752272,
35
- "f": 0.9641513171
36
  },
37
  "Gender": {
38
- "p": 0.9588903743,
39
- "r": 0.9534729146,
40
- "f": 0.956173971
41
  },
42
  "Number": {
43
- "p": 0.9666403993,
44
- "r": 0.9598330725,
45
- "f": 0.9632247088
46
  },
47
  "AdpType": {
48
- "p": 0.9991071429,
49
- "r": 0.9893899204,
50
- "f": 0.994224789
51
  },
52
  "PartType": {
53
- "p": 1.0,
54
  "r": 1.0,
55
- "f": 1.0
56
  },
57
  "Case": {
58
- "p": 0.9792332268,
59
- "r": 0.9684044234,
60
- "f": 0.9737887212
61
  },
62
  "Person": {
63
- "p": 0.9805996473,
64
- "r": 0.9875666075,
65
- "f": 0.9840707965
66
  },
67
  "PronType": {
68
- "p": 0.9860082305,
69
- "r": 0.9851973684,
70
- "f": 0.9856026327
71
  },
72
  "NumType": {
73
- "p": 0.9731543624,
74
  "r": 0.9602649007,
75
- "f": 0.9666666667
76
  },
77
  "Degree": {
78
- "p": 0.9587878788,
79
- "r": 0.9530120482,
80
- "f": 0.9558912387
81
  },
82
  "Reflex": {
83
  "p": 1.0,
@@ -85,24 +85,24 @@
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
- "p": 0.9885057471,
89
  "r": 1.0,
90
- "f": 0.9942196532
91
  },
92
  "Poss": {
93
- "p": 1.0,
94
  "r": 1.0,
95
- "f": 1.0
96
  },
97
  "Foreign": {
98
- "p": 0.6666666667,
99
- "r": 0.4,
100
- "f": 0.5
101
  },
102
  "Abbr": {
103
- "p": 1.0,
104
- "r": 0.2,
105
- "f": 0.3333333333
106
  },
107
  "Style": {
108
  "p": 1.0,
@@ -110,146 +110,151 @@
110
  "f": 1.0
111
  },
112
  "Polite": {
113
- "p": 1.0,
114
- "r": 0.5,
115
- "f": 0.6666666667
116
  }
117
  },
118
- "sents_p": 0.9142335766,
119
- "sents_r": 0.8882978723,
120
- "sents_f": 0.9010791367,
121
- "dep_uas": 0.8225959658,
122
- "dep_las": 0.7833277461,
123
  "dep_las_per_type": {
124
  "advmod": {
125
- "p": 0.6842105263,
126
- "r": 0.697740113,
127
- "f": 0.6909090909
128
  },
129
  "root": {
130
- "p": 0.8513761468,
131
- "r": 0.8226950355,
132
- "f": 0.8367899008
133
  },
134
  "nsubj": {
135
- "p": 0.8508583691,
136
- "r": 0.8364978903,
137
- "f": 0.8436170213
138
  },
139
  "case": {
140
- "p": 0.8953603159,
141
- "r": 0.8944773176,
142
- "f": 0.8949185989
143
  },
144
  "obl": {
145
- "p": 0.71973466,
146
  "r": 0.6739130435,
147
- "f": 0.6960705694
148
  },
149
  "cc": {
150
- "p": 0.7885714286,
151
- "r": 0.8023255814,
152
- "f": 0.795389049
153
  },
154
  "conj": {
155
- "p": 0.647696477,
156
- "r": 0.6373333333,
157
- "f": 0.6424731183
158
  },
159
  "obj": {
160
- "p": 0.8161764706,
161
- "r": 0.8621359223,
162
- "f": 0.8385269122
163
  },
164
  "aux": {
165
- "p": 0.8742690058,
166
- "r": 0.8717201166,
167
- "f": 0.8729927007
168
  },
169
  "acl:relcl": {
170
- "p": 0.5773195876,
171
- "r": 0.6054054054,
172
- "f": 0.5910290237
173
  },
174
  "advmod:lmod": {
175
- "p": 0.6714285714,
176
- "r": 0.7014925373,
177
- "f": 0.6861313869
178
  },
179
  "det": {
180
- "p": 0.9253731343,
181
- "r": 0.9192751236,
182
- "f": 0.9223140496
183
  },
184
  "amod": {
185
- "p": 0.8313458262,
186
- "r": 0.8327645051,
187
- "f": 0.832054561
188
  },
189
  "nmod:poss": {
190
- "p": 0.6326530612,
191
- "r": 0.6138613861,
192
- "f": 0.6231155779
193
  },
194
  "ccomp": {
195
- "p": 0.676056338,
196
- "r": 0.7741935484,
197
- "f": 0.7218045113
198
  },
199
  "nummod": {
200
- "p": 0.8620689655,
201
- "r": 0.8333333333,
202
- "f": 0.8474576271
203
  },
204
  "flat": {
205
- "p": 0.8,
206
- "r": 0.8741721854,
207
- "f": 0.835443038
208
  },
209
  "compound:prt": {
210
- "p": 0.4,
211
- "r": 0.3414634146,
212
- "f": 0.3684210526
213
  },
214
  "advcl": {
215
- "p": 0.6181818182,
216
- "r": 0.5862068966,
217
- "f": 0.6017699115
218
  },
219
  "mark": {
220
- "p": 0.8782051282,
221
- "r": 0.8439425051,
222
- "f": 0.8607329843
223
  },
224
  "cop": {
225
- "p": 0.7842105263,
226
- "r": 0.8514285714,
227
- "f": 0.8164383562
228
  },
229
  "dep": {
230
- "p": 0.1707317073,
231
- "r": 0.2641509434,
232
- "f": 0.2074074074
233
  },
234
  "nmod": {
235
- "p": 0.6296992481,
236
- "r": 0.654296875,
237
- "f": 0.6417624521
238
  },
239
  "iobj": {
240
- "p": 0.8,
241
- "r": 0.5454545455,
242
- "f": 0.6486486486
243
  },
244
  "xcomp": {
245
- "p": 0.5333333333,
246
- "r": 0.406779661,
247
- "f": 0.4615384615
248
  },
249
  "list": {
250
- "p": 0.5454545455,
251
  "r": 0.3333333333,
252
- "f": 0.4137931034
253
  },
254
  "vocative": {
255
  "p": 0.0,
@@ -257,62 +262,57 @@
257
  "f": 0.0
258
  },
259
  "fixed": {
260
- "p": 0.8648648649,
261
- "r": 0.7804878049,
262
- "f": 0.8205128205
263
  },
264
- "expl": {
265
- "p": 0.8,
266
- "r": 0.8235294118,
267
- "f": 0.8115942029
268
  },
269
- "appos": {
270
- "p": 0.4054054054,
271
- "r": 0.4545454545,
272
- "f": 0.4285714286
273
  },
274
  "obl:tmod": {
275
- "p": 0.5555555556,
276
- "r": 0.2777777778,
277
- "f": 0.3703703704
278
  },
279
  "discourse": {
280
  "p": 0.0,
281
  "r": 0.0,
282
  "f": 0.0
283
- },
284
- "obl:lmod": {
285
- "p": 0.0,
286
- "r": 0.0,
287
- "f": 0.0
288
  }
289
  },
290
- "tag_acc": 0.9650363196,
291
- "lemma_acc": 0.8491041162,
292
- "ents_p": 0.8219461698,
293
- "ents_r": 0.8270833333,
294
- "ents_f": 0.8245067497,
295
  "ents_per_type": {
296
  "PER": {
297
- "p": 0.9171974522,
298
- "r": 0.8674698795,
299
- "f": 0.8916408669
300
  },
301
  "ORG": {
302
- "p": 0.7840909091,
303
- "r": 0.7666666667,
304
- "f": 0.7752808989
305
  },
306
  "MISC": {
307
- "p": 0.6776859504,
308
- "r": 0.7256637168,
309
- "f": 0.7008547009
310
  },
311
  "LOC": {
312
- "p": 0.8717948718,
313
- "r": 0.9189189189,
314
- "f": 0.8947368421
315
  }
316
  },
317
- "speed": 8840.2170640394
318
  }
 
3
  "token_p": 0.9977732598,
4
  "token_r": 0.9974835463,
5
  "token_f": 0.997628382,
6
+ "pos_acc": 0.9633898305,
7
+ "morph_acc": 0.9568038741,
8
+ "morph_micro_p": 0.9727434528,
9
+ "morph_micro_r": 0.9655746807,
10
+ "morph_micro_f": 0.9691458101,
11
  "morph_per_feat": {
12
  "Mood": {
13
+ "p": 0.9799043062,
14
+ "r": 0.9761677788,
15
+ "f": 0.9780324737
16
  },
17
  "Tense": {
18
+ "p": 0.9772727273,
19
+ "r": 0.9713855422,
20
+ "f": 0.9743202417
21
  },
22
  "VerbForm": {
23
+ "p": 0.9686153846,
24
+ "r": 0.9632802938,
25
+ "f": 0.9659404725
26
  },
27
  "Voice": {
28
+ "p": 0.9798206278,
29
+ "r": 0.9798206278,
30
+ "f": 0.9798206278
31
  },
32
  "Definite": {
33
+ "p": 0.968812475,
34
+ "r": 0.9573291189,
35
+ "f": 0.963036566
36
  },
37
  "Gender": {
38
+ "p": 0.9597720416,
39
+ "r": 0.9514788966,
40
+ "f": 0.9556074766
41
  },
42
  "Number": {
43
+ "p": 0.9683961022,
44
+ "r": 0.9590505999,
45
+ "f": 0.9637006945
46
  },
47
  "AdpType": {
48
+ "p": 0.9982206406,
49
+ "r": 0.9920424403,
50
+ "f": 0.9951219512
51
  },
52
  "PartType": {
53
+ "p": 0.996763754,
54
  "r": 1.0,
55
+ "f": 0.9983792545
56
  },
57
  "Case": {
58
+ "p": 0.9806451613,
59
+ "r": 0.9605055292,
60
+ "f": 0.9704708699
61
  },
62
  "Person": {
63
+ "p": 0.9804270463,
64
+ "r": 0.9786856128,
65
+ "f": 0.9795555556
66
  },
67
  "PronType": {
68
+ "p": 0.9835390947,
69
+ "r": 0.9827302632,
70
+ "f": 0.9831345125
71
  },
72
  "NumType": {
73
+ "p": 0.9931506849,
74
  "r": 0.9602649007,
75
+ "f": 0.9764309764
76
  },
77
  "Degree": {
78
+ "p": 0.9578313253,
79
+ "r": 0.9578313253,
80
+ "f": 0.9578313253
81
  },
82
  "Reflex": {
83
  "p": 1.0,
 
85
  "f": 1.0
86
  },
87
  "Number[psor]": {
88
+ "p": 0.9772727273,
89
  "r": 1.0,
90
+ "f": 0.9885057471
91
  },
92
  "Poss": {
93
+ "p": 0.9887640449,
94
  "r": 1.0,
95
+ "f": 0.9943502825
96
  },
97
  "Foreign": {
98
+ "p": 0.6,
99
+ "r": 0.3,
100
+ "f": 0.4
101
  },
102
  "Abbr": {
103
+ "p": 0.0,
104
+ "r": 0.0,
105
+ "f": 0.0
106
  },
107
  "Style": {
108
  "p": 1.0,
 
110
  "f": 1.0
111
  },
112
  "Polite": {
113
+ "p": 0.75,
114
+ "r": 0.75,
115
+ "f": 0.75
116
  }
117
  },
118
+ "sents_p": 0.9103942652,
119
+ "sents_r": 0.9007092199,
120
+ "sents_f": 0.9055258467,
121
+ "dep_uas": 0.8195787003,
122
+ "dep_las": 0.7807576266,
123
  "dep_las_per_type": {
124
  "advmod": {
125
+ "p": 0.6955345061,
126
+ "r": 0.7259887006,
127
+ "f": 0.7104353836
128
  },
129
  "root": {
130
+ "p": 0.824686941,
131
+ "r": 0.8173758865,
132
+ "f": 0.821015138
133
  },
134
  "nsubj": {
135
+ "p": 0.8361884368,
136
+ "r": 0.8238396624,
137
+ "f": 0.829968119
138
  },
139
  "case": {
140
+ "p": 0.9003984064,
141
+ "r": 0.8915187377,
142
+ "f": 0.8959365709
143
  },
144
  "obl": {
145
+ "p": 0.7221297837,
146
  "r": 0.6739130435,
147
+ "f": 0.697188755
148
  },
149
  "cc": {
150
+ "p": 0.7630057803,
151
+ "r": 0.7674418605,
152
+ "f": 0.7652173913
153
  },
154
  "conj": {
155
+ "p": 0.6106442577,
156
+ "r": 0.5813333333,
157
+ "f": 0.5956284153
158
  },
159
  "obj": {
160
+ "p": 0.7893772894,
161
+ "r": 0.8368932039,
162
+ "f": 0.8124410933
163
  },
164
  "aux": {
165
+ "p": 0.8764705882,
166
+ "r": 0.8688046647,
167
+ "f": 0.8726207906
168
  },
169
  "acl:relcl": {
170
+ "p": 0.6300578035,
171
+ "r": 0.5891891892,
172
+ "f": 0.6089385475
173
  },
174
  "advmod:lmod": {
175
+ "p": 0.7272727273,
176
+ "r": 0.7164179104,
177
+ "f": 0.7218045113
178
  },
179
  "det": {
180
+ "p": 0.9140495868,
181
+ "r": 0.9110378913,
182
+ "f": 0.9125412541
183
  },
184
  "amod": {
185
+ "p": 0.8080645161,
186
+ "r": 0.8549488055,
187
+ "f": 0.8308457711
188
  },
189
  "nmod:poss": {
190
+ "p": 0.7373737374,
191
+ "r": 0.7227722772,
192
+ "f": 0.73
193
  },
194
  "ccomp": {
195
+ "p": 0.7068965517,
196
+ "r": 0.6612903226,
197
+ "f": 0.6833333333
198
  },
199
  "nummod": {
200
+ "p": 0.8360655738,
201
+ "r": 0.85,
202
+ "f": 0.8429752066
203
  },
204
  "flat": {
205
+ "p": 0.7844311377,
206
+ "r": 0.8675496689,
207
+ "f": 0.8238993711
208
  },
209
  "compound:prt": {
210
+ "p": 0.5,
211
+ "r": 0.2926829268,
212
+ "f": 0.3692307692
213
  },
214
  "advcl": {
215
+ "p": 0.6545454545,
216
+ "r": 0.6206896552,
217
+ "f": 0.6371681416
218
  },
219
  "mark": {
220
+ "p": 0.8781512605,
221
+ "r": 0.8583162218,
222
+ "f": 0.8681204569
223
  },
224
  "cop": {
225
+ "p": 0.8121546961,
226
+ "r": 0.84,
227
+ "f": 0.8258426966
228
  },
229
  "dep": {
230
+ "p": 0.145631068,
231
+ "r": 0.2830188679,
232
+ "f": 0.1923076923
233
  },
234
  "nmod": {
235
+ "p": 0.6549707602,
236
+ "r": 0.65625,
237
+ "f": 0.6556097561
238
  },
239
  "iobj": {
240
+ "p": 0.8125,
241
+ "r": 0.5909090909,
242
+ "f": 0.6842105263
243
  },
244
  "xcomp": {
245
+ "p": 0.4772727273,
246
+ "r": 0.3559322034,
247
+ "f": 0.4077669903
248
+ },
249
+ "appos": {
250
+ "p": 0.5384615385,
251
+ "r": 0.4242424242,
252
+ "f": 0.4745762712
253
  },
254
  "list": {
255
+ "p": 0.5,
256
  "r": 0.3333333333,
257
+ "f": 0.4
258
  },
259
  "vocative": {
260
  "p": 0.0,
 
262
  "f": 0.0
263
  },
264
  "fixed": {
265
+ "p": 0.8717948718,
266
+ "r": 0.8292682927,
267
+ "f": 0.85
268
  },
269
+ "obl:lmod": {
270
+ "p": 0.0,
271
+ "r": 0.0,
272
+ "f": 0.0
273
  },
274
+ "expl": {
275
+ "p": 0.8529411765,
276
+ "r": 0.8529411765,
277
+ "f": 0.8529411765
278
  },
279
  "obl:tmod": {
280
+ "p": 0.6363636364,
281
+ "r": 0.3888888889,
282
+ "f": 0.4827586207
283
  },
284
  "discourse": {
285
  "p": 0.0,
286
  "r": 0.0,
287
  "f": 0.0
288
  }
289
  },
290
+ "lemma_acc": 0.9516707022,
291
+ "tag_acc": 0.9633898305,
292
+ "ents_p": 0.8183716075,
293
+ "ents_r": 0.8166666667,
294
+ "ents_f": 0.8175182482,
295
  "ents_per_type": {
296
  "PER": {
297
+ "p": 0.8993710692,
298
+ "r": 0.8614457831,
299
+ "f": 0.88
300
  },
301
  "ORG": {
302
+ "p": 0.7303370787,
303
+ "r": 0.7222222222,
304
+ "f": 0.7262569832
305
  },
306
  "MISC": {
307
+ "p": 0.7288135593,
308
+ "r": 0.7610619469,
309
+ "f": 0.7445887446
310
  },
311
  "LOC": {
312
+ "p": 0.8672566372,
313
+ "r": 0.8828828829,
314
+ "f": 0.875
315
  }
316
  },
317
+ "speed": 10791.2692595094
318
  }
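The per-type entries above report precision (`p`), recall (`r`) and F-score (`f`). Assuming `f` is the balanced F1 score, f = 2pr/(p+r), the reported figures can be sanity-checked with a short sketch. The triples are copied from this file, and the helper names `f1` and `check_reported` are illustrative only.

```python
import math


def f1(p: float, r: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if p + r == 0:
        return 0.0  # convention when both are zero, e.g. "Abbr" above
    return 2 * p * r / (p + r)


# (p, r, reported f) triples copied from the scores above
SAMPLES = {
    "Foreign": (0.6, 0.3, 0.4),
    "Polite": (0.75, 0.75, 0.75),
    "Abbr": (0.0, 0.0, 0.0),
    "dep": (0.145631068, 0.2830188679, 0.1923076923),
    "PER": (0.8993710692, 0.8614457831, 0.88),
}


def check_reported(samples, tol=1e-8):
    """Return the names whose reported f disagrees with 2pr/(p+r) beyond tol.

    The tolerance absorbs the 10-decimal rounding used in the JSON.
    """
    return [
        name
        for name, (p, r, f) in samples.items()
        if not math.isclose(f1(p, r), f, abs_tol=tol)
    ]


print(check_reported(SAMPLES))  # []
```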
attribute_ruler/patterns CHANGED
Binary files a/attribute_ruler/patterns and b/attribute_ruler/patterns differ
 
config.cfg CHANGED
@@ -10,7 +10,7 @@ seed = 0
10
 
11
  [nlp]
12
  lang = "da"
13
- pipeline = ["tok2vec","morphologizer","parser","senter","attribute_ruler","lemmatizer","ner"]
14
  disabled = ["senter"]
15
  before_creation = null
16
  after_creation = null
@@ -26,11 +26,22 @@ scorer = {"@scorers":"spacy.attribute_ruler_scorer.v1"}
26
  validate = false
27
 
28
  [components.lemmatizer]
29
- factory = "lemmatizer"
30
- mode = "lookup"
31
- model = null
32
  overwrite = false
33
  scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
@@ -39,8 +50,9 @@ overwrite = true
39
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
40
 
41
  [components.morphologizer.model]
42
- @architectures = "spacy.Tagger.v1"
43
  nO = null
 
44
 
45
  [components.morphologizer.model.tok2vec]
46
  @architectures = "spacy.Tok2VecListener.v1"
@@ -70,7 +82,7 @@ nO = null
70
  @architectures = "spacy.MultiHashEmbed.v2"
71
  width = 96
72
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
73
- rows = [5000,2500,2500,2500,100]
74
  include_static_vectors = true
75
 
76
  [components.ner.model.tok2vec.encode]
@@ -108,8 +120,9 @@ overwrite = false
108
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
109
 
110
  [components.senter.model]
111
- @architectures = "spacy.Tagger.v1"
112
  nO = null
 
113
 
114
  [components.senter.model.tok2vec]
115
  @architectures = "spacy.Tok2Vec.v2"
@@ -138,7 +151,7 @@ factory = "tok2vec"
138
  @architectures = "spacy.MultiHashEmbed.v2"
139
  width = ${components.tok2vec.model.encode:width}
140
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
141
- rows = [5000,2500,2500,2500,100]
142
  include_static_vectors = true
143
 
144
  [components.tok2vec.model.encode]
@@ -175,7 +188,7 @@ dropout = 0.1
175
  accumulate_gradient = 1
176
  patience = 5000
177
  max_epochs = 0
178
- max_steps = 0
179
  eval_frequency = 1000
180
  frozen_components = []
181
  before_to_disk = null
@@ -210,17 +223,17 @@ eps = 0.00000001
210
  learn_rate = 0.001
211
 
212
  [training.score_weights]
213
- pos_acc = 0.08
214
- morph_acc = 0.08
215
  morph_per_feat = null
216
  dep_uas = 0.0
217
- dep_las = 0.16
218
  dep_las_per_type = null
219
  sents_p = null
220
  sents_r = null
221
- sents_f = 0.02
222
- lemma_acc = 0.5
223
- ents_f = 0.16
224
  ents_p = 0.0
225
  ents_r = 0.0
226
  ents_per_type = null
@@ -237,6 +250,13 @@ after_init = null
237
 
238
  [initialize.components]
239
 
240
  [initialize.components.morphologizer]
241
 
242
  [initialize.components.morphologizer.labels]
 
10
 
11
  [nlp]
12
  lang = "da"
13
+ pipeline = ["tok2vec","morphologizer","parser","lemmatizer","senter","attribute_ruler","ner"]
14
  disabled = ["senter"]
15
  before_creation = null
16
  after_creation = null
 
26
  validate = false
27
 
28
  [components.lemmatizer]
29
+ factory = "trainable_lemmatizer"
30
+ backoff = "orth"
31
+ min_tree_freq = 3
32
  overwrite = false
33
  scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
34
+ top_k = 1
35
+
36
+ [components.lemmatizer.model]
37
+ @architectures = "spacy.Tagger.v2"
38
+ nO = null
39
+ normalize = false
40
+
41
+ [components.lemmatizer.model.tok2vec]
42
+ @architectures = "spacy.Tok2VecListener.v1"
43
+ width = ${components.tok2vec.model.encode:width}
44
+ upstream = "tok2vec"
45
 
46
  [components.morphologizer]
47
  factory = "morphologizer"
 
50
  scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
51
 
52
  [components.morphologizer.model]
53
+ @architectures = "spacy.Tagger.v2"
54
  nO = null
55
+ normalize = false
56
 
57
  [components.morphologizer.model.tok2vec]
58
  @architectures = "spacy.Tok2VecListener.v1"
 
82
  @architectures = "spacy.MultiHashEmbed.v2"
83
  width = 96
84
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
85
+ rows = [5000,1000,2500,2500,50]
86
  include_static_vectors = true
87
 
88
  [components.ner.model.tok2vec.encode]
 
120
  scorer = {"@scorers":"spacy.senter_scorer.v1"}
121
 
122
  [components.senter.model]
123
+ @architectures = "spacy.Tagger.v2"
124
  nO = null
125
+ normalize = false
126
 
127
  [components.senter.model.tok2vec]
128
  @architectures = "spacy.Tok2Vec.v2"
 
151
  @architectures = "spacy.MultiHashEmbed.v2"
152
  width = ${components.tok2vec.model.encode:width}
153
  attrs = ["NORM","PREFIX","SUFFIX","SHAPE","SPACY"]
154
+ rows = [5000,1000,2500,2500,50]
155
  include_static_vectors = true
156
 
157
  [components.tok2vec.model.encode]
 
188
  accumulate_gradient = 1
189
  patience = 5000
190
  max_epochs = 0
191
+ max_steps = 100000
192
  eval_frequency = 1000
193
  frozen_components = []
194
  before_to_disk = null
 
223
  learn_rate = 0.001
224
 
225
  [training.score_weights]
226
+ pos_acc = 0.14
227
+ morph_acc = 0.14
228
  morph_per_feat = null
229
  dep_uas = 0.0
230
+ dep_las = 0.29
231
  dep_las_per_type = null
232
  sents_p = null
233
  sents_r = null
234
+ sents_f = 0.04
235
+ lemma_acc = 0.1
236
+ ents_f = 0.29
237
  ents_p = 0.0
238
  ents_r = 0.0
239
  ents_per_type = null
 
250
 
251
  [initialize.components]
252
 
253
+ [initialize.components.lemmatizer]
254
+
255
+ [initialize.components.lemmatizer.labels]
256
+ @readers = "spacy.read_labels.v1"
257
+ path = "corpus/labels/trainable_lemmatizer.json"
258
+ require = false
259
+
260
  [initialize.components.morphologizer]
261
 
262
  [initialize.components.morphologizer.labels]
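In the new `[training.score_weights]`, the weight on `lemma_acc` drops from 0.5 to 0.1 while the others grow (`pos_acc` and `morph_acc` 0.08 to 0.14, `dep_las` and `ents_f` 0.16 to 0.29, `sents_f` 0.02 to 0.04); both versions sum to 1.0. The sketch below assumes the overall checkpoint score is a plain weighted sum of the evaluation metrics with `null` weights skipped (spaCy's exact combination logic may differ). It reads the section with Python's `configparser` and applies it to the new scores reported in meta.json. `read_weights` and `weighted_score` are hypothetical helpers.

```python
import configparser

# Copy of the new [training.score_weights] section from config.cfg above
SCORE_WEIGHTS_CFG = """
[training.score_weights]
pos_acc = 0.14
morph_acc = 0.14
morph_per_feat = null
dep_uas = 0.0
dep_las = 0.29
dep_las_per_type = null
sents_p = null
sents_r = null
sents_f = 0.04
lemma_acc = 0.1
ents_f = 0.29
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
"""


def read_weights(text: str) -> dict:
    # interpolation=None so "${...}" references in a full config are left untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["training.score_weights"]
    # "null" means the metric is reported but does not contribute to the score
    return {key: float(value) for key, value in section.items() if value != "null"}


# New evaluation scores copied from meta.json in this commit
SCORES = {
    "pos_acc": 0.9633898305,
    "morph_acc": 0.9568038741,
    "dep_uas": 0.8195787003,
    "dep_las": 0.7807576266,
    "sents_f": 0.9055258467,
    "lemma_acc": 0.9516707022,
    "ents_f": 0.8175182482,
    "ents_p": 0.8183716075,
    "ents_r": 0.8166666667,
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted sum over every metric that carries a (non-null) weight."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


weights = read_weights(SCORE_WEIGHTS_CFG)
print(round(sum(weights.values()), 6))  # 1.0
print(round(weighted_score(SCORES, weights), 4))  # 0.8637
```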
da_core_news_lg-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:d6b7f22582316614bc9c5c122a532b4eff39fc3b411054ee16ad4f89e14de765
3
- size 573820420
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:70b519f9d735120b8eee33370806c00bdb12214887bf6793783421fbbbff1dc3
3
+ size 567085252
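Large binaries such as the wheel are tracked with Git LFS, so the diff only shows the pointer file changing: a `version` line, a `sha256` object id and a byte `size`. A minimal parser, assuming exactly this `key value` line layout, makes the size change explicit. `parse_lfs_pointer` is a hypothetical helper, not part of the git-lfs tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Old and new pointers for da_core_news_lg-any-py3-none-any.whl
OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:d6b7f22582316614bc9c5c122a532b4eff39fc3b411054ee16ad4f89e14de765
size 573820420"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:70b519f9d735120b8eee33370806c00bdb12214887bf6793783421fbbbff1dc3
size 567085252"""

old, new = parse_lfs_pointer(OLD), parse_lfs_pointer(NEW)
delta = new["size"] - old["size"]
print(f"{delta:+,} bytes")  # -6,735,168 bytes
```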
lemmatizer/cfg ADDED
@@ -0,0 +1,457 @@
1
+ {
2
+ "labels":[
3
+ 1,
4
+ 2,
5
+ 4,
6
+ 6,
7
+ 8,
8
+ 10,
9
+ 12,
10
+ 14,
11
+ 16,
12
+ 18,
13
+ 20,
14
+ 24,
15
+ 28,
16
+ 30,
17
+ 32,
18
+ 34,
19
+ 36,
20
+ 39,
21
+ 41,
22
+ 42,
23
+ 43,
24
+ 45,
25
+ 47,
26
+ 49,
27
+ 51,
28
+ 53,
29
+ 55,
30
+ 57,
31
+ 61,
32
+ 65,
33
+ 67,
34
+ 71,
35
+ 73,
36
+ 75,
37
+ 77,
38
+ 79,
39
+ 81,
40
+ 83,
41
+ 85,
42
+ 87,
43
+ 89,
44
+ 91,
45
+ 93,
46
+ 95,
47
+ 99,
48
+ 101,
49
+ 102,
50
+ 104,
51
+ 107,
52
+ 111,
53
+ 113,
54
+ 116,
55
+ 118,
56
+ 121,
57
+ 124,
58
+ 127,
59
+ 128,
60
+ 131,
61
+ 133,
62
+ 134,
63
+ 136,
64
+ 138,
65
+ 140,
66
+ 142,
67
+ 144,
68
+ 145,
69
+ 147,
70
+ 148,
71
+ 149,
72
+ 153,
73
+ 155,
74
+ 158,
75
+ 161,
76
+ 164,
77
+ 166,
78
+ 168,
79
+ 170,
80
+ 172,
81
+ 174,
82
+ 175,
83
+ 177,
84
+ 179,
85
+ 182,
86
+ 184,
87
+ 186,
88
+ 188,
89
+ 190,
90
+ 192,
91
+ 194,
92
+ 196,
93
+ 199,
94
+ 201,
95
+ 203,
96
+ 204,
97
+ 207,
98
+ 208,
99
+ 209,
100
+ 211,
101
+ 213,
102
+ 214,
103
+ 216,
104
+ 218,
105
+ 220,
106
+ 222,
107
+ 224,
108
+ 226,
109
+ 229,
110
+ 231,
111
+ 232,
112
+ 233,
113
+ 235,
114
+ 236,
115
+ 238,
116
+ 239,
117
+ 243,
118
+ 249,
119
+ 253,
120
+ 255,
121
+ 257,
122
+ 259,
123
+ 261,
124
+ 262,
125
+ 263,
126
+ 264,
127
+ 267,
128
+ 269,
129
+ 270,
130
+ 272,
131
+ 274,
132
+ 276,
133
+ 278,
134
+ 280,
135
+ 282,
136
+ 284,
137
+ 286,
138
+ 290,
139
+ 291,
140
+ 293,
141
+ 295,
142
+ 297,
143
+ 299,
144
+ 300,
145
+ 302,
146
+ 303,
147
+ 304,
148
+ 306,
149
+ 308,
150
+ 311,
151
+ 314,
152
+ 315,
153
+ 317,
154
+ 320,
155
+ 321,
156
+ 323,
157
+ 324,
158
+ 326,
159
+ 327,
160
+ 328,
161
+ 330,
162
+ 331,
163
+ 333,
164
+ 337,
165
+ 339,
166
+ 340,
167
+ 344,
168
+ 346,
169
+ 350,
170
+ 353,
171
+ 354,
172
+ 355,
173
+ 358,
174
+ 360,
175
+ 361,
176
+ 363,
177
+ 365,
178
+ 366,
179
+ 369,
180
+ 372,
181
+ 373,
182
+ 376,
183
+ 380,
184
+ 382,
185
+ 383,
186
+ 384,
187
+ 386,
188
+ 387,
189
+ 389,
190
+ 391,
191
+ 392,
192
+ 394,
193
+ 395,
194
+ 398,
195
+ 400,
196
+ 402,
197
+ 404,
198
+ 406,
199
+ 409,
200
+ 411,
201
+ 412,
202
+ 413,
203
+ 415,
204
+ 417,
205
+ 420,
206
+ 421,
207
+ 423,
208
+ 424,
209
+ 425,
210
+ 427,
211
+ 429,
212
+ 431,
213
+ 433,
214
+ 434,
215
+ 436,
216
+ 437,
217
+ 439,
218
+ 440,
219
+ 442,
220
+ 444,
221
+ 445,
222
+ 449,
223
+ 450,
224
+ 452,
225
+ 454,
226
+ 457,
227
+ 459,
228
+ 462,
229
+ 465,
230
+ 466,
231
+ 468,
232
+ 470,
233
+ 471,
234
+ 474,
235
+ 475,
236
+ 478,
237
+ 480,
238
+ 483,
239
+ 485,
240
+ 486,
241
+ 487,
242
+ 489,
243
+ 491,
244
+ 492,
245
+ 493,
246
+ 495,
247
+ 496,
248
+ 498,
249
+ 500,
250
+ 501,
251
+ 502,
252
+ 503,
253
+ 504,
254
+ 505,
255
+ 507,
256
+ 508,
257
+ 509,
258
+ 510,
259
+ 511,
260
+ 512,
261
+ 514,
262
+ 515,
263
+ 516,
264
+ 518,
265
+ 519,
266
+ 520,
267
+ 521,
268
+ 523,
269
+ 525,
270
+ 526,
271
+ 528,
272
+ 531,
273
+ 533,
274
+ 535,
275
+ 453,
276
+ 536,
277
+ 538,
278
+ 539,
279
+ 541,
280
+ 545,
281
+ 547,
282
+ 548,
283
+ 549,
284
+ 550,
285
+ 551,
286
+ 553,
287
+ 554,
288
+ 555,
289
+ 557,
290
+ 559,
291
+ 560,
292
+ 561,
293
+ 563,
294
+ 565,
295
+ 566,
296
+ 567,
297
+ 568,
298
+ 570,
299
+ 571,
300
+ 575,
301
+ 577,
302
+ 578,
303
+ 579,
304
+ 582,
305
+ 585,
306
+ 587,
307
+ 589,
308
+ 593,
309
+ 594,
310
+ 596,
311
+ 597,
312
+ 601,
313
+ 603,
314
+ 605,
315
+ 609,
316
+ 611,
317
+ 612,
318
+ 613,
319
+ 614,
320
+ 615,
321
+ 616,
322
+ 617,
323
+ 619,
324
+ 621,
325
+ 622,
326
+ 624,
327
+ 625,
328
+ 627,
329
+ 628,
330
+ 629,
331
+ 632,
332
+ 634,
333
+ 638,
334
+ 639,
335
+ 640,
336
+ 642,
337
+ 644,
338
+ 647,
339
+ 649,
340
+ 650,
341
+ 651,
342
+ 653,
343
+ 654,
344
+ 655,
345
+ 657,
346
+ 658,
347
+ 659,
348
+ 661,
349
+ 663,
350
+ 665,
351
+ 667,
352
+ 669,
353
+ 670,
354
+ 672,
355
+ 674,
356
+ 676,
357
+ 677,
358
+ 678,
359
+ 680,
360
+ 682,
361
+ 683,
362
+ 685,
363
+ 686,
364
+ 688,
365
+ 689,
366
+ 690,
367
+ 691,
368
+ 694,
369
+ 695,
370
+ 696,
371
+ 697,
372
+ 699,
373
+ 700,
374
+ 701,
375
+ 703,
376
+ 705,
377
+ 706,
378
+ 707,
379
+ 708,
380
+ 712,
381
+ 715,
382
+ 716,
383
+ 718,
384
+ 720,
385
+ 724,
386
+ 726,
387
+ 729,
388
+ 730,
389
+ 732,
390
+ 733,
391
+ 734,
392
+ 736,
393
+ 738,
394
+ 739,
395
+ 740,
396
+ 741,
397
+ 742,
398
+ 743,
399
+ 744,
400
+ 747,
401
+ 749,
402
+ 753,
403
+ 756,
404
+ 758,
405
+ 759,
406
+ 761,
407
+ 762,
408
+ 763,
409
+ 764,
410
+ 766,
411
+ 768,
412
+ 769,
413
+ 771,
414
+ 773,
415
+ 774,
416
+ 775,
417
+ 776,
418
+ 777,
419
+ 781,
420
+ 783,
421
+ 784,
422
+ 785,
423
+ 788,
424
+ 791,
425
+ 792,
426
+ 794,
427
+ 796,
428
+ 797,
429
+ 798,
430
+ 799,
431
+ 800,
432
+ 802,
433
+ 803,
434
+ 804,
435
+ 805,
436
+ 806,
437
+ 808,
438
+ 809,
439
+ 810,
440
+ 811,
441
+ 812,
442
+ 814,
443
+ 815,
444
+ 817,
445
+ 819,
446
+ 820,
447
+ 822,
448
+ 824,
449
+ 825,
450
+ 827,
451
+ 829,
452
+ 831,
453
+ 833,
454
+ 835,
455
+ 837
456
+ ]
457
+ }
lemmatizer/{lookups/lookups.bin → model} RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6864ce8705293ba1b6dcf349ec133cdc33db3ba57f6e9337458cfe5073b6f103
3
- size 11537995
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd1961075ded3bc09a5a58ead51adad20e36d70d2a099362fe21386796b1521e
3
+ size 176206
lemmatizer/trees ADDED
Binary file (89.9 kB).
 
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"da",
3
  "name":"core_news_lg",
4
- "version":"3.2.0",
5
- "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, senter, ner, attribute_ruler, lemmatizer.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
- "spacy_version":">=3.2.0,<3.3.0",
11
- "spacy_git_version":"bb26550e2",
12
  "vectors":{
13
  "width":300,
14
  "vectors":500000,
@@ -212,15 +212,8 @@
212
  "punct",
213
  "xcomp"
214
  ],
215
- "senter":[
216
- "I",
217
- "S"
218
- ],
219
  "attribute_ruler":[
220
 
221
- ],
222
- "lemmatizer":[
223
-
224
  ],
225
  "ner":[
226
  "LOC",
@@ -233,17 +226,17 @@
233
  "tok2vec",
234
  "morphologizer",
235
  "parser",
236
- "attribute_ruler",
237
  "lemmatizer",
 
238
  "ner"
239
  ],
240
  "components":[
241
  "tok2vec",
242
  "morphologizer",
243
  "parser",
 
244
  "senter",
245
  "attribute_ruler",
246
- "lemmatizer",
247
  "ner"
248
  ],
249
  "disabled":[
@@ -254,81 +247,81 @@
254
  "token_p":0.9977732598,
255
  "token_r":0.9974835463,
256
  "token_f":0.997628382,
257
- "pos_acc":0.9650363196,
258
- "morph_acc":0.9571912833,
259
- "morph_micro_p":0.9722335025,
260
- "morph_micro_r":0.9667861289,
261
- "morph_micro_f":0.969502164,
262
  "morph_per_feat":{
263
  "Mood":{
264
- "p":0.9781160799,
265
- "r":0.9799809342,
266
- "f":0.979047619
267
  },
268
  "Tense":{
269
- "p":0.9765506808,
270
- "r":0.9721385542,
271
- "f":0.9743396226
272
  },
273
  "VerbForm":{
274
- "p":0.9697156984,
275
- "r":0.9602203182,
276
- "f":0.9649446494
277
  },
278
  "Voice":{
279
- "p":0.9797752809,
280
- "r":0.9775784753,
281
- "f":0.9786756453
282
  },
283
  "Definite":{
284
- "p":0.9666401906,
285
- "r":0.9616752272,
286
- "f":0.9641513171
287
  },
288
  "Gender":{
289
- "p":0.9588903743,
290
- "r":0.9534729146,
291
- "f":0.956173971
292
  },
293
  "Number":{
294
- "p":0.9666403993,
295
- "r":0.9598330725,
296
- "f":0.9632247088
297
  },
298
  "AdpType":{
299
- "p":0.9991071429,
300
- "r":0.9893899204,
301
- "f":0.994224789
302
  },
303
  "PartType":{
304
- "p":1.0,
305
  "r":1.0,
306
- "f":1.0
307
  },
308
  "Case":{
309
- "p":0.9792332268,
310
- "r":0.9684044234,
311
- "f":0.9737887212
312
  },
313
  "Person":{
314
- "p":0.9805996473,
315
- "r":0.9875666075,
316
- "f":0.9840707965
317
  },
318
  "PronType":{
319
- "p":0.9860082305,
320
- "r":0.9851973684,
321
- "f":0.9856026327
322
  },
323
  "NumType":{
324
- "p":0.9731543624,
325
  "r":0.9602649007,
326
- "f":0.9666666667
327
  },
328
  "Degree":{
329
- "p":0.9587878788,
330
- "r":0.9530120482,
331
- "f":0.9558912387
332
  },
333
  "Reflex":{
334
  "p":1.0,
@@ -336,24 +329,24 @@
336
  "f":1.0
337
  },
338
  "Number[psor]":{
339
- "p":0.9885057471,
340
  "r":1.0,
341
- "f":0.9942196532
342
  },
343
  "Poss":{
344
- "p":1.0,
345
  "r":1.0,
346
- "f":1.0
347
  },
348
  "Foreign":{
349
- "p":0.6666666667,
350
- "r":0.4,
351
- "f":0.5
352
  },
353
  "Abbr":{
354
- "p":1.0,
355
- "r":0.2,
356
- "f":0.3333333333
357
  },
358
  "Style":{
359
  "p":1.0,
@@ -361,146 +354,151 @@
361
  "f":1.0
362
  },
363
  "Polite":{
364
- "p":1.0,
365
- "r":0.5,
366
- "f":0.6666666667
367
  }
368
  },
369
- "sents_p":0.9142335766,
370
- "sents_r":0.8882978723,
371
- "sents_f":0.9010791367,
372
- "dep_uas":0.8225959658,
373
- "dep_las":0.7833277461,
374
  "dep_las_per_type":{
375
  "advmod":{
376
- "p":0.6842105263,
377
- "r":0.697740113,
378
- "f":0.6909090909
379
  },
380
  "root":{
381
- "p":0.8513761468,
382
- "r":0.8226950355,
383
- "f":0.8367899008
384
  },
385
  "nsubj":{
386
- "p":0.8508583691,
387
- "r":0.8364978903,
388
- "f":0.8436170213
389
  },
390
  "case":{
391
- "p":0.8953603159,
392
- "r":0.8944773176,
393
- "f":0.8949185989
394
  },
395
  "obl":{
396
- "p":0.71973466,
397
  "r":0.6739130435,
398
- "f":0.6960705694
399
  },
400
  "cc":{
401
- "p":0.7885714286,
402
- "r":0.8023255814,
403
- "f":0.795389049
404
  },
405
  "conj":{
406
- "p":0.647696477,
407
- "r":0.6373333333,
408
- "f":0.6424731183
409
  },
410
  "obj":{
411
- "p":0.8161764706,
412
- "r":0.8621359223,
413
- "f":0.8385269122
414
  },
415
  "aux":{
416
- "p":0.8742690058,
417
- "r":0.8717201166,
418
- "f":0.8729927007
419
  },
420
  "acl:relcl":{
421
- "p":0.5773195876,
422
- "r":0.6054054054,
423
- "f":0.5910290237
424
  },
425
  "advmod:lmod":{
426
- "p":0.6714285714,
427
- "r":0.7014925373,
428
- "f":0.6861313869
429
  },
430
  "det":{
431
- "p":0.9253731343,
432
- "r":0.9192751236,
433
- "f":0.9223140496
434
  },
435
  "amod":{
436
- "p":0.8313458262,
437
- "r":0.8327645051,
438
- "f":0.832054561
439
  },
440
  "nmod:poss":{
441
- "p":0.6326530612,
442
- "r":0.6138613861,
443
- "f":0.6231155779
444
  },
445
  "ccomp":{
446
- "p":0.676056338,
447
- "r":0.7741935484,
448
- "f":0.7218045113
449
  },
450
  "nummod":{
451
- "p":0.8620689655,
452
- "r":0.8333333333,
453
- "f":0.8474576271
454
  },
455
  "flat":{
456
- "p":0.8,
457
- "r":0.8741721854,
458
- "f":0.835443038
459
  },
460
  "compound:prt":{
461
- "p":0.4,
462
- "r":0.3414634146,
463
- "f":0.3684210526
464
  },
465
  "advcl":{
466
- "p":0.6181818182,
467
- "r":0.5862068966,
468
- "f":0.6017699115
469
  },
470
  "mark":{
471
- "p":0.8782051282,
472
- "r":0.8439425051,
473
- "f":0.8607329843
474
  },
475
  "cop":{
476
- "p":0.7842105263,
477
- "r":0.8514285714,
478
- "f":0.8164383562
479
  },
480
  "dep":{
481
- "p":0.1707317073,
482
- "r":0.2641509434,
483
- "f":0.2074074074
484
  },
485
  "nmod":{
486
- "p":0.6296992481,
487
- "r":0.654296875,
488
- "f":0.6417624521
489
  },
490
  "iobj":{
491
- "p":0.8,
492
- "r":0.5454545455,
493
- "f":0.6486486486
494
  },
495
  "xcomp":{
496
- "p":0.5333333333,
497
- "r":0.406779661,
498
- "f":0.4615384615
499
  },
500
  "list":{
501
- "p":0.5454545455,
502
  "r":0.3333333333,
503
- "f":0.4137931034
504
  },
505
  "vocative":{
506
  "p":0.0,
@@ -508,64 +506,59 @@
508
  "f":0.0
509
  },
510
  "fixed":{
511
- "p":0.8648648649,
512
- "r":0.7804878049,
513
- "f":0.8205128205
514
  },
515
- "expl":{
516
- "p":0.8,
517
- "r":0.8235294118,
518
- "f":0.8115942029
519
  },
520
- "appos":{
521
- "p":0.4054054054,
522
- "r":0.4545454545,
523
- "f":0.4285714286
524
  },
525
  "obl:tmod":{
526
- "p":0.5555555556,
527
- "r":0.2777777778,
528
- "f":0.3703703704
529
  },
530
  "discourse":{
531
  "p":0.0,
532
  "r":0.0,
533
  "f":0.0
534
- },
535
- "obl:lmod":{
536
- "p":0.0,
537
- "r":0.0,
538
- "f":0.0
539
  }
540
  },
541
- "tag_acc":0.9650363196,
542
- "lemma_acc":0.8491041162,
543
- "ents_p":0.8219461698,
544
- "ents_r":0.8270833333,
545
- "ents_f":0.8245067497,
546
  "ents_per_type":{
547
  "PER":{
548
- "p":0.9171974522,
549
- "r":0.8674698795,
550
- "f":0.8916408669
551
  },
552
  "ORG":{
553
- "p":0.7840909091,
554
- "r":0.7666666667,
555
- "f":0.7752808989
556
  },
557
  "MISC":{
558
- "p":0.6776859504,
559
- "r":0.7256637168,
560
- "f":0.7008547009
561
  },
562
  "LOC":{
563
- "p":0.8717948718,
564
- "r":0.9189189189,
565
- "f":0.8947368421
566
  }
567
  },
568
- "speed":8840.2170640394
569
  },
570
  "sources":[
571
  {
@@ -580,12 +573,6 @@
580
  "license":"CC BY-SA 4.0",
581
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
582
  },
583
- {
584
- "name":"Lemmatization Lists",
585
- "url":"https://github.com/michmech/lemmatization-lists/",
586
- "license":"ODbL",
587
- "author":"Michal M\u011bchura"
588
- },
589
  {
590
  "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
591
  "url":"https://spacy.io",
 
1
  {
2
  "lang":"da",
3
  "name":"core_news_lg",
4
+ "version":"3.3.0",
5
+ "description":"Danish pipeline optimized for CPU. Components: tok2vec, morphologizer, parser, lemmatizer (trainable_lemmatizer), senter, ner, attribute_ruler.",
6
  "author":"Explosion",
7
  "email":"contact@explosion.ai",
8
  "url":"https://explosion.ai",
9
  "license":"CC BY-SA 4.0",
10
+ "spacy_version":">=3.3.0.dev0,<3.4.0",
11
+ "spacy_git_version":"849bef2de",
12
  "vectors":{
13
  "width":300,
14
  "vectors":500000,
 
212
  "punct",
213
  "xcomp"
214
  ],
215
  "attribute_ruler":[
216
 
217
  ],
218
  "ner":[
219
  "LOC",
 
226
  "tok2vec",
227
  "morphologizer",
228
  "parser",
 
229
  "lemmatizer",
230
+ "attribute_ruler",
231
  "ner"
232
  ],
233
  "components":[
234
  "tok2vec",
235
  "morphologizer",
236
  "parser",
237
+ "lemmatizer",
238
  "senter",
239
  "attribute_ruler",
 
240
  "ner"
241
  ],
242
  "disabled":[
 
247
  "token_p":0.9977732598,
248
  "token_r":0.9974835463,
249
  "token_f":0.997628382,
250
+ "pos_acc":0.9633898305,
251
+ "morph_acc":0.9568038741,
252
+ "morph_micro_p":0.9727434528,
253
+ "morph_micro_r":0.9655746807,
254
+ "morph_micro_f":0.9691458101,
255
  "morph_per_feat":{
256
  "Mood":{
257
+ "p":0.9799043062,
258
+ "r":0.9761677788,
259
+ "f":0.9780324737
260
  },
261
  "Tense":{
262
+ "p":0.9772727273,
263
+ "r":0.9713855422,
264
+ "f":0.9743202417
265
  },
266
  "VerbForm":{
267
+ "p":0.9686153846,
268
+ "r":0.9632802938,
269
+ "f":0.9659404725
270
  },
271
  "Voice":{
272
+ "p":0.9798206278,
273
+ "r":0.9798206278,
274
+ "f":0.9798206278
275
  },
276
  "Definite":{
277
+ "p":0.968812475,
278
+ "r":0.9573291189,
279
+ "f":0.963036566
280
  },
281
  "Gender":{
282
+ "p":0.9597720416,
283
+ "r":0.9514788966,
284
+ "f":0.9556074766
285
  },
286
  "Number":{
287
+ "p":0.9683961022,
288
+ "r":0.9590505999,
289
+ "f":0.9637006945
290
  },
291
  "AdpType":{
292
+ "p":0.9982206406,
293
+ "r":0.9920424403,
294
+ "f":0.9951219512
295
  },
296
  "PartType":{
297
+ "p":0.996763754,
298
  "r":1.0,
299
+ "f":0.9983792545
300
  },
301
  "Case":{
302
+ "p":0.9806451613,
303
+ "r":0.9605055292,
304
+ "f":0.9704708699
305
  },
306
  "Person":{
307
+ "p":0.9804270463,
308
+ "r":0.9786856128,
309
+ "f":0.9795555556
310
  },
311
  "PronType":{
312
+ "p":0.9835390947,
313
+ "r":0.9827302632,
314
+ "f":0.9831345125
315
  },
316
  "NumType":{
317
+ "p":0.9931506849,
318
  "r":0.9602649007,
319
+ "f":0.9764309764
320
  },
321
  "Degree":{
322
+ "p":0.9578313253,
323
+ "r":0.9578313253,
324
+ "f":0.9578313253
325
  },
326
  "Reflex":{
327
  "p":1.0,
 
329
  "f":1.0
330
  },
331
  "Number[psor]":{
332
+ "p":0.9772727273,
333
  "r":1.0,
334
+ "f":0.9885057471
335
  },
336
  "Poss":{
337
+ "p":0.9887640449,
338
  "r":1.0,
339
+ "f":0.9943502825
340
  },
341
  "Foreign":{
342
+ "p":0.6,
343
+ "r":0.3,
344
+ "f":0.4
345
  },
346
  "Abbr":{
347
+ "p":0.0,
348
+ "r":0.0,
349
+ "f":0.0
350
  },
351
  "Style":{
352
  "p":1.0,
 
354
  "f":1.0
355
  },
356
  "Polite":{
357
+ "p":0.75,
358
+ "r":0.75,
359
+ "f":0.75
360
  }
361
  },
362
+ "sents_p":0.9103942652,
363
+ "sents_r":0.9007092199,
364
+ "sents_f":0.9055258467,
365
+ "dep_uas":0.8195787003,
366
+ "dep_las":0.7807576266,
367
  "dep_las_per_type":{
368
  "advmod":{
369
+ "p":0.6955345061,
370
+ "r":0.7259887006,
371
+ "f":0.7104353836
372
  },
373
  "root":{
374
+ "p":0.824686941,
375
+ "r":0.8173758865,
376
+ "f":0.821015138
377
  },
378
  "nsubj":{
379
+ "p":0.8361884368,
380
+ "r":0.8238396624,
381
+ "f":0.829968119
382
  },
383
  "case":{
384
+ "p":0.9003984064,
385
+ "r":0.8915187377,
386
+ "f":0.8959365709
387
  },
388
  "obl":{
389
+ "p":0.7221297837,
390
  "r":0.6739130435,
391
+ "f":0.697188755
392
  },
393
  "cc":{
394
+ "p":0.7630057803,
395
+ "r":0.7674418605,
396
+ "f":0.7652173913
397
  },
398
  "conj":{
399
+ "p":0.6106442577,
400
+ "r":0.5813333333,
401
+ "f":0.5956284153
402
  },
403
  "obj":{
404
+ "p":0.7893772894,
405
+ "r":0.8368932039,
406
+ "f":0.8124410933
407
  },
408
  "aux":{
409
+ "p":0.8764705882,
410
+ "r":0.8688046647,
411
+ "f":0.8726207906
412
  },
413
  "acl:relcl":{
414
+ "p":0.6300578035,
415
+ "r":0.5891891892,
416
+ "f":0.6089385475
417
  },
418
  "advmod:lmod":{
419
+ "p":0.7272727273,
420
+ "r":0.7164179104,
421
+ "f":0.7218045113
422
  },
423
  "det":{
424
+ "p":0.9140495868,
425
+ "r":0.9110378913,
426
+ "f":0.9125412541
427
  },
428
  "amod":{
429
+ "p":0.8080645161,
430
+ "r":0.8549488055,
431
+ "f":0.8308457711
432
  },
433
  "nmod:poss":{
434
+ "p":0.7373737374,
435
+ "r":0.7227722772,
436
+ "f":0.73
437
  },
438
  "ccomp":{
439
+ "p":0.7068965517,
440
+ "r":0.6612903226,
441
+ "f":0.6833333333
442
  },
443
  "nummod":{
444
+ "p":0.8360655738,
445
+ "r":0.85,
446
+ "f":0.8429752066
447
  },
448
  "flat":{
449
+ "p":0.7844311377,
450
+ "r":0.8675496689,
451
+ "f":0.8238993711
452
  },
453
  "compound:prt":{
454
+ "p":0.5,
455
+ "r":0.2926829268,
456
+ "f":0.3692307692
457
  },
458
  "advcl":{
459
+ "p":0.6545454545,
460
+ "r":0.6206896552,
461
+ "f":0.6371681416
462
  },
463
  "mark":{
464
+ "p":0.8781512605,
465
+ "r":0.8583162218,
466
+ "f":0.8681204569
467
  },
468
  "cop":{
469
+ "p":0.8121546961,
470
+ "r":0.84,
471
+ "f":0.8258426966
472
  },
473
  "dep":{
474
+ "p":0.145631068,
475
+ "r":0.2830188679,
476
+ "f":0.1923076923
477
  },
478
  "nmod":{
479
+ "p":0.6549707602,
480
+ "r":0.65625,
481
+ "f":0.6556097561
482
  },
483
  "iobj":{
484
+ "p":0.8125,
485
+ "r":0.5909090909,
486
+ "f":0.6842105263
487
  },
488
  "xcomp":{
489
+ "p":0.4772727273,
490
+ "r":0.3559322034,
491
+ "f":0.4077669903
492
+ },
493
+ "appos":{
494
+ "p":0.5384615385,
495
+ "r":0.4242424242,
496
+ "f":0.4745762712
497
  },
498
  "list":{
499
+ "p":0.5,
500
  "r":0.3333333333,
501
+ "f":0.4
502
  },
503
  "vocative":{
504
  "p":0.0,
 
506
  "f":0.0
507
  },
508
  "fixed":{
509
+ "p":0.8717948718,
510
+ "r":0.8292682927,
511
+ "f":0.85
512
  },
513
+ "obl:lmod":{
514
+ "p":0.0,
515
+ "r":0.0,
516
+ "f":0.0
517
  },
518
+ "expl":{
519
+ "p":0.8529411765,
520
+ "r":0.8529411765,
521
+ "f":0.8529411765
522
  },
523
  "obl:tmod":{
524
+ "p":0.6363636364,
525
+ "r":0.3888888889,
526
+ "f":0.4827586207
527
  },
528
  "discourse":{
529
  "p":0.0,
530
  "r":0.0,
531
  "f":0.0
532
  }
533
  },
534
+ "lemma_acc":0.9516707022,
535
+ "tag_acc":0.9633898305,
536
+ "ents_p":0.8183716075,
537
+ "ents_r":0.8166666667,
538
+ "ents_f":0.8175182482,
539
  "ents_per_type":{
540
  "PER":{
541
+ "p":0.8993710692,
542
+ "r":0.8614457831,
543
+ "f":0.88
544
  },
545
  "ORG":{
546
+ "p":0.7303370787,
547
+ "r":0.7222222222,
548
+ "f":0.7262569832
549
  },
550
  "MISC":{
551
+ "p":0.7288135593,
552
+ "r":0.7610619469,
553
+ "f":0.7445887446
554
  },
555
  "LOC":{
556
+ "p":0.8672566372,
557
+ "r":0.8828828829,
558
+ "f":0.875
559
  }
560
  },
561
+ "speed":10791.2692595094
562
  },
563
  "sources":[
564
  {
 
573
  "license":"CC BY-SA 4.0",
574
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
575
  },
576
  {
577
  "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
578
  "url":"https://spacy.io",
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:545faa5af6230b1e75b0e7bec684ca7bfaeb9fc509d4af9964093cb9e716abcd
3
- size 61299
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c9a904d06964b6afa205f053f74bc3b869bab70872d9265d38fadd867450df26
3
+ size 61351
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:baa83b1fe95f92d394cdd9c7918e2e540153028d757d2c0f337e49713f54018d
3
- size 7091792
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d60d6db1813b3f2d8de3dea75ed89b12bf168bb820f9bda6630e5a51d4d1ecb
3
+ size 6496592
parser/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1ee9712d6daa68e5099e153041488410b84642c2fdc3e09e3c53ddf1498e5015
3
  size 308728
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c112c2427b1cfb0608eb1ef39d0206558de657c031eca80773d43e578f7517f8
3
  size 308728
parser/moves CHANGED
@@ -1 +1 @@
1
- moves {"0":{"":41514},"1":{"":34295},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8601,"obl":3949,"obj":3758,"nmod":3565,"conj":2745,"advmod":2095,"flat":1295,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":628,"advmod:lmod":423,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"obl:lmod":44,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}} cfg neg_key
 
1
+ moves {"0":{"":41615},"1":{"":34382},"2":{"case":7526,"nsubj":6005,"det":4341,"amod":3967,"advmod":3662,"mark":3530,"aux":2436,"cc":2264,"punct":2187,"cop":1330,"obl":894,"nummod":834,"nmod:poss":656,"nmod":463,"expl":291,"ccomp":203,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":48,"acl:relcl":43},"3":{"punct":8693,"obl":3951,"obj":3760,"nmod":3569,"conj":2747,"advmod":2087,"flat":1302,"nsubj":1169,"acl:relcl":1132,"advcl":809,"amod":622,"advmod:lmod":423,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":168,"list":159,"nmod:poss":156,"punct||conj":151,"cc":135,"mark":133,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"obl:lmod":44,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4383}} cfg neg_key
senter/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4aed53103edd9c5cb25000aa2d8b2219a6007e6bbbc2b2368a9dc316a7d24d4f
3
- size 219901
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5afa93a8f788243e6c02d5ab57762e55511fbfa00f89ee3c21bd75cea7ae6bc
3
+ size 219953
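`senter` is retrained and shipped here, but config.cfg lists it under `disabled = ["senter"]`. In meta.json, `components` therefore includes `senter` while `pipeline`, the components that run by default, does not. Assuming `pipeline` is simply `components` in order minus the disabled ones, the relationship can be checked on the new values from this commit. The dict is abridged to the relevant keys, and the `disabled` value is taken from config.cfg because the meta.json hunk is truncated there. `active_components` is a hypothetical helper.

```python
import json

# New values from meta.json in this commit (abridged);
# "disabled" is taken from config.cfg
META = json.loads("""
{
  "pipeline": ["tok2vec", "morphologizer", "parser", "lemmatizer",
               "attribute_ruler", "ner"],
  "components": ["tok2vec", "morphologizer", "parser", "lemmatizer",
                 "senter", "attribute_ruler", "ner"],
  "disabled": ["senter"]
}
""")


def active_components(meta: dict) -> list:
    """All components in their original order, minus the disabled ones."""
    disabled = set(meta["disabled"])
    return [name for name in meta["components"] if name not in disabled]


print(active_components(META) == META["pipeline"])  # True
```

In spaCy v3 a disabled component can be switched on after loading with `nlp.enable_pipe("senter")`.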
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1c071819ef9aa2286b8d502b15e8b8f86b3fdbd755e0747233869eb4641f8c16
3
- size 6960804
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:afa9aae853e4d60a66837dd127015d27d43c8f772b22c8c7b172238e5dfaa846
3
+ size 6365604
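`ner/model` and `tok2vec/model` both shrink by exactly 595,200 bytes. This is consistent with the smaller `MultiHashEmbed` tables in config.cfg, where `rows` changes from [5000,2500,2500,2500,100] to [5000,1000,2500,2500,50]. The arithmetic below assumes one `rows × width` table per attribute, a width of 96 (the value shown for the ner embedding) and 4-byte float32 parameters. It illustrates the correspondence rather than documenting spaCy's serialization format.

```python
ATTRS = ["NORM", "PREFIX", "SUFFIX", "SHAPE", "SPACY"]
OLD_ROWS = [5000, 2500, 2500, 2500, 100]
NEW_ROWS = [5000, 1000, 2500, 2500, 50]
WIDTH = 96           # assumed embedding width (as in the ner MultiHashEmbed)
BYTES_PER_PARAM = 4  # assumed float32 storage


def embed_params(rows, width=WIDTH):
    """Total parameters for one rows x width hash-embedding table per attribute."""
    return sum(r * width for r in rows)


saved_bytes = (embed_params(OLD_ROWS) - embed_params(NEW_ROWS)) * BYTES_PER_PARAM

for attr, old, new in zip(ATTRS, OLD_ROWS, NEW_ROWS):
    if old != new:
        print(f"{attr}: {old} -> {new} rows")  # PREFIX and SPACY shrink

print(saved_bytes)  # 595200
# LFS pointer sizes from this commit (ner/model, tok2vec/model)
print(7091792 - 6496592, 6960804 - 6365604)  # 595200 595200
```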
tokenizer CHANGED
The diff for this file is too large to render.
 
vocab/key2row CHANGED
Binary files a/vocab/key2row and b/vocab/key2row differ
 
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:42fe610567bec6fa69da580a4753d083f1f4429efd32b3c1fa638b6a07a6757e
3
- size 10070327
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:528d1a6bb62dc4d608b0ea1be75d557f41cdd76867460448bbbc174d34ae193a
3
+ size 10081139