shibing624 committed
Commit bb20b28 • 1 Parent(s): de19af6

Update README.md

README.md CHANGED
@@ -12,13 +12,17 @@ license: "apache-2.0"
 # BERT for Chinese Named Entity Recognition(bert4ner) Model
 Chinese named entity recognition model
 
-`bert4ner-base-chinese` evaluate
+`bert4ner-base-chinese` evaluate PEOPLE (People's Daily) test data:
 
-
+The overall performance of BERT on people **test**:
 
-| | | |
+| | Accuracy | Recall | F1 |
+| ------------ | ------------------ | ------------------ | ------------------ |
+| BertSoftmax | 0.9425 | 0.9627 | 0.9525 |
 
-| | | |
+On the PEOPLE test set, the model reaches near-SOTA performance.
+
+Network architecture of BertSoftmax (vanilla BERT):
 
 ![arch](bert.png)
 
@@ -30,7 +34,7 @@ license: "apache-2.0"
 >>> from nerpy import NERModel
 >>> model = NERModel("bert", "shibing624/bert4ner-base-chinese")
 >>> predictions, raw_outputs, entities = model.predict(["常建良,男,1963年出生,工科学士,高级工程师"], split_on_space=False)
-entities: [('常建良', '
+entities: [('常建良', 'PER'), ('1963年', 'TIME')]
 ```
 
 Model files:
@@ -38,7 +42,6 @@ entities: [('常建良', 'NAME'), ('工科', 'PRO'), ('学士', 'EDU'), ('高级
 bert4ner-base-chinese
 ├── config.json
 ├── model_args.json
-├── eval_result.txt
 ├── pytorch_model.bin
 ├── special_tokens_map.json
 ├── tokenizer_config.json
@@ -52,7 +55,7 @@ bert4ner-base-chinese
 | Dataset | Corpus | Download link | File size |
 | :------- | :--------- | :---------: | :---------: |
 | **`CNER Chinese NER dataset`** | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner)| 1.1MB |
-| **`PEOPLE中文实体识别数据集`** |
+| **`PEOPLE Chinese NER dataset`** | People's Daily dataset (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people)| 12.8MB |
 
 
 CNER Chinese NER dataset, data format:
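A quick sanity check on the evaluation table added in this commit: F1 is normally the harmonic mean of precision and recall, so the three reported numbers can be cross-checked against each other. The sketch below assumes (this is not stated in the README) that the column labelled "Accuracy" actually holds precision; the helper name `f1_score` and the variable names are illustrative only.

```python
# Values copied from the BertSoftmax row of the table in the README diff.
first_column = 0.9425   # labelled "Accuracy" in the README
recall = 0.9627
reported_f1 = 0.9525


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: treat the first column as precision rather than accuracy.
computed_f1 = f1_score(first_column, recall)

print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Under that assumption the computed value agrees with the reported F1 to four decimal places, which suggests the "Accuracy" column may in fact be precision. The README itself does not confirm this.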