shibing624/bert4ner-base-chinese

Token Classification

shibing624 committed on May 7, 2022

Commit bb7d55d • 1 Parent(s): d210fa5

Update README.md

Files changed (1)

README.md +84 -1

README.md CHANGED

@@ -1,3 +1,86 @@
 ---
-license: apache-2.0
+language:
+- zh
+tags:
+- bert
+- pytorch
+- zh
+- ner
+license: "apache-2.0"
 ---
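+# BERT for Chinese Named Entity Recognition (bert4ner) Model
+A Chinese named entity recognition model.
+Evaluation results of `bert4ner-base-chinese` on the CNER test set:
+- precision: 0.9395, recall: 0.9604, f1: 0.9498
+Because the model was trained on the CNER training set, it reaches near-SOTA performance on the CNER test set.
+Model architecture: the standard BertSoftmax network structure: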
+![arch](bert.png)
+## Usage
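+This model is open-sourced as part of the named entity recognition project [nerpy](https://github.com/shibing624/nerpy), which supports bert4ner models. It can be called as follows: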
+```python
+>>> from nerpy import NERModel
+>>> model = NERModel("bert", "shibing624/bert4ner-base-chinese")
+>>> predictions, raw_outputs, entities = model.predict(["常建良，男，1963年出生，工科学士，高级工程师"], split_on_space=False)
+entities: [('常建良', 'NAME'), ('工科', 'PRO'), ('学士', 'EDU'), ('高级工程师', 'TITLE')]
+```
+Model files:
+```
+bert4ner-base-chinese
+    ├── config.json
+    ├── model_args.json
+    ├── eval_result.txt
+    ├── pytorch_model.bin
+    ├── special_tokens_map.json
+    ├── tokenizer_config.json
+    └── vocab.txt
+```
+### Training datasets
+#### Chinese named entity recognition datasets
+| Dataset | Corpus | Download link | File size |
+| :------- | :--------- | :---------: | :---------: |
+| **`CNER Chinese NER dataset`** | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner)| 1.1MB |
+| **`PEOPLE Chinese NER dataset`** | People's Daily entity set (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people)| 12.8MB |
+Data format of the CNER Chinese NER dataset:
+```text
+美	B-LOC
+国	I-LOC
+的	O
+华	B-PER
+莱	I-PER
+士	I-PER
+我	O
+跟	O
+他	O
+```
+To train bert4ner, see [https://github.com/shibing624/nerpy/tree/main/examples](https://github.com/shibing624/nerpy/tree/main/examples)
+## Citation
+```bibtex
+@software{nerpy,
+  author = {Xu Ming},
+  title = {nerpy: Named Entity Recognition toolkit},
+  year = {2022},
+  url = {https://github.com/shibing624/nerpy},
+}
+```
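
Note on the data format added in this commit: the README shows one character per line followed by a BIO tag (`B-` opens an entity, `I-` continues it, `O` is outside any entity). The following is a minimal sketch, not code from nerpy, of how such lines could be grouped into `(text, label)` pairs like those in the Usage output. The function name `parse_bio` is hypothetical, and the sketch assumes the character and tag are separated by whitespace, as in the sample.

```python
from typing import List, Optional, Tuple


def parse_bio(lines: List[str]) -> List[Tuple[str, str]]:
    """Group character-level BIO tags into (entity_text, label) pairs.

    Hypothetical helper for illustration; assumes each line is
    "<char><whitespace><tag>", as in the README sample.
    """
    entities: List[Tuple[str, str]] = []
    chars: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Close the currently open entity, if any.
        nonlocal chars, label
        if chars and label is not None:
            entities.append(("".join(chars), label))
        chars, label = [], None

    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            flush()  # a blank or malformed line ends any open entity
            continue
        char, tag = parts
        if tag.startswith("B-"):
            flush()
            chars, label = [char], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            chars.append(char)
        else:
            # "O", or an I- tag that does not continue the open entity
            # (such a stray I- character is dropped in this sketch).
            flush()
    flush()
    return entities


# The sample lines from the README, tab-separated.
sample = [
    "美\tB-LOC", "国\tI-LOC", "的\tO",
    "华\tB-PER", "莱\tI-PER", "士\tI-PER",
    "我\tO", "跟\tO", "他\tO",
]
print(parse_bio(sample))  # [('美国', 'LOC'), ('华莱士', 'PER')]
```

Stray `I-` tags are dropped rather than promoted to new entities. This is one possible convention, and a stricter or more lenient decoder may be preferable depending on how the labels were produced.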