shibing624/bert4ner-base-chinese

@@ -12,13 +12,17 @@ license: "apache-2.0"
 # BERT for Chinese Named Entity Recognition(bert4ner) Model
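 Chinese named entity recognition model
-`bert4ner-base-chinese` evaluated on the CNER test data:
-- precision: 0.9395, recall: 0.9604, f1: 0.9498
-Because the model was trained on the CNER training set, it reaches near-SOTA performance on the CNER test set.
-Model architecture: the standard BertSoftmax network structure: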
 ![arch](bert.png)
@@ -30,7 +34,7 @@ license: "apache-2.0"
 >>> from nerpy import NERModel
 >>> model = NERModel("bert", "shibing624/bert4ner-base-chinese")
 >>> predictions, raw_outputs, entities = model.predict(["常建良，男，1963年出生，工科学士，高级工程师"], split_on_space=False)
-entities: [('常建良', 'NAME'), ('工科', 'PRO'), ('学士', 'EDU'), ('高级工程师', 'TITLE')]
 ```
 Model file layout:
@@ -38,7 +42,6 @@ entities: [('常建良', 'NAME'), ('工科', 'PRO'), ('学士', 'EDU'), ('高级
 bert4ner-base-chinese
     ├── config.json
     ├── model_args.json
-    ├── eval_result.txt
     ├── pytorch_model.bin
     ├── special_tokens_map.json
     ├── tokenizer_config.json
@@ -52,7 +55,7 @@ bert4ner-base-chinese
 | Dataset | Corpus | Download link | File size |
 | :------- | :--------- | :---------: | :---------: |
 | **`CNER Chinese NER dataset`** | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner)| 1.1MB |
-| **`PEOPLE Chinese NER dataset`** | People's Daily entity set (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people)| 12.8MB |
 CNER Chinese NER dataset, data format:

 # BERT for Chinese Named Entity Recognition(bert4ner) Model
 Chinese named entity recognition model
+`bert4ner-base-chinese` evaluated on the PEOPLE (People's Daily) test data:
+The overall performance of BERT on people **test**:
+|              | Precision | Recall    | F1  |
+| ------------ | ------------------ | ------------------ | ------------------ |
+| BertSoftmax | 0.9425     | 0.9627   | 0.9525     |
+It reaches near-SOTA performance on the PEOPLE test set.
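As a sanity check on the reported numbers, the F1 column can be recomputed as the harmonic mean of the other two columns. This is a minimal sketch, assuming the first score column is precision (the earlier version of this README lists `precision`, `recall`, `f1`); the helper name `f1_score` is hypothetical and not part of `nerpy`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Scores reported for the PEOPLE test set (new README) and
# the CNER test set (previous README).
people_f1 = f1_score(0.9425, 0.9627)
cner_f1 = f1_score(0.9395, 0.9604)

print(round(people_f1, 4))  # 0.9525
print(round(cner_f1, 4))    # 0.9498
```

Both values agree with the F1 figures quoted in the README, which is consistent with reading the first column as precision.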
+BertSoftmax network structure (vanilla BERT):
 ![arch](bert.png)
 >>> from nerpy import NERModel
 >>> model = NERModel("bert", "shibing624/bert4ner-base-chinese")
 >>> predictions, raw_outputs, entities = model.predict(["常建良，男，1963年出生，工科学士，高级工程师"], split_on_space=False)
+entities: [('常建良', 'PER'), ('1963年', 'TIME')]
 ```
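The entity pairs shown in the example output can be regrouped by label for downstream use. The following is a sketch only: it assumes a flat list of `(text, label)` tuples exactly as printed above, and `group_entities` is a hypothetical helper, not a `nerpy` function. If `model.predict` returns one such list per input sentence, apply the helper to each list.

```python
from collections import defaultdict


def group_entities(entities):
    """Group (text, label) pairs by label, keeping the original order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Entity pairs copied from the example output above.
entities = [('常建良', 'PER'), ('1963年', 'TIME')]
by_label = group_entities(entities)
print(by_label)
```

Grouping by label keeps the sketch independent of the tag set, so it works the same way for the PEOPLE labels (`PER`, `TIME`) and the earlier CNER labels (`NAME`, `EDU`, `TITLE`).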
 Model file layout:
 bert4ner-base-chinese
     ├── config.json
     ├── model_args.json
     ├── pytorch_model.bin
     ├── special_tokens_map.json
     ├── tokenizer_config.json
 | Dataset | Corpus | Download link | File size |
 | :------- | :--------- | :---------: | :---------: |
 | **`CNER Chinese NER dataset`** | CNER (120,000 characters) | [CNER github](https://github.com/shibing624/nerpy/tree/main/examples/data/cner)| 1.1MB |
+| **`PEOPLE Chinese NER dataset`** | People's Daily dataset (2 million characters) | [PEOPLE github](https://github.com/shibing624/nerpy/tree/main/examples/data/people)| 12.8MB |
 CNER Chinese NER dataset, data format: