---
license: apache-2.0
datasets:
- stockmark/ner-wikipedia-dataset
language:
- ja
- en
metrics:
- f1
- recall
- precision
- accuracy
library_name: transformers
pipeline_tag: token-classification
tags:
- ner
- named entity recognition
- stockmark ner
- bert
- japanese named entity recognition
- japanese ner
- transformers
---
## Model Description
This model is a fine-tuned version of tohoku-nlp/bert-base-japanese-v3, optimized for Named Entity Recognition (NER). It was fine-tuned on a Japanese named entity extraction dataset derived from Wikipedia, developed and made publicly available by Stockmark Inc. (stockmark/ner-wikipedia-dataset).
## Intended Use
This model is intended for use in tasks that require the identification and categorization of named entities within Japanese text. It is suitable for various applications in natural language processing where understanding the specific names of people, organizations, locations, etc., is crucial.
## How to Use

You can use this model for NER tasks with the following snippet:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "knosing/japanese_ner_model"

# The tokenizer comes from the base model; it requires the
# `fugashi` and `unidic-lite` packages for Japanese word segmentation.
tokenizer = AutoTokenizer.from_pretrained("tohoku-nlp/bert-base-japanese-v3")
model = AutoModelForTokenClassification.from_pretrained(model_name)
```
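After classification, the per-token B-/I-/O labels (retrievable via `model.config.id2label`) can be grouped into entity spans. The sketch below assumes a standard BIO tagging scheme; the helper function name and the example tokens are illustrative, not part of the model's API:

```python
# Sketch: grouping per-token BIO labels into entity spans.
# The label names below are illustrative; the model's actual
# mapping is available via model.config.id2label.
def group_entities(tokens, labels):
    """Collect (entity_type, surface_form) pairs from BIO-tagged tokens."""
    entities, current_type, current_tokens = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_tokens:
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O" tag (or a stray I- tag): flush and reset.
            if current_tokens:
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, "".join(current_tokens)))
    return entities

tokens = ["山田", "太郎", "は", "東京", "に", "住む"]
labels = ["B-人名", "I-人名", "O", "B-地名", "O", "O"]
print(group_entities(tokens, labels))
# → [('人名', '山田太郎'), ('地名', '東京')]
```

Tokens are joined without spaces here because Japanese text is unsegmented; for languages with whitespace you would join with `" "`.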
## Model Performance
The model has been evaluated on various entity types to assess its precision, recall, F1 score, and overall accuracy. Below is the detailed performance breakdown by entity type:
### Overall Metrics
- Overall Precision: 0.8379
- Overall Recall: 0.8477
- Overall F1 Score: 0.8428
- Overall Accuracy: 0.9684
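As a sanity check, the overall F1 score is the harmonic mean of the overall precision and recall, which the reported figures satisfy:

```python
# F1 is the harmonic mean of precision (P) and recall (R): F1 = 2PR / (P + R).
precision, recall = 0.8379, 0.8477
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # → 0.8428
```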
### Performance by Entity Type
**Other Organization Names (その他の組織名)**
- Precision: 0.71875
- Recall: 0.69
- F1 Score: 0.7041
- Sample Count: 100

**Event Names (イベント名)**
- Precision: 0.85
- Recall: 0.8586
- F1 Score: 0.8543
- Sample Count: 99

**Personal Names (人名)**
- Precision: 0.8171
- Recall: 0.8664
- F1 Score: 0.8410
- Sample Count: 232
**Location Names (地名)**
- Precision: 0.8986
- Recall: 0.9376
- F1 Score: 0.9177
- Sample Count: 529
**Product Names (製品名)**
- Precision: 0.6522
- Recall: 0.5906
- F1 Score: 0.6198
- Sample Count: 127

**Political Organization Names (政治的組織名)**
- Precision: 0.9160
- Recall: 0.8276
- F1 Score: 0.8696
- Sample Count: 145

**Facility Names (施設名)**
- Precision: 0.7905
- Recall: 0.8357
- F1 Score: 0.8125
- Sample Count: 140
## Note

You might not be able to use this model with the Hugging Face Inference API. Example usage for the model is provided in the following repository: KeshavSingh29/fa_ner_japanese. If you have any questions, feel free to contact me or raise an issue in that repository.