TCMNER
Model description
TCMNER is a fine-tuned BERT model, ready to use for Named Entity Recognition (NER) on Traditional Chinese Medicine (TCM) text, and is reported to achieve state-of-the-art performance on this NER task. It has been trained to recognize six types of entities: prescription (方剂), herb (本草), source (来源), disease (病名), symptom (症状) and syndrome (证型).
Specifically, the base model is TCMRoBERTa, a RoBERTa model adapted to Traditional Chinese Medicine, which was then fine-tuned on the Chinese version of the Haiwei AI Lab's Named Entity Recognition dataset.
TCMRoBERTa is currently a closed-source model used within my own company; it will be open-sourced in the future.
How to use
You can use this model with the Transformers pipeline for NER.
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Monor/TCMNER")
model = AutoModelForTokenClassification.from_pretrained("Monor/TCMNER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"
ner_results = nlp(example)
print(ner_results)
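The pipeline above returns one record per token, so a small post-processing step is usually needed to get whole entities. The following is a minimal sketch of such a step, not part of the model itself. It assumes each record carries the `entity`, `score`, `start` and `end` keys that the Transformers `ner` pipeline produces; the helper name `merge_tokens`, the `min_score` parameter and the sample records are hypothetical and are not real model output.

```python
from collections import defaultdict


def merge_tokens(text, token_results, min_score=0.0):
    """Merge token-level NER records into entity strings grouped by type.

    Assumes each record has "entity" (e.g. "B-方剂" or "I-方剂"),
    "score", "start" and "end" keys.
    """
    spans = []
    for tok in token_results:
        if tok["entity"] == "O" or tok["score"] < min_score:
            continue
        prefix, _, label = tok["entity"].partition("-")
        last = spans[-1] if spans else None
        # Extend the previous span only when this token continues it:
        # an I- tag, the same entity type, and no gap in character offsets.
        if (last and prefix == "I" and last["label"] == label
                and last["end"] == tok["start"]):
            last["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})

    grouped = defaultdict(list)
    for span in spans:
        grouped[span["label"]].append(text[span["start"]:span["end"]])
    return dict(grouped)


# Hypothetical records, hand-written for illustration only.
sample_text = "化滞汤,出处:《证治汇补》卷八。"
sample_results = [
    {"entity": "I-方剂", "score": 0.99, "start": 0, "end": 1},
    {"entity": "I-方剂", "score": 0.99, "start": 1, "end": 2},
    {"entity": "I-方剂", "score": 0.99, "start": 2, "end": 3},
    {"entity": "I-来源", "score": 0.98, "start": 8, "end": 9},
    {"entity": "I-来源", "score": 0.98, "start": 9, "end": 10},
    {"entity": "I-来源", "score": 0.98, "start": 10, "end": 11},
    {"entity": "I-来源", "score": 0.98, "start": 11, "end": 12},
]
entities = merge_tokens(sample_text, sample_results)
print(entities)
```

With real output you would call `merge_tokens(example, ner_results)`. The Transformers pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping; whether it suits this model's tag scheme has not been checked here.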
Training data
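This model was fine-tuned on the Haiwei AI Lab's Named Entity Recognition dataset mentioned in the model description. Each token is classified as one of the following classes: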
Abbreviation | Description |
---|---|
O | Outside of a named entity |
B-方剂 | Beginning of a prescription entity right after another prescription entity |
I-方剂 | Prescription entity |
B-本草 | Beginning of a herb entity right after another herb entity |
I-本草 | Herb entity |
B-来源 | Beginning of a source entity right after another source entity |
I-来源 | Source entity |
B-病名 | Beginning of a disease entity right after another disease entity |
I-病名 | Disease entity |
B-症状 | Beginning of a symptom entity right after another symptom entity |
I-症状 | Symptom entity |
B-证型 | Beginning of a syndrome entity right after another syndrome entity |
I-证型 | Syndrome entity |
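To make the tagging scheme in the table concrete, here is a minimal sketch that enumerates the 13 classes and decodes a per-character tag sequence into entities. The order of `LABELS` is an assumption (the model's own `config.id2label` is authoritative), and `decode_tags` and the tagged sample are hypothetical illustrations rather than output from the model.

```python
ENTITY_TYPES = ["方剂", "本草", "来源", "病名", "症状", "证型"]

# "O" plus a B-/I- pair for each of the six entity types: 13 classes.
# The order here is an assumption; check config.id2label on the real model.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def decode_tags(chars, tags):
    """Turn per-character tags into a list of (entity_type, text) pairs."""
    entities = []
    current_type, current_chars = None, []
    for ch, tag in zip(chars, tags):
        prefix, _, etype = tag.partition("-")
        # The running entity ends on "O", on a change of type, or on a
        # B- tag, which marks a new entity directly after another one.
        if tag == "O" or prefix == "B" or etype != current_type:
            if current_chars:
                entities.append((current_type, "".join(current_chars)))
            if tag == "O":
                current_type, current_chars = None, []
            else:
                current_type, current_chars = etype, [ch]
        else:
            current_chars.append(ch)
    if current_chars:
        entities.append((current_type, "".join(current_chars)))
    return entities


# Two herbs written back to back: the B- tag is what separates them.
herbs = decode_tags(list("青皮陈皮"), ["I-本草", "I-本草", "B-本草", "I-本草"])
print(herbs)
```

Without the `B-本草` tag on the third character, the four characters would decode as a single herb entity, which is why the scheme reserves `B-` for an entity that immediately follows another of the same type.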
Eval results
Notices
- The model is free for commercial use.
- I do not plan to write a paper about this model. If you use any of its details in your own paper, please mention this model. Thanks.
Bonus
All of our TCM domain models will be open-sourced soon, including:
- A series of pre-trained models
- Named entity recognition for TCM
- Text localization in ancient images
- OCR for ancient images
- And so on