SMM4H-2024 Task 2 Japanese NER

Overview

This is a Japanese named entity recognition (NER) model, created by fine-tuning daisaku-s/medtxt_ner_roberta on the SMM4H 2024 Task 2a corpus.

Tag set (IOB2 format):

  • DRUG
  • DISORDER
  • FUNCTION
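In IOB2, each entity type gets a B- (beginning) tag and an I- (inside) tag, and tokens outside any entity are tagged O. The sketch below builds the resulting label set. It assumes the common `B-TYPE`/`I-TYPE` naming, and `iob2_labels` is a hypothetical helper. Check `model.config.id2label` for the actual label strings and their index order.

```python
# Entity types listed in this model card.
ENTITY_TYPES = ["DRUG", "DISORDER", "FUNCTION"]


def iob2_labels(entity_types):
    """Build an IOB2 label set: 'O' plus B-/I- tags for each entity type.

    Hypothetical helper; assumes 'B-TYPE'/'I-TYPE' naming. The real model's
    id2label mapping may order or spell the labels differently.
    """
    labels = ["O"]
    for t in entity_types:
        labels.extend([f"B-{t}", f"I-{t}"])
    return labels


print(iob2_labels(ENTITY_TYPES))
# ['O', 'B-DRUG', 'I-DRUG', 'B-DISORDER', 'I-DISORDER', 'B-FUNCTION', 'I-FUNCTION']
```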

Usage

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "yseop/SMM4H2024_Task2a_ja"
text = "サンプルテキスト"  # "sample text" in Japanese

# Load the tokenizer and the fine-tuned token-classification model once.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name).eval()
idx2tag = model.config.id2label  # class index -> IOB2 tag

with torch.inference_mode():
    vecs = tokenizer(text, truncation=True, return_tensors="pt")
    outputs = model(input_ids=vecs["input_ids"],
                    attention_mask=vecs["attention_mask"])

# Highest-scoring tag index per token for the first (only) sequence.
idx = torch.argmax(outputs.logits, dim=-1)[0].tolist()
# [1:-1] drops the special tokens added at the start and end of the sequence.
token = tokenizer.convert_ids_to_tokens(vecs["input_ids"][0].tolist())[1:-1]
pred_tag = [idx2tag[x] for x in idx][1:-1]
```
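The script produces two parallel lists, `token` and `pred_tag`. To turn them into entity spans, you can merge consecutive `B-`/`I-` tags of the same type. The sketch below does this under a few assumptions:

- Tags are `B-TYPE`, `I-TYPE` or `O`.
- `extract_entities` is a hypothetical helper, not part of `transformers`.
- Subword pieces are joined by plain concatenation after stripping `##`/`▁` markers. This suits Japanese text but may need adjusting for other tokenizers.

The example tokens and tags are made up for illustration.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (entity_type, text) spans.

    Hypothetical helper: assumes tags of the form 'B-TYPE', 'I-TYPE' or 'O'.
    An I- tag that does not continue an open span of the same type is
    treated leniently as the start of a new span.
    """
    entities: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_text = ""

    for tok, tag in zip(tokens, tags):
        # Strip common subword markers (WordPiece '##', SentencePiece '▁').
        piece = tok[2:] if tok.startswith("##") else tok.lstrip("▁")
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and ent_type != cur_type):
            # Close any open span, then start a new one.
            if cur_type is not None:
                entities.append((cur_type, cur_text))
            cur_type, cur_text = ent_type, piece
        elif prefix == "I":
            # Continue the open span of the same type.
            cur_text += piece
        else:
            # 'O' closes any open span.
            if cur_type is not None:
                entities.append((cur_type, cur_text))
            cur_type, cur_text = None, ""

    if cur_type is not None:
        entities.append((cur_type, cur_text))
    return entities


# Made-up example input, for illustration only.
example_tokens = ["頭", "痛", "に", "ロキソニン", "を", "飲んだ"]
example_tags = ["B-DISORDER", "I-DISORDER", "O", "B-DRUG", "O", "O"]
print(extract_entities(example_tokens, example_tags))
# [('DISORDER', '頭痛'), ('DRUG', 'ロキソニン')]
```

With the usage script above, the call would be `extract_entities(token, pred_tag)`.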

Results

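| Entity type | TP | FP | FN | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| DISORDER | 588 | 409 | 330 | 0.5898 | 0.6405 | 0.6141 |
| DRUG | 307 | 143 | 169 | 0.6822 | 0.6450 | 0.6631 |
| FUNCTION | 69 | 160 | 170 | 0.3013 | 0.2887 | 0.2949 |
| all (micro-averaged) | 964 | 712 | 669 | 0.5752 | 0.5903 | 0.5827 |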
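Precision, recall and F1 follow from the counts:

- P = TP / (TP + FP)
- R = TP / (TP + FN)
- F1 = 2PR / (P + R)

The `all` row matches micro-averaging, where counts are summed over the entity types before the ratios are computed. The short check below recomputes every row from its TP/FP/FN columns. The helper name `prf` is illustrative.

```python
# Counts copied from the results table: (tp, fp, fn) per entity type.
COUNTS = {
    "DISORDER": (588, 409, 330),
    "DRUG": (307, 143, 169),
    "FUNCTION": (69, 160, 170),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


for name, counts in COUNTS.items():
    print(name, *(f"{v:.4f}" for v in prf(*counts)))

# Micro-average: sum each count column over all entity types first.
tp, fp, fn = (sum(col) for col in zip(*COUNTS.values()))
print("all", tp, fp, fn, *(f"{v:.4f}" for v in prf(tp, fp, fn)))
# all 964 712 669 0.5752 0.5903 0.5827
```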
Model size: 124M params (Safetensors, tensor type F32)