
Punctuator for Simplified Chinese

This model is a DistilBertForTokenClassification model fine-tuned to add punctuation to plain Simplified Chinese text. It was fine-tuned from a distilled version of bert-base-chinese.

Usage

from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

model = DistilBertForTokenClassification.from_pretrained("Qishuai/distilbert_punctuator_zh")
tokenizer = DistilBertTokenizerFast.from_pretrained("Qishuai/distilbert_punctuator_zh")
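
The loading code above stops before inference, so the following is a minimal sketch of the post-processing step that turns per-token labels back into punctuated text. It rests on assumptions the card does not confirm: the label-to-character mapping is inferred from the label names in the metrics report, a label is assumed to mark the punctuation that follows its token, and "O" is assumed to be the no-punctuation tag. The helper restore_punctuation is hypothetical and not part of the model repository.

```python
# Assumed mapping from label names (taken from the metrics report) to
# punctuation characters. The card does not document it, so check it
# against model.config.id2label before relying on it.
LABEL_TO_PUNCT = {
    "C_COMMA": "，",
    "C_DUNHAO": "、",
    "C_EXLAMATIONMARK": "！",  # spelling follows the label name in the report
    "C_PERIOD": "。",
    "C_QUESTIONMARK": "？",
}


def restore_punctuation(tokens, labels):
    """Append the punctuation mark predicted for each token, if any."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    pieces = []
    for token, label in zip(tokens, labels):
        pieces.append(token)
        # Labels outside the mapping (assumed here: "O") add nothing.
        pieces.append(LABEL_TO_PUNCT.get(label, ""))
    return "".join(pieces)


# Hand-written labels standing in for model predictions.
tokens = list("你好今天天气很好")
labels = ["O", "C_COMMA", "O", "O", "O", "O", "O", "C_PERIOD"]
print(restore_punctuation(tokens, labels))  # 你好，今天天气很好。
```

With the real model, the labels would typically come from taking the argmax over model(**inputs).logits and mapping each id through model.config.id2label, after dropping the [CLS] and [SEP] positions.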

Model Overview

Training data

A combination of the following three datasets:

  • News articles from People's Daily, 2014. Reference

Model Performance

  • Validated on the MSRA training dataset. Reference
  • Metrics Report:
                        precision    recall  f1-score   support

             C_COMMA         0.67      0.59      0.63     91566
            C_DUNHAO         0.50      0.37      0.42     21013
    C_EXLAMATIONMARK         0.23      0.06      0.09       399
            C_PERIOD         0.84      0.99      0.91     44258
      C_QUESTIONMARK         0.00      1.00      0.00         0

           micro avg         0.71      0.67      0.69    157236
           macro avg         0.45      0.60      0.41    157236
        weighted avg         0.69      0.67      0.68    157236
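
To make the averaged rows easier to read, here is a small sketch that recomputes the macro and weighted averages from the per-class rows of the report. It uses only the rounded figures printed in the report, so it illustrates the arithmetic rather than reproducing the original evaluation script. The names rows, macro_avg, and weighted_avg are illustrative.

```python
# Per-class rows from the report: (precision, recall, f1, support).
rows = {
    "C_COMMA": (0.67, 0.59, 0.63, 91566),
    "C_DUNHAO": (0.50, 0.37, 0.42, 21013),
    "C_EXLAMATIONMARK": (0.23, 0.06, 0.09, 399),
    "C_PERIOD": (0.84, 0.99, 0.91, 44258),
    "C_QUESTIONMARK": (0.00, 1.00, 0.00, 0),
}


def macro_avg(rows):
    """Unweighted mean of each metric over all classes."""
    n = len(rows)
    return tuple(round(sum(r[i] for r in rows.values()) / n, 2) for i in range(3))


def weighted_avg(rows):
    """Mean of each metric weighted by class support."""
    total = sum(r[3] for r in rows.values())
    return tuple(
        round(sum(r[i] * r[3] for r in rows.values()) / total, 2) for i in range(3)
    )


print(macro_avg(rows))     # (0.45, 0.6, 0.41)
print(weighted_avg(rows))  # (0.69, 0.67, 0.68)
```

Because the macro average weights every class equally, the C_QUESTIONMARK row (support 0, recall 1.00, f1 0.00) raises macro recall and lowers macro f1 even though the validation set contains no question marks. The weighted average ignores that row and is the more representative summary here.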
Model size: 59.2M params (Safetensors, tensor type F32)