TedYeh
metadata
license: apache-2.0

CSC T5 - T5 for Traditional Chinese Spelling Correction

This model was obtained by instruction-tuning the ClueAI/PromptCLUE-base-v1-5 model on a Chinese spelling error corpus.

Model Details

Model Description

  • Language(s) (NLP): Chinese
  • Base model: ClueAI/PromptCLUE-base-v1-5
  • Pretraining dataset: 1M UDN news corpus
  • Fine-tuning dataset: shibing624/CSC spelling error corpus

Model Sources

Usage

from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("CodeTed/traditional_CSC_t5")
model = T5ForConditionalGeneration.from_pretrained("CodeTed/traditional_CSC_t5")
# The prompt is the prefix "糾正句子裡的錯字: " ("Correct the typos in the sentence: ")
# followed by the sentence to correct
input_text = '糾正句子裡的錯字: 為了降低少子化,政府可以堆動獎勵生育的政策。'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=256)
edited_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(edited_text)
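
When using the corrected text downstream, it is often useful to know which characters were changed. The following is a minimal sketch, not part of the model's API: `build_prompt` and `diff_corrections` are hypothetical helper names, and `hypothetical_output` is an assumed correction written by hand for illustration, not a recorded model output. The prompt prefix is copied from the usage example above.

```python
from difflib import SequenceMatcher

# Prompt prefix copied from the usage example ("Correct the typos in the sentence: ")
PROMPT_PREFIX = "糾正句子裡的錯字: "


def build_prompt(sentence: str) -> str:
    """Prepend the instruction prefix to the sentence to be corrected."""
    return PROMPT_PREFIX + sentence


def diff_corrections(source: str, corrected: str):
    """Return (position, original, replacement) for each span that differs.

    Positions are character offsets into `source`.
    """
    changes = []
    matcher = SequenceMatcher(None, source, corrected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((i1, source[i1:i2], corrected[j1:j2]))
    return changes


source = "為了降低少子化,政府可以堆動獎勵生育的政策。"
# Assumed correction for illustration only (堆 -> 推); not actual model output
hypothetical_output = "為了降低少子化,政府可以推動獎勵生育的政策。"

print(build_prompt(source))
print(diff_corrections(source, hypothetical_output))  # [(12, '堆', '推')]
```

In practice, `edited_text` from the usage example would take the place of `hypothetical_output`. Using `difflib` rather than a position-by-position comparison keeps the helper usable even if the output length differs from the input.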

Related Project

CodeTed/CGEDit