---
license: apache-2.0
---
# CSC T5 - T5 for Traditional Chinese Spelling Correction
This model was obtained by instruction-tuning `ClueAI/PromptCLUE-base-v1-5` on the `shibing624/CSC` spelling error corpus.
## Model Details
### Model Description
- Language(s) (NLP): `Chinese`
- Base model: `ClueAI/PromptCLUE-base-v1-5`
- Pretraining dataset: `1M UDN news corpus`
- Fine-tuning dataset: `shibing624/CSC` spelling error corpus
### Model Sources
- Repository: [https://github.com/TedYeh/Chinese_spelling_Correction](https://github.com/TedYeh/Chinese_spelling_Correction)
## Usage
```python
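from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("CodeTed/traditional_CSC_t5")
model = T5ForConditionalGeneration.from_pretrained("CodeTed/traditional_CSC_t5")

# Prepend the instruction prefix "糾正句子裡的錯字: " ("Correct the typos in the sentence: ").
# The example sentence contains the typo 堆動, which should read 推動 ("promote").
input_text = '糾正句子裡的錯字: 為了降低少子化,政府可以堆動獎勵生育的政策。'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate the corrected sentence and decode it back to plain text.
outputs = model.generate(input_ids, max_length=256)
edited_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(edited_text)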
```
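To see exactly which characters the model changed, the input sentence can be aligned against the decoded output. The helper below is a hypothetical sketch, not part of this model or the linked repository. It assumes the decoded output is the corrected sentence alone, without the `糾正句子裡的錯字: ` instruction prefix, and it uses Python's standard `difflib.SequenceMatcher` for the alignment. The `corrected` string in the example is hand-written for illustration, not actual model output.

```python
import difflib


def find_corrections(source: str, corrected: str) -> list[tuple[int, str, str]]:
    """Return (position, original, replacement) for every changed span.

    Positions are character offsets into `source`. Insertions and deletions
    are reported too, with an empty string on the unchanged side.
    """
    # autojunk=False keeps frequent characters (e.g. punctuation) from being
    # treated as junk, which would distort the alignment of longer sentences.
    matcher = difflib.SequenceMatcher(a=source, b=corrected, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            edits.append((i1, source[i1:i2], corrected[j1:j2]))
    return edits


# Hypothetical example: the sentence from the Usage section and a hand-written correction.
source = "為了降低少子化,政府可以堆動獎勵生育的政策。"
corrected = "為了降低少子化,政府可以推動獎勵生育的政策。"
print(find_corrections(source, corrected))  # → [(12, '堆', '推')]
```

Because spelling correction usually substitutes characters one-for-one, most results are `replace` spans of equal length; reporting insertions and deletions as well makes it easy to spot outputs where the model rewrote more than a typo.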
## Related Project
[CodeTed/CGEDit](https://huggingface.co/CodeTed/CGEDit)