T5 for Chinese Couplet (t5-chinese-couplet) Model
T5 Chinese couplet generation model
t5-chinese-couplet
Evaluation on the couplet test data:
The overall performance of T5 on the couplet test set:
prefix | input_text | target_text | pred |
---|---|---|---|
对联: | 春回大地,对对黄莺鸣暖树 | 日照神州,群群紫燕衔新泥 | 福至人间,家家紫燕舞和风 |
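On the Couplet test set, the generated lines meet the requirements of equal character count, part-of-speech alignment, word-form alignment, and formal similarity. Tidy semantic parallelism and conformity to the level/oblique (平仄) tonal patterns are not yet achieved.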
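The surface-level requirements above (equal character count and aligned form) can be checked mechanically. The following is a minimal sketch, not the evaluation used for this model: the function names `repetition_pattern` and `check_surface_form` are hypothetical, and treating "same positions of punctuation" and "same pattern of repeated characters" as proxies for form alignment is an assumption made for illustration. Part-of-speech, semantic, and tonal checks are out of scope here.

```python
# Punctuation treated as clause separators (ASCII and full-width forms).
PUNCTUATION = set(",,。.;;!!??、")


def repetition_pattern(line):
    """Map each character to the index of its first occurrence in the line."""
    first_seen = {}
    return [first_seen.setdefault(ch, i) for i, ch in enumerate(line)]


def check_surface_form(first_line, second_line):
    """Return surface-level checks for a couplet pair (illustrative only)."""
    punct_a = [i for i, ch in enumerate(first_line) if ch in PUNCTUATION]
    punct_b = [i for i, ch in enumerate(second_line) if ch in PUNCTUATION]
    return {
        "same_length": len(first_line) == len(second_line),
        "same_punctuation_positions": punct_a == punct_b,
        "same_repetition_pattern": (
            repetition_pattern(first_line) == repetition_pattern(second_line)
        ),
    }


# Input and prediction taken from the example row in the table above.
report = check_surface_form("春回大地,对对黄莺鸣暖树", "福至人间,家家紫燕舞和风")
print(report)
```

For the example row in the table, all three checks pass: both lines have 12 characters, the comma sits at the same position, and the doubled character (对对 / 家家) occurs in the same place.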
T5 network architecture (original T5):
Usage
This project is open-sourced as part of the text generation project textgen, which supports T5 models. It can be called as follows:
Install package:
pip install -U textgen
from textgen import T5Model
model = T5Model("t5", "shibing624/t5-chinese-couplet")
r = model.predict(["对联:丹枫江冷人初去"])
print(r) # ['白石矶寒客不归']
Usage (HuggingFace Transformers)
Without textgen, you can use the model like this:
First, pass your input through the transformer model; the model then returns the generated sentence.
Install package:
pip install transformers
from transformers import T5ForConditionalGeneration, T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained("shibing624/t5-chinese-couplet")
model = T5ForConditionalGeneration.from_pretrained("shibing624/t5-chinese-couplet")
def batch_generate(input_texts, max_length=64):
    # padding=True lets inputs of different lengths share one batch tensor
    features = tokenizer(input_texts, padding=True, return_tensors='pt')
    outputs = model.generate(input_ids=features['input_ids'],
                             attention_mask=features['attention_mask'],
                             max_length=max_length)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
r = batch_generate(["对联:丹枫江冷人初去"])
print(r)
output:
['白石矶寒客不归']
Model files:
t5-chinese-couplet
├── config.json
├── model_args.json
├── pytorch_model.bin
├── special_tokens_map.json
├── tokenizer_config.json
├── spiece.model
└── vocab.txt
Training datasets
Chinese couplet dataset
- Data: couplet dataset (GitHub), cleaned couplet dataset (GitHub)
- Related resources
- Huggingface
- LangZhou Chinese MengZi T5 pretrained Model and paper
- textgen
Data format:
head -n 1 couplet_files/couplet/train/in.txt
晚 风 摇 树 树 还 挺
head -n 1 couplet_files/couplet/train/out.txt
晨 露 润 花 花 更 红
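The raw files store each line with spaces between characters, while the usage examples pass unspaced text with the `对联:` prefix. A minimal sketch of the conversion is shown below. The helper name `to_prompt_pairs` is hypothetical, and whether the original training pipeline built its inputs exactly this way is an assumption; the sketch only mirrors the formats shown in this document.

```python
PREFIX = "对联:"  # prompt prefix as used in the usage examples


def to_prompt_pairs(in_lines, out_lines):
    """Pair first lines with second lines, removing spaces between characters."""
    pairs = []
    for first, second in zip(in_lines, out_lines):
        first_text = "".join(first.split())
        second_text = "".join(second.split())
        if not first_text or not second_text:
            continue  # skip blank rows
        pairs.append((PREFIX + first_text, second_text))
    return pairs


# The first rows of in.txt and out.txt shown above.
pairs = to_prompt_pairs(["晚 风 摇 树 树 还 挺\n"], ["晨 露 润 花 花 更 红\n"])
print(pairs)
```

In practice the two lists would come from reading `in.txt` and `out.txt` line by line; `zip` stops at the shorter file, so mismatched file lengths should be checked separately.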
Citation
@software{textgen,
author = {Xu Ming},
title = {textgen: Implementation of Text Generation models},
year = {2022},
url = {https://github.com/shibing624/textgen},
}