---
language:
- zh
thumbnail: "url to a thumbnail used in social sharing"
tags:
- bart-large-chinese
datasets:
- lccc
- kd_conv
---
# dialogue-bart-large-chinese
This is a seq2seq model initialized from bart-large-chinese and further pre-trained on several Chinese dialogue datasets. For best results, fine-tune it on your downstream task.
# Spaces
You can try the model on HuggingFace Spaces: [HIT-TMG/dialogue-bart-large-chinese](https://huggingface.co/spaces/HIT-TMG/dialogue-bart-large-chinese).
# Datasets
We use four Chinese dialogue datasets from [LUGE](https://www.luge.ai/#/).

| Dataset | Count | Domain |
| ---- | ---- | ---- |
| Chinese Persona Chat (CPC) | 23,000 | Open |
| LCCC | 11,987,759 | Open |
| Emotional STC (ESTC) | 899,207 | Open |
| KdConv | 3,000 | Movie, Music, Travel |
# Data format
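Input: `[CLS] 对话历史:<history> [SEP] 知识:<knowledge> [SEP]`

Output: `[CLS] <response> [SEP]`

The prefix `对话历史` means "dialogue history" and `知识` means "knowledge". Both are literal strings in the input and should be kept in Chinese.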
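The input format above can be assembled with a small helper. This is a minimal sketch, not the authors' preprocessing code: `build_input` and `SEP_TOKEN` are hypothetical names, and it assumes that the tokenizer adds the leading `[CLS]` and trailing `[SEP]` itself, that turns are joined with the separator token as in the example in this card, and that the knowledge segment is simply omitted when there is none.

```python
from typing import List, Optional

# Assumption: "[SEP]" matches tokenizer.sep_token for this model's BertTokenizer.
SEP_TOKEN = "[SEP]"


def build_input(
    history: List[str],
    knowledge: Optional[str] = None,
    sep_token: str = SEP_TOKEN,
) -> str:
    """Build the model input string from dialogue turns and optional knowledge.

    The leading [CLS] and trailing [SEP] are left to the tokenizer.
    """
    # Dialogue turns are joined with the separator token.
    text = "对话历史:" + sep_token.join(history)
    # Assumption: the knowledge segment is dropped when no knowledge is given.
    if knowledge:
        text += sep_token + "知识:" + knowledge
    return text
```

The returned string can be passed to the tokenizer directly, for example `tokenizer(build_input(history, knowledge), return_tensors="pt")`.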
# Example
```python
from transformers import BertTokenizer, BartForConditionalGeneration
# Note: the tokenizer is a BertTokenizer, not a BartTokenizer
tokenizer = BertTokenizer.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")
model = BartForConditionalGeneration.from_pretrained("HIT-TMG/dialogue-bart-large-chinese")
# An example dialogue history from the CPC dev set
history = ["可以 认识 一下 吗 ?", "当然 可以 啦 , 你好 。", "嘿嘿 你好 , 请问 你 最近 在 忙 什么 呢 ?", "我 最近 养 了 一只 狗狗 , 我 在 训练 它 呢 。"]
history_str = "对话历史:" + tokenizer.sep_token.join(history)
input_ids = tokenizer(history_str, return_tensors='pt').input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
```
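`BertTokenizer.decode` joins tokens with spaces, so the decoded response usually has a space between every Chinese character. The helper below is a minimal post-processing sketch; `strip_cjk_spaces` is a hypothetical name and is not part of the model repository. It removes whitespace only next to CJK characters and full-width punctuation, so spaces between Latin words are preserved. The character ranges are an assumption and may need extending for other scripts.

```python
import re

# Assumed ranges: CJK unified ideographs, CJK punctuation, full-width forms.
_CJK = r"\u4e00-\u9fff\u3000-\u303f\uff00-\uffef"


def strip_cjk_spaces(text: str) -> str:
    """Remove whitespace adjacent to CJK characters in decoded text."""
    # Drop whitespace that directly follows a CJK character.
    text = re.sub(rf"(?<=[{_CJK}])\s+", "", text)
    # Drop whitespace that directly precedes a CJK character.
    text = re.sub(rf"\s+(?=[{_CJK}])", "", text)
    return text.strip()
```

It can wrap the decode call from the example, as in `strip_cjk_spaces(tokenizer.decode(output_ids, skip_special_tokens=True))`.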
# Contact
If you encounter any issues, feel free to contact us by email: <u>yanshekwoo@foxmail.com</u>