File size: 1,845 Bytes
d20d355
 
 
 
 
 
 
 
 
 
 
 
45e9490
 
d6f16cf
45e9490
 
3b4eaa2
45e9490
 
 
 
 
 
8596981
8a6430b
45e9490
 
 
 
 
 
 
 
 
 
 
 
 
 
a2a4014
 
 
 
 
 
 
 
0bab53e
45e9490
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
---
language: vi
datasets:
- cc100
tags:
- summarization
- translation
- question-answering

license: mit
---

# ViT5-large

State-of-the-art pretrained Transformer-based encoder-decoder model for Vietnamese.

## How to use
For more details, do check out [our Github repo](https://github.com/vietai/ViT5). 
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("VietAI/vit5-large")  
model = AutoModelForSeq2SeqLM.from_pretrained("VietAI/vit5-large")

sentence = "VietAI là tổ chức phi lợi nhuận với sứ mệnh ươm mầm tài năng về trí tuệ nhân tạo và xây dựng một cộng đồng các chuyên gia trong lĩnh vực trí tuệ nhân tạo đẳng cấp quốc tế tại Việt Nam."
text =  "vi: " + sentence
encoding = tokenizer.encode_plus(text, pad_to_max_length=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to("cuda"), encoding["attention_mask"].to("cuda")
outputs = model.generate(
    input_ids=input_ids, attention_mask=attention_masks,
    max_length=256,
    early_stopping=True
)
for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
```

## Citation
```
@inproceedings{phan-etal-2022-vit5,
    title = "{V}i{T}5: Pretrained Text-to-Text Transformer for {V}ietnamese Language Generation",
    author = "Phan, Long and Tran, Hieu and Nguyen, Hieu and Trinh, Trieu H.",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop",
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-srw.18",
    pages = "136--142",
}
```