marcospiau committed
Commit 68387dd
1 Parent(s): fccd881

Create README.md

Files changed (1):
1. README.md +34 -0
README.md ADDED
---
datasets:
- allenai/c4
- legacy-datasets/mc4
language:
- pt
pipeline_tag: text2text-generation
base_model: google-t5/t5-large
---

# ptt5-v2-large

## Introduction
[ptt5-v2 models](https://huggingface.co/collections/unicamp-dl/ptt5-v2-666538a650188ba00aa8d2d0) are pretrained T5 models tailored for the Portuguese language, continuing pretraining from Google's original checkpoints in sizes ranging from t5-small to t5-3B.
These checkpoints were used to train MonoT5 rerankers for the Portuguese language, which can be found in their [HuggingFace collection](https://huggingface.co/collections/unicamp-dl/monoptt5-66653981877df3ea727f720d).
For further information about the pretraining process, please refer to our paper, [ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language](https://arxiv.org/abs/2406.10806).

## Usage
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the Portuguese-pretrained tokenizer and model from the Hugging Face Hub
tokenizer = T5Tokenizer.from_pretrained("unicamp-dl/ptt5-v2-large")
model = T5ForConditionalGeneration.from_pretrained("unicamp-dl/ptt5-v2-large")
```
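
As a quick check that the checkpoint loads and runs, the snippet below performs a single generation step. This is a minimal sketch under stated assumptions: ptt5-v2 checkpoints are pretrained with T5's span-corruption objective rather than instruction-tuned, so the Portuguese input sentence, the `<extra_id_0>` sentinel, and the greedy decoding settings are illustrative choices rather than a recommended recipe; downstream use typically involves fine-tuning first.

```python
# Illustrative only: probe the span-corruption pretraining objective by
# masking one span with T5's <extra_id_0> sentinel token.
input_text = "O Brasil é o maior país da <extra_id_0> do Sul."
inputs = tokenizer(input_text, return_tensors="pt")

# Greedy decoding with a small output budget; generation settings are assumptions.
outputs = model.generate(**inputs, max_new_tokens=10)

# Keep special tokens so the predicted sentinel spans remain visible.
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```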

## Citation
If you use our models, please cite:

```bibtex
@article{ptt5_2020,
  title={PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
  author={Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
  journal={arXiv preprint arXiv:2008.09144},
  year={2020}
}
```