---
datasets:
- unicamp-dl/mmarco
language:
- pt
pipeline_tag: text2text-generation
base_model: unicamp-dl/ptt5-v2-large
---

## Introduction

MonoPTT5 models are T5 rerankers for the Portuguese language. Starting from [ptt5-v2 checkpoints](https://huggingface.co/collections/unicamp-dl/ptt5-v2-666538a650188ba00aa8d2d0), they were trained for 100k steps on a mixture of Portuguese and English data from the mMARCO dataset.

For further information on the training and evaluation of these models, please refer to our paper, [ptt5-v2: A Closer Look at Continued Pretraining of T5 Models for the Portuguese Language](https://arxiv.org/abs/2008.09144).

## Usage

The easiest way to use our models is through the `rerankers` package. After installing the package using `pip install rerankers[transformers]`, the following code can be used as a minimal working example:

```python
from rerankers import Reranker
import torch

query = "O futebol é uma paixão nacional"
docs = [
    "O futebol é superestimado e não deveria receber tanta atenção.",
    "O futebol é uma parte essencial da cultura brasileira e une as pessoas.",
]

ranker = Reranker(
    "unicamp-dl/monoptt5-large",
    inputs_template="Pergunta: {query} Documento: {text} Relevante:",
    dtype=torch.float32  # or bfloat16 if supported by your GPU
)

results = ranker.rank(query, docs)

print("Classification results:")
for result in results:
    print(result)

# Loading T5Ranker model unicamp-dl/monoptt5-large
# No device set
# Using device cuda
# Using dtype torch.float32
# Loading model unicamp-dl/monoptt5-large, this might take a while...
# Using device cuda.
# Using dtype torch.float32.
# T5 true token set to ▁Sim
# T5 false token set to ▁Não
# Returning normalised scores...
# Inputs template set to Pergunta: {query} Documento: {text} Relevante:

# Classification results:
# document=Document(text='O futebol é uma parte essencial da cultura brasileira e une as pessoas.', doc_id=1, metadata={}) score=0.923164963722229 rank=1
# document=Document(text='O futebol é superestimado e não deveria receber tanta atenção.', doc_id=0, metadata={}) score=0.08710747957229614 rank=2
```

For additional configurations and more advanced usage, consult the `rerankers` [GitHub repository](https://github.com/AnswerDotAI/rerankers).

## Citation

If you use our models, please cite:

    @article{ptt5_2020,
      title={PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
      author={Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
      journal={arXiv preprint arXiv:2008.09144},
      year={2020}
    }
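
As a supplement to the usage example, the log lines about the true token (`▁Sim`), the false token (`▁Não`), and "normalised scores" suggest that each score is a probability derived from the model's preference between those two tokens. The sketch below illustrates one common way such a score can be computed, a two-way softmax over the two token logits. This is an assumption about the scoring recipe, not a description of the `rerankers` internals; the helper names `build_prompt`, `relevance_score`, and `rank` are hypothetical, and the logits are made-up values rather than real model outputs.

```python
import math

# Prompt template copied from the usage example.
TEMPLATE = "Pergunta: {query} Documento: {text} Relevante:"


def build_prompt(query: str, text: str, template: str = TEMPLATE) -> str:
    """Fill the template with one query-document pair."""
    return template.format(query=query, text=text)


def relevance_score(true_logit: float, false_logit: float) -> float:
    """Two-way softmax: probability assigned to the 'true' token (assumed recipe)."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


def rank(logit_pairs: list[tuple[float, float]]) -> list[tuple[int, float]]:
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    scores = [(i, relevance_score(t, f)) for i, (t, f) in enumerate(logit_pairs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up (true_logit, false_logit) pairs for two documents, illustration only.
    hypothetical_logits = [(-1.2, 1.1), (2.0, -0.5)]
    for doc_id, score in rank(hypothetical_logits):
        print(doc_id, round(score, 3))
```

Under this assumption, a score near 1 means the model strongly prefers the true token for that query-document pair, and a score near 0 means it prefers the false token, which matches the ordering seen in the example output.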