---
language:
- pt
thumbnail: Portuguese T5 for the Legal Domain
tags:
- transformers
license: mit
pipeline_tag: summarization
---

[![INESC-ID](https://www.inesc-id.pt/wp-content/uploads/2019/06/INESC-ID-logo_01.png)](https://www.inesc-id.pt/projects/PR07005/)

[![A Semantic Search System for Supremo Tribunal de Justiça](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/_static/logo.png)](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/)

Work developed as part of [Project IRIS](https://www.inesc-id.pt/projects/PR07005/).

Thesis: [A Semantic Search System for Supremo Tribunal de Justiça](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/)

# stjiris/t5-portuguese-legal-summarization

T5 model fine-tuned from the `unicamp-dl/ptt5-base-portuguese-vocab` T5 checkpoint. It was trained on a collection of court decisions (jurisprudence) paired with their summaries.

## Usage (HuggingFace transformers)

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Hub ID of this model, or the path to a local folder containing the saved model
model_checkpoint = "stjiris/t5-portuguese-legal-summarization"

device = "cuda" if torch.cuda.is_available() else "cpu"
t5_model = T5ForConditionalGeneration.from_pretrained(model_checkpoint).to(device)
t5_tokenizer = T5Tokenizer.from_pretrained(model_checkpoint)

# Replace with the (Portuguese) text you want to summarize
preprocess_text = "These are some big words and text and words and text, again, that we want to summarize"
# Prepend the T5 "summarize: " task prefix to the input
t5_prepared_text = "summarize: " + preprocess_text

tokenized_text = t5_tokenizer.encode(t5_prepared_text, return_tensors="pt").to(device)

# Summarize with beam search, blocking repeated bigrams
summary_ids = t5_model.generate(
    tokenized_text,
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=512,
    max_length=1024,
    early_stopping=True,
)

output = t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("\n\nSummarized text: \n", output)
```

## Citing & Authors

### Contributions

[@rufimelo99](https://github.com/rufimelo99)

If you use this work, please cite:

```bibtex
@inproceedings{MeloSemantic,
  author = {Melo, Rui and Santos, Professor Pedro Alexandre and Dias, Professor Jo{\~a}o},
  title  = {A {Semantic} {Search} {System} for {Supremo} {Tribunal} de {Justi}{\c c}a},
}

@article{ptt5_2020,
  title   = {PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
  author  = {Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
  journal = {arXiv preprint arXiv:2008.09144},
  year    = {2020}
}
```