---
language:
- pt
thumbnail: Portuguese T5 for the Legal Domain
tags:
- transformers
license: mit
pipeline_tag: summarization
---

[![INESC-ID](https://www.inesc-id.pt/wp-content/uploads/2019/06/INESC-ID-logo_01.png)](https://www.inesc-id.pt/projects/PR07005/)

[![A Semantic Search System for Supremo Tribunal de Justiça](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/_static/logo.png)](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/)

Work developed as part of [Project IRIS](https://www.inesc-id.pt/projects/PR07005/).

Thesis: [A Semantic Search System for Supremo Tribunal de Justiça](https://rufimelo99.github.io/SemanticSearchSystemForSTJ/)

# stjiris/t5-portuguese-legal-summarization

T5 model fine-tuned from the `unicamp-dl/ptt5-base-portuguese-vocab` checkpoint (PTT5, a T5 model pretrained on Brazilian Portuguese data).

We trained this model on a range of court decisions (jurisprudence) paired with their summaries.
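
This card does not document the training preprocessing. As a minimal sketch, assuming the standard T5 text-to-text setup and the same `summarize: ` prefix used in the Usage section, a (decision, summary) pair could be turned into a fine-tuning example as follows. The helper `build_example` and both length limits are hypothetical, not the authors' pipeline, and the `text_target` argument requires a `transformers` release that supports it.

```python
# Hypothetical helper: one common way to format a (decision, summary) pair
# for T5 fine-tuning. Not the authors' actual preprocessing script.
def build_example(decision_text, summary_text, tokenizer,
                  max_source_len=512, max_target_len=1024):
    # The encoder input carries the T5 task prefix, as in the Usage section.
    model_inputs = tokenizer(
        "summarize: " + decision_text,
        max_length=max_source_len,
        truncation=True,
    )
    # The summary is tokenized as the decoder target; T5ForConditionalGeneration
    # computes the training loss against these ids when they are passed as `labels`.
    targets = tokenizer(
        text_target=summary_text,
        max_length=max_target_len,
        truncation=True,
    )
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs
```

With `t5_tokenizer` loaded as in the Usage section, `build_example(decision, summary, t5_tokenizer)` returns `input_ids`, `attention_mask` and `labels` lists, the fields a collator such as `DataCollatorForSeq2Seq` works with.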

## Usage (HuggingFace transformers)

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Model name on the Hugging Face Hub
model_checkpoint = "stjiris/t5-portuguese-legal-summarization"

# Run on GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

t5_model = T5ForConditionalGeneration.from_pretrained(model_checkpoint).to(device)
t5_tokenizer = T5Tokenizer.from_pretrained(model_checkpoint)

# Replace with the (Portuguese) text you want to summarize
preprocess_text = "These are some big words and text and words and text, again, that we want to summarize"

# Prefix the input with the T5 summarization task prefix
t5_prepared_text = "summarize: " + preprocess_text

tokenized_text = t5_tokenizer.encode(t5_prepared_text, return_tensors="pt").to(device)

# Summarize with beam search; min_length and max_length are counted in tokens,
# so min_length=512 keeps the model from ending the summary early
summary_ids = t5_model.generate(
    tokenized_text,
    num_beams=4,
    no_repeat_ngram_size=2,
    min_length=512,
    max_length=1024,
    early_stopping=True,
)

output = t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print("\n\nSummarized text:\n", output)
```
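
The same steps can be wrapped in a reusable function. This is a minimal sketch: the helper name `summarize` and the 512-token input cap are assumptions (based on the 512-token input length T5 models are typically trained with; check `t5_tokenizer.model_max_length`), and forwarding the attention mask to `generate` is general `transformers` practice rather than something this card prescribes.

```python
# Hypothetical helper wrapping the Usage example: prefix, tokenize (with
# truncation), generate and decode. Defaults mirror the settings above.
def summarize(text, model, tokenizer, max_input_tokens=512, **generate_kwargs):
    settings = dict(num_beams=4, no_repeat_ngram_size=2, min_length=512,
                    max_length=1024, early_stopping=True)
    # Caller-supplied generation options override the defaults
    settings.update(generate_kwargs)
    # Truncate long decisions to the assumed encoder input limit
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=max_input_tokens)
    # Move input_ids and attention_mask to the model's device
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
    summary_ids = model.generate(**inputs, **settings)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

For example, with the model and tokenizer loaded as in the Usage section, `summarize(preprocess_text, t5_model, t5_tokenizer, min_length=64, max_length=256)` requests a shorter summary than the defaults.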

## Citing & Authors

### Contributions

[@rufimelo99](https://github.com/rufimelo99)

If you use this work, please cite:

```bibtex
@inproceedings{MeloSemantic,
  author = {Melo, Rui and Santos, Pedro Alexandre and Dias, Jo{\~a}o},
  title  = {A {Semantic} {Search} {System} for {Supremo} {Tribunal} de {Justi}{\c{c}}a},
}

@article{ptt5_2020,
  title   = {PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data},
  author  = {Carmo, Diedre and Piau, Marcos and Campiotti, Israel and Nogueira, Rodrigo and Lotufo, Roberto},
  journal = {arXiv preprint arXiv:2008.09144},
  year    = {2020}
}
```