---
language: it
license: afl-3.0
widget:
- text: Il [MASK] ha chiesto revocarsi l'obbligo di pagamento
---
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT.jpg" width="600"/>
<h1> ITALIAN-LEGAL-BERT: A pre-trained Transformer Language Model for Italian Law </h1>
ITALIAN-LEGAL-BERT is based on <a href="https://huggingface.co/dbmdz/bert-base-italian-xxl-cased">bert-base-italian-xxl-cased</a>, with additional pre-training of the Italian BERT model on Italian civil-law corpora.
It achieves better results than the general-purpose Italian BERT on several domain-specific tasks.
<b>ITALIAN-LEGAL-BERT variants [NEW!!!]</b>
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT-SC/resolve/main/ITALIAN_LEGAL_BERT-SC.jpg" width="600"/>
* <a href="https://huggingface.co/dlicari/Italian-Legal-BERT-SC">FROM SCRATCH</a>: the ITALIAN-LEGAL-BERT variant pre-trained from scratch on Italian legal documents (<a href="https://huggingface.co/dlicari/Italian-Legal-BERT-SC">ITA-LEGAL-BERT-SC</a>), based on the CamemBERT architecture
<img src="https://huggingface.co/dlicari/distil-ita-legal-bert/resolve/main/ITALIAN_LEGAL_BERT-DI.jpg" width="600"/>
* <a href="https://huggingface.co/dlicari/distil-ita-legal-bert">DISTILLED</a>: a distilled version of ITALIAN-LEGAL-BERT (<a href="https://huggingface.co/dlicari/distil-ita-legal-bert">DISTIL-ITA-LEGAL-BERT</a>)
<img src="https://huggingface.co/dlicari/lsg16k-Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT-LSG.jpg" width="600"/>
For long documents (see the loading sketch after the note below):
* [LSG ITA LEGAL BERT](https://huggingface.co/dlicari/lsg16k-Italian-Legal-BERT), Local-Sparse-Global version of ITALIAN-LEGAL-BERT (FURTHER PRE-TRAINED)
* [LSG ITA LEGAL BERT-SC](https://huggingface.co/dlicari/lsg16k-Italian-Legal-BERT-SC), Local-Sparse-Global version of ITALIAN-LEGAL-BERT-SC (FROM SCRATCH)
*Note: We are working on an extended version of the paper with more details and the results of these new models. We will update this page soon.*
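LSG checkpoints typically bundle custom attention code rather than a stock BERT implementation, so loading them generally requires `trust_remote_code=True`. The following is a minimal sketch based on that assumption; check the model card of the variant you use:

```python
from transformers import AutoModel, AutoTokenizer

# LSG models ship custom modeling code, hence trust_remote_code=True
model_name = "dlicari/lsg16k-Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
```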
<h2>Training procedure</h2>
We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT and pre-trained it for an additional 4 epochs on 3.7 GB of preprocessed text from the National Jurisprudential Archive, using the Hugging Face PyTorch-Transformers library. We used the BERT architecture with a language-modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 (with linear learning-rate decay, ending at 2.525e-9), a sequence length of 512, a batch size of 10 (imposed by GPU capacity), and 8.4 million training steps on a single V100 16 GB GPU.
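The exact training script is not published here; the following is a minimal sketch of a comparable continued pre-training setup with the Hugging Face `Trainer`, where the corpus file name is a placeholder for your own legal text:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the general-purpose Italian XXL BERT checkpoint
base_model = "dbmdz/bert-base-italian-xxl-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# "civil_law_corpus.txt" is a hypothetical file standing in for the preprocessed corpus
dataset = load_dataset("text", data_files={"train": "civil_law_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-modeling collator (15% masking, as in the original BERT)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="italian-legal-bert",
    num_train_epochs=4,              # as reported above
    per_device_train_batch_size=10,  # imposed by a 16 GB V100
    learning_rate=5e-5,              # the default scheduler decays it linearly
    save_steps=10_000,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```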
<h2> Usage </h2>
The ITALIAN-LEGAL-BERT model can be loaded as follows:
```python
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT"

# Download the tokenizer and encoder weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
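Loaded this way, the model maps text to contextual token embeddings with a hidden size of 768. As a quick sanity check, reusing `tokenizer` and `model` from the snippet above (the example sentence is arbitrary):

```python
import torch

# Encode a short legal sentence and inspect the output shape
inputs = tokenizer("Il ricorrente ha proposto appello.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])
```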
You can use the Transformers fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT:
```python
from transformers import pipeline

model_name = "dlicari/Italian-Legal-BERT"
fill_mask = pipeline("fill-mask", model=model_name)
fill_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento")
# [{'sequence': "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.7264330387115479},
#  {'sequence': "Il convenuto ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.09641049802303314},
#  {'sequence': "Il resistente ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.039877112954854965},
#  {'sequence': "Il lavoratore ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.028993653133511543},
#  {'sequence': "Il Ministero ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.025297977030277252}]
```
This Colab notebook, [ITALIAN-LEGAL-BERT: Minimal Start for Italian Legal Downstream Tasks](https://colab.research.google.com/drive/1ZOWaWnLaagT_PX6MmXMP2m3MAOVXkyRK?usp=sharing), shows how to use the model for sentence similarity, sentence classification, and named entity recognition.
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/semantic_text_similarity.jpg" width="700"/>
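As an illustration of the sentence-similarity use case, here is a minimal mean-pooling sketch; the pooling strategy and the example sentences are assumptions for illustration, not necessarily what the notebook uses:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(sentences):
    # Mean-pool the token embeddings, ignoring padding positions
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state   # (batch, tokens, 768)
    mask = enc["attention_mask"].unsqueeze(-1)    # (batch, tokens, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = embed([
    "Il ricorrente ha chiesto la revoca dell'obbligo di pagamento.",
    "La parte ha domandato che l'obbligo di pagamento venisse revocato.",
])
print(F.cosine_similarity(emb[0], emb[1], dim=0).item())  # closer to 1.0 = more similar
```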
<h2> Citation </h2>
If you find our resource or paper useful, please consider citing:
```bibtex
@inproceedings{licari_italian-legal-bert_2022,
address = {Bozen-Bolzano, Italy},
series = {{CEUR} {Workshop} {Proceedings}},
title = {{ITALIAN}-{LEGAL}-{BERT}: {A} {Pre}-trained {Transformer} {Language} {Model} for {Italian} {Law}},
volume = {3256},
shorttitle = {{ITALIAN}-{LEGAL}-{BERT}},
url = {https://ceur-ws.org/Vol-3256/#km4law3},
language = {en},
urldate = {2022-11-19},
booktitle = {Companion {Proceedings} of the 23rd {International} {Conference} on {Knowledge} {Engineering} and {Knowledge} {Management}},
publisher = {CEUR},
author = {Licari, Daniele and Comandè, Giovanni},
editor = {Symeonidou, Danai and Yu, Ran and Ceolin, Davide and Poveda-Villalón, María and Audrito, Davide and Caro, Luigi Di and Grasso, Francesca and Nai, Roberto and Sulis, Emilio and Ekaputra, Fajar J. and Kutz, Oliver and Troquard, Nicolas},
month = sep,
year = {2022},
note = {ISSN: 1613-0073},
}
``` |