File size: 3,867 Bytes
88a291e 566dbff 88a291e f9b5165 88a291e 89e2777 8c90089 8d866c0 c3b687e 8d866c0 8f704e3 3c27e83 8f704e3 3c27e83 56a273a 69d53a2 3c27e83 0039f4f e5abd54 ebd60af e5abd54 8bbcc74 c15bf0c 8bbcc74 3905996 8bbcc74 f9f3af3 8bbcc74 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
---
language: it
license: apache-2.0
widget:
- text: "Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"
---
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT.jpg" width="600"/>
<h1> ITALIAN-LEGAL-BERT:A pre-trained Transformer Language Model for Italian Law </h1>
ITALIAN-LEGAL-BERT is based on <a href="https://huggingface.co/dbmdz/bert-base-italian-xxl-cased">bert-base-italian-xxl-cased</a> with additional pre-training of the Italian BERT model on Italian civil law corpora.
It achieves better results than the ‘general-purpose’ Italian BERT in different domain-specific tasks.
<h2>Training procedure</h2>
We initialized ITALIAN-LEGAL-BERT with ITALIAN XXL BERT
and pretrained for an additional 4 epochs on 3.7 GB of preprocessed text from the National Jurisprudential
Archive using the Huggingface PyTorch-Transformers library. We used BERT architecture
with a language modeling head on top, AdamW Optimizer, initial learning rate 5e-5 (with
linear learning rate decay, ends at 2.525e-9), sequence length 512, batch size 10 (imposed
by GPU capacity), 8.4 million training steps, device 1*GPU V100 16GB
<p />
<h2> Usage </h2>
ITALIAN-LEGAL-BERT model can be loaded like:
```python
from transformers import AutoModel, AutoTokenizer
model_name = "dlicari/Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
You can use the Transformers library fill-mask pipeline to do inference with ITALIAN-LEGAL-BERT.
```python
from transformers import pipeline
model_name = "dlicari/Italian-Legal-BERT"
fill_mask = pipeline("fill-mask", model_name)
fill_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento")
#[{'sequence': "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.7264330387115479},
# {'sequence': "Il convenuto ha chiesto revocarsi l'obbligo di pagamento",'score': 0.09641049802303314},
# {'sequence': "Il resistente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.039877112954854965},
# {'sequence': "Il lavoratore ha chiesto revocarsi l'obbligo di pagamento",'score': 0.028993653133511543},
# {'sequence': "Il Ministero ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.025297977030277252}]
```
In this [COLAB: ITALIAN-LEGAL-BERT: Minimal Start for Italian Legal Downstream Tasks](https://colab.research.google.com/drive/1aXOmqr70fjm8lYgIoGJMZDsK0QRIL4Lt?usp=sharing)
how to use it for sentence similarity, sentence classification, and named entity recognition
- https://colab.research.google.com/drive/1aXOmqr70fjm8lYgIoGJMZDsK0QRIL4Lt?usp=sharing
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/semantic_text_similarity.jpg" width="700"/>
<h2> Citation </h2>
If you find our resource or paper is useful, please consider including the following citation in your paper.
```
@inproceedings{licari_italian-legal-bert_2022,
address = {Bozen-Bolzano, Italy},
series = {{CEUR} {Workshop} {Proceedings}},
title = {{ITALIAN}-{LEGAL}-{BERT}: {A} {Pre}-trained {Transformer} {Language} {Model} for {Italian} {Law}},
volume = {3256},
shorttitle = {{ITALIAN}-{LEGAL}-{BERT}},
url = {https://ceur-ws.org/Vol-3256/#km4law3},
language = {en},
urldate = {2022-11-19},
booktitle = {Companion {Proceedings} of the 23rd {International} {Conference} on {Knowledge} {Engineering} and {Knowledge} {Management}},
publisher = {CEUR},
author = {Licari, Daniele and Comandè, Giovanni},
editor = {Symeonidou, Danai and Yu, Ran and Ceolin, Davide and Poveda-Villalón, María and Audrito, Davide and Caro, Luigi Di and Grasso, Francesca and Nai, Roberto and Sulis, Emilio and Ekaputra, Fajar J. and Kutz, Oliver and Troquard, Nicolas},
month = sep,
year = {2022},
note = {ISSN: 1613-0073},
file = {Full Text PDF:https://ceur-ws.org/Vol-3256/km4law3.pdf},
}
``` |