---
language: it
license: apache-2.0
widget:
- text: "Il [MASK] ha chiesto revocarsi l'obbligo di pagamento"
---
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT.jpg" width="600"/>
<h1> ITALIAN-LEGAL-BERT: A pre-trained Transformer Language Model for Italian Law </h1>
ITALIAN-LEGAL-BERT is based on <a href="https://huggingface.co/dbmdz/bert-base-italian-xxl-cased">bert-base-italian-xxl-cased</a> with additional pre-training of the Italian BERT model on Italian civil law corpora.
It achieves better results than the ‘general-purpose’ Italian BERT on a range of domain-specific tasks.
<h2>Training procedure</h2>
We initialized ITALIAN-LEGAL-BERT with the ITALIAN XXL BERT weights and pre-trained it for an additional 4 epochs on 3.7 GB of preprocessed text from the National Jurisprudential Archive, using the Hugging Face PyTorch Transformers library. Training used the BERT architecture with a language-modeling head on top, the AdamW optimizer, an initial learning rate of 5e-5 with linear decay (ending at 2.525e-9), a sequence length of 512, a batch size of 10 (imposed by GPU memory), 8.4 million training steps, and a single V100 16 GB GPU.
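
The snippet below is a minimal sketch of this kind of continued masked-language-model pre-training with the Hugging Face `Trainer` (which uses AdamW and a linear learning-rate schedule by default). The corpus file `legal_corpus.txt` and the data handling are placeholders for illustration, not our actual training script.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the general-purpose Italian XXL BERT checkpoint
base_model = "dbmdz/bert-base-italian-xxl-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForMaskedLM.from_pretrained(base_model)

# Placeholder corpus: one preprocessed legal document per line
dataset = load_dataset("text", data_files={"train": "legal_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

training_args = TrainingArguments(
    output_dir="italian-legal-bert",
    num_train_epochs=4,
    per_device_train_batch_size=10,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    # Standard BERT masking: 15% of tokens are masked for the MLM objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```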
<p />
<h2> Usage </h2>
The ITALIAN-LEGAL-BERT model can be loaded as follows:
```python
from transformers import AutoModel, AutoTokenizer

# Download the tokenizer and encoder weights from the Hugging Face Hub
model_name = "dlicari/Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
You can use the Transformers fill-mask pipeline to run inference with ITALIAN-LEGAL-BERT:
```python
from transformers import pipeline

# Build a fill-mask pipeline backed by ITALIAN-LEGAL-BERT
model_name = "dlicari/Italian-Legal-BERT"
fill_mask = pipeline("fill-mask", model=model_name)
fill_mask("Il [MASK] ha chiesto revocarsi l'obbligo di pagamento")
#[{'sequence': "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.7264330387115479},
# {'sequence': "Il convenuto ha chiesto revocarsi l'obbligo di pagamento",'score': 0.09641049802303314},
# {'sequence': "Il resistente ha chiesto revocarsi l'obbligo di pagamento",'score': 0.039877112954854965},
# {'sequence': "Il lavoratore ha chiesto revocarsi l'obbligo di pagamento",'score': 0.028993653133511543},
# {'sequence': "Il Ministero ha chiesto revocarsi l'obbligo di pagamento", 'score': 0.025297977030277252}]
```
This Colab notebook, [ITALIAN-LEGAL-BERT: Minimal Start for Italian Legal Downstream Tasks](https://colab.research.google.com/drive/1aXOmqr70fjm8lYgIoGJMZDsK0QRIL4Lt?usp=sharing), shows how to use it for sentence similarity, sentence classification, and named entity recognition.
<img src="https://huggingface.co/dlicari/Italian-Legal-BERT/resolve/main/semantic_text_similarity.jpg" width="700"/>
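
As a quick illustration of the sentence-similarity use case, the sketch below mean-pools the model's last hidden states into sentence embeddings and compares them with cosine similarity. This is one reasonable approach, not the notebook's exact code; the example sentences are arbitrary.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "dlicari/Italian-Legal-BERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(sentences):
    # Tokenize, run the encoder, then average the token embeddings
    # (ignoring padding) to obtain one vector per sentence.
    enc = tokenizer(sentences, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state
    mask = enc["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = embed([
    "Il ricorrente ha chiesto revocarsi l'obbligo di pagamento",
    "Il convenuto ha chiesto la revoca del pagamento",
])
similarity = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```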
<h2> Citation </h2>
If you find our resources or paper useful, please consider citing our work:
```
@inproceedings{ita_legalbert_2022,
  author    = {Daniele Licari and Giovanni Comandè},
  title     = {{ITALIAN-LEGAL-BERT}: A Pre-trained Transformer Language Model for Italian Law},
  booktitle = {Proceedings of The Knowledge Management for Law Workshop (KM4LAW)},
  note      = {Accepted for publication},
  year      = {2022}
}
```