dlicari/lsg16k-Italian-Legal-BERT-SC

dlicari committed on Feb 18, 2023

Commit 8adc56b · 1 Parent(s): 68476e9

Update README.md

README.md +1 -1

@@ -6,4 +6,4 @@ language:
 <img  src="https://huggingface.co/dlicari/lsg16k-Italian-Legal-BERT/resolve/main/ITALIAN_LEGAL_BERT-LSG.jpg" width="600"/>
 # LSG16K-Italian-LEGAL-BERT
-[Local-Sparse-Global](https://arxiv.org/abs/2210.15497) version of [ITALIAN-LEGAL-BERT-SC](https://huggingface.co/dlicari/Italian-Legal-BERT) by replacing the full attention in the encoder part using the LSG converter script (https://github.com/ccdv-ai/convert\_checkpoint\_to\_lsg). We used the LSG attention with 16,384 maximum sequence length, 7 global tokens, 128 local block size, 128 sparse block size, 2 sparsity factors, 'norm' sparse selection pattern (select the highest norm tokens).

+[Local-Sparse-Global](https://arxiv.org/abs/2210.15497) version of [ITALIAN-LEGAL-BERT-SC](https://huggingface.co/dlicari/Italian-Legal-BERT-SC) by replacing the full attention in the encoder part using the LSG converter script (https://github.com/ccdv-ai/convert\_checkpoint\_to\_lsg). We used the LSG attention with 16,384 maximum sequence length, 7 global tokens, 128 local block size, 128 sparse block size, 2 sparsity factors, 'norm' sparse selection pattern (select the highest norm tokens).
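
The README line above describes a model converted with the LSG script, so loading it likely goes through the custom modeling code that ships with the checkpoint. The following is a minimal sketch, not the author's documented usage: it assumes the repository id from the page header (`dlicari/lsg16k-Italian-Legal-BERT-SC`), assumes that `trust_remote_code=True` is required as it is for other LSG-converted checkpoints, and uses hypothetical helper names (`load_lsg_model`, `encode`). The numeric constants are copied from the README text.

```python
# Values quoted in the README text; the constant names are illustrative only.
MODEL_ID = "dlicari/lsg16k-Italian-Legal-BERT-SC"  # assumed from the page header
MAX_LENGTH = 16_384        # maximum sequence length
NUM_GLOBAL_TOKENS = 7      # global tokens
LOCAL_BLOCK_SIZE = 128     # local block size
SPARSE_BLOCK_SIZE = 128    # sparse block size
SPARSITY_FACTOR = 2        # sparsity factor
SPARSITY_TYPE = "norm"     # select the highest-norm tokens


def load_lsg_model(model_id: str = MODEL_ID):
    """Load tokenizer and encoder (hypothetical helper).

    Assumption: the checkpoint bundles LSG modeling code, so
    trust_remote_code=True is needed to instantiate it.
    """
    from transformers import AutoModel, AutoTokenizer  # imported lazily

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model


def encode(text: str, tokenizer, model):
    """Return token-level hidden states for one long document."""
    import torch

    inputs = tokenizer(
        text,
        truncation=True,
        max_length=MAX_LENGTH,  # cap input at the stated 16,384 tokens
        return_tensors="pt",
    )
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state
```

A call such as `tokenizer, model = load_lsg_model()` followed by `encode(long_text, tokenizer, model)` would download the checkpoint on first use; running a 16,384-token input may need substantial memory.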