---
license: afl-3.0
language:
  - it
---

# LSG16K-Italian-LEGAL-BERT

This is a Local-Sparse-Global (LSG) version of ITALIAN-LEGAL-BERT-SC, obtained by replacing the full attention in the encoder with LSG attention using the LSG converter script (https://github.com/ccdv-ai/convert_checkpoint_to_lsg). The conversion uses a maximum sequence length of 16,384 tokens, 7 global tokens, a local block size of 128, a sparse block size of 128, a sparsity factor of 2, and the 'norm' sparse selection pattern (which selects the tokens with the highest norm).
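
As a quick illustration, a minimal sketch of loading the converted checkpoint with the standard `transformers` API is shown below. LSG-converted models ship custom modeling code, so `trust_remote_code=True` is needed; the repository id used here is an assumption based on this card's title.

```python
# Minimal sketch: load the LSG-converted checkpoint for long-document encoding.
# The model id is assumed from this card's title; adjust it to the actual repo path.
from transformers import AutoModel, AutoTokenizer

model_id = "dlicari/LSG16K-Italian-LEGAL-BERT"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,  # LSG attention lives in custom code shipped with the checkpoint
)

# Encode a (potentially very long) Italian legal text, up to 16,384 tokens.
text = "Il ricorrente propone ricorso per cassazione avverso la sentenza della Corte d'appello ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```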