BERTINA-3M is an Italian foundational model based on BERT, pretrained from scratch on 30GB of Italian Wikipedia articles (10M sentences, 329M tokens).

It has 3M parameters and a context window of 512 tokens.

The project is still a work in progress; new versions will be released over time.

Use it as a foundation model to be fine-tuned on specific Italian tasks, as in the sketch below.
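A minimal sketch of loading BERTINA-3M for fine-tuning on a binary Italian classification task. It assumes the checkpoint is compatible with the standard transformers Auto* classes (a BERT-style architecture, as stated above); the label count and the example sentence are placeholders.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("mascIT/bertina-3M")
model = AutoModelForSequenceClassification.from_pretrained(
    "mascIT/bertina-3M",
    num_labels=2,  # hypothetical: e.g. binary labels for a task like haspeede
)

# Tokenize an Italian sentence within the 512-token context window.
inputs = tokenizer(
    "Questo è un esempio di frase in italiano.",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 2) before any fine-tuning
```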

Training

  • epochs: 4

  • lr: 4e-4

  • optim: AdamW (beta_1=0.8)

  • weight_decay: 1e-2

  • Dev set perplexity: 19 (it's a 12MB model!)
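A minimal sketch of how the hyperparameters above might map onto a Hugging Face TrainingArguments configuration. This assumes the transformers Trainer was used for pretraining, which the card does not state; the output directory and batch size are placeholders.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bertina-3M-pretraining",  # hypothetical path
    num_train_epochs=4,
    learning_rate=4e-4,
    weight_decay=1e-2,
    adam_beta1=0.8,                       # optim: AdamW (beta_1=0.8)
    per_device_train_batch_size=64,       # assumption: not reported in the card
)
```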

Evaluation (UINAUIL)

Following the UINAUIL evaluation setup, BERTINA-3M achieves the following results:

CLASSIFICATION TASKS

Task       Type            Precision  Recall  F1     Accuracy
haspeede   classification  0.699      0.687   0.680  0.685
ironita    classification  0.701      0.701   0.701  0.701
sentipolc  classification  0.649      0.588   0.587  0.560

ENTAILMENT TASKS

Task               Type        Precision  Recall  F1     Accuracy
textualentailment  entailment  0.423      0.530   0.401  0.530

SEQUENCE TASKS

Task    Type  Accuracy
eventi  NER   0.835
facta   NER   0.967
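A minimal sketch of how the precision/recall/F1/accuracy columns in the tables above could be computed from model predictions, using scikit-learn. The macro averaging shown here is an assumption; UINAUIL may use a different averaging scheme, and the labels are hypothetical.

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

y_true = [1, 0, 1, 1, 0]  # hypothetical gold labels
y_pred = [1, 0, 0, 1, 0]  # hypothetical model predictions

p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
acc = accuracy_score(y_true, y_pred)
print(f"p={p:.3f} r={r:.3f} f1={f1:.3f} acc={acc:.3f}")
```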

License

BERTINA-3M can be freely used for research and commercial purposes.

Citation

If you use BERTINA-3M in your scientific work, please cite it as:

@misc{Sciancalepore,
  author    = {Sciancalepore, Mauro},
  title     = {mascit/bertina-3M},
  url       = {https://huggingface.co/mascIT/bertina-3M},
  journal   = {mascIT/bertina-3M · Hugging Face},
  publisher = {mascIT}
}