---
language: ca
license: apache-2.0
tags:
- "catalan"
- "masked-lm"
- "distilroberta"
widget:
- text: "El Català és una llengua molt <mask>."
- text: "Salvador Dalí va viure a <mask>."
- text: "La Costa Brava té les millors <mask> d'Espanya."
- text: "El cacaolat és un batut de <mask>."
- text: "<mask> és la capital de la Garrotxa."
- text: "Vaig al <mask> a buscar bolets."
- text: "Antoni Gaudí vas ser un <mask> molt important per la ciutat."
- text: "Catalunya és una referència en <mask> a nivell europeu."
---
# DistilRoBERTa-base-ca
## Overview
- **Architecture:** DistilRoBERTa-base
- **Language:** Catalan
- **Task:** Fill-Mask
- **Data:** Web crawling and public corpora
## Model description
This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2).
It follows the same training procedure as [DistilBERT](https://arxiv.org/abs/1910.01108), using the implementation of Knowledge Distillation
from the paper's [official repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
The resulting architecture consists of 6 layers, 768-dimensional embeddings, and 12 attention heads.
This adds up to a total of 82M parameters, considerably fewer than the 125M of standard RoBERTa-base models.
This makes the model lighter and faster than the original, at the cost of a slightly lower performance.
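The model can be used directly for fill-mask inference with the `transformers` pipeline. The snippet below is a minimal usage sketch; the model identifier is a placeholder and should be replaced with this repository's id on the Hugging Face Hub.

```python
from transformers import pipeline

# Note: the model id below is a placeholder; use this repository's id on the Hub.
model_id = "projecte-aina/distilroberta-base-ca"

fill_mask = pipeline("fill-mask", model=model_id)

# Predict the masked token for one of the widget examples above.
for prediction in fill_mask("El Català és una llengua molt <mask>."):
    print(f"{prediction['token_str']!r}\t{prediction['score']:.4f}")
```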
## Training
### Training procedure
This model has been trained using a technique known as Knowledge Distillation, which is used to shrink networks to a reasonable size while minimizing the loss in performance.
It basically consists of distilling a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
In this “teacher-student learning” setup, a relatively small student model is trained to mimic the behavior of the larger teacher model. As a result, the student offers lower inference time and can run on commodity hardware.
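The actual training used the official distillation scripts linked above. As a simplified illustration only (not the exact training code), the core objective can be sketched as a temperature-softened KL term that matches the student's predictions to the teacher's, mixed with the student's own masked-language-modeling loss; all names below are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, mlm_loss,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Soft-target KL term between teacher and student, mixed with the student's MLM loss.

    student_logits, teacher_logits: tensors of shape (num_masked_tokens, vocab_size)
    mlm_loss: the student's regular masked-LM cross-entropy (a scalar tensor)
    """
    # Soften both output distributions with the temperature.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence pushes the student's distribution towards the teacher's;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Weighted combination of the soft (distillation) and hard (MLM) objectives.
    return alpha * kl + (1.0 - alpha) * mlm_loss
```

The official DistilBERT recipe additionally includes a cosine-embedding loss over hidden states, omitted here for brevity.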
### Training data
The training corpus consists of several corpora gathered from web crawling and public corpora, as shown in the table below:
| Corpus | Size (GB) |
|--------------------------|------------|
| Catalan Crawling | 13.00 |
| RacoCatalà                | 8.10       |
| Catalan Oscar | 4.00 |
| CaWaC | 3.60 |
| Cat. General Crawling | 2.50 |
| Wikipedia | 1.10 |
| DOGC | 0.78 |
| Padicat | 0.63 |
| ACN | 0.42 |
| Nació Digital | 0.42 |
| Cat. Government Crawling | 0.24 |
| Vilaweb | 0.06 |
| Catalan Open Subtitles | 0.02 |
| Tweets | 0.02 |
## Evaluation
### Evaluation benchmark
This model has been fine-tuned on the downstream tasks of the [Catalan Language Understanding Evaluation benchmark (CLUB)](https://club.aina.bsc.es/), which includes the following datasets:
| Dataset | Task| Total | Train | Dev | Test |
|:----------|:----|:--------|:-------|:------|:------|
| AnCora | NER | 13,581 | 10,628 | 1,427 | 1,526 |
| AnCora | POS | 16,678 | 13,123 | 1,709 | 1,846 |
| STS-ca | STS | 3,073 | 2,073 | 500 | 500 |
| TeCla | TC | 137,775 | 110,203| 13,786| 13,786|
| TE-ca | RTE | 21,163 | 16,930 | 2,116 | 2,117 |
| CatalanQA | QA | 21,427 | 17,135 | 2,157 | 2,135 |
| XQuAD-ca | QA | - | - | - | 1,189 |
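For context, fine-tuning on these tasks follows the standard `transformers` recipe. The sketch below is illustrative only (placeholder model id, task, and hyperparameters), not the exact setup used to produce the results reported here.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholders: substitute this repository's id and the number of classes of the chosen CLUB task.
model_id = "projecte-aina/distilroberta-base-ca"
num_labels = 2

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=num_labels)

training_args = TrainingArguments(
    output_dir="distilroberta-base-ca-club",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=3e-5,
)

# `train_dataset` and `eval_dataset` would be tokenized CLUB splits prepared beforehand.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```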
### Evaluation results
This is how it compares to its teacher when fine-tuned on the aforementioned downstream tasks:
| Model \ Task |NER (F1)|POS (F1)|STS-ca (Comb.)|TeCla (Acc.)|TE-ca (Acc.)|CatalanQA (F1/EM)| XQuAD-ca <sup>1</sup> (F1/EM) |
| ------------------------|:-------|:-------|:-------------|:-----------|:----------|:----------------|:------------------------------|
| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 89.50/76.63 | 73.64/55.42 |
| DistilRoBERTa-base-ca | 87.88 | 98.83 | 77.26 | 73.20 | 76.00 | 84.07/70.77 | 62.93/45.08 |
<sup>1</sup>: Trained on CatalanQA, tested on XQuAD-ca.