File size: 9,815 Bytes
d5b763c 3218cda d5b763c 3218cda d5b763c 9e0bb9b 3d37ad9 d5b763c 301774e 3d37ad9 301774e 3d37ad9 d5b763c 301774e f70b79b 301774e fc3b31b 3218cda 3d37ad9 d5b763c c26025b d5b763c e53d525 c26025b 27baf95 e6beff1 d5b763c 4dd76dc 40ce988 4dd76dc d5b763c 69be3c0 301774e 5eff094 4dd76dc d60c023 5eff094 d5b763c 417c264 e06ebc7 417c264 e06ebc7 24b78a4 e06ebc7 24b78a4 e06ebc7 24b78a4 e06ebc7 417c264 137825a 016baab c26025b 17d420d c26025b 51f2b30 17d420d 27753bf 17d420d 417c264 d5b763c e06ebc7 24b78a4 e06ebc7 24b78a4 e06ebc7 24b78a4 e06ebc7 24b78a4 e06ebc7 d5b763c c26025b 137825a c26025b b67a614 27baf95 d5b763c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
---
language:
- da
- 'no'
- nb
- sv
license: apache-2.0
datasets:
- strombergnlp/danfever
- KBLab/overlim
- MoritzLaurer/multilingual-NLI-26lang-2mil7
pipeline_tag: zero-shot-classification
widget:
- example_title: Danish
text: Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder
dig'
candidate_labels: sundhed, politik, sport, religion
- example_title: Norwegian
text: Regjeringen i Russland hevder Norge fører en politikk som vil føre til opptrapping
i Arktis og «den endelige ødeleggelsen av russisk-norske relasjoner».
candidate_labels: helse, politikk, sport, religion
- example_title: Swedish
text: Så luras kroppens immunförsvar att bota cancer
candidate_labels: hälsa, politik, sport, religion
inference:
parameters:
hypothesis_template: Dette eksempel handler om {}
base_model: NbAiLab/nb-bert-large
---
# ScandiNLI - Natural Language Inference model for Scandinavian Languages
This model is a fine-tuned version of [NbAiLab/nb-bert-large](https://huggingface.co/NbAiLab/nb-bert-large) for Natural Language Inference in Danish, Norwegian Bokmål and Swedish.
We have released three models for Scandinavian NLI, of different sizes:
- alexandrainst/scandi-nli-large (this)
- [alexandrainst/scandi-nli-base](https://huggingface.co/alexandrainst/scandi-nli-base)
- [alexandrainst/scandi-nli-small](https://huggingface.co/alexandrainst/scandi-nli-small)
A demo of the large model can be found in [this Hugging Face Space](https://huggingface.co/spaces/alexandrainst/zero-shot-classification) - check it out!
The performance and model size of each of them can be found in the Performance section below.
## Quick start
You can use this model in your scripts as follows:
```python
>>> from transformers import pipeline
>>> classifier = pipeline(
... "zero-shot-classification",
... model="alexandrainst/scandi-nli-large",
... )
>>> classifier(
... "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
... candidate_labels=['sundhed', 'politik', 'sport', 'religion'],
... hypothesis_template="Dette eksempel handler om {}",
... )
{'sequence': "Mexicansk bokser advarer Messi - 'Du skal bede til gud, om at jeg ikke finder dig'",
'labels': ['sport', 'religion', 'politik', 'sundhed'],
'scores': [0.6134647727012634,
0.30309760570526123,
0.05021871626377106,
0.03321893885731697]}
```
## Performance
We assess the models both on their aggregate Scandinavian performance, as well as their language-specific Danish, Swedish and Norwegian Bokmål performance.
In all cases, we report Matthew's Correlation Coefficient (MCC), macro-average F1-score as well as accuracy.
### Scandinavian Evaluation
The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.
| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | **73.70%** | **74.44%** | **83.91%** | 354M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 69.01% | 71.99% | 80.66% | 279M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | 67.42% | 71.54% | 80.09% | 178M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | 64.17% | 70.80% | 77.29% | 560M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 63.94% | 70.41% | 77.23% | 279M |
| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | 61.71% | 68.36% | 76.08% | 178M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 56.02% | 65.30% | 73.56% | **22M** |
### Danish Evaluation
We use a test split of the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439) to evaluate the Danish performance of the models.
The test split is generated using [this gist](https://gist.github.com/saattrupdan/1cb8379232fdec6e943dc84595a85e7c).
| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | **73.80%** | **58.41%** | **86.98%** | 354M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 68.37% | 57.10% | 83.25% | 279M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | 62.44% | 55.00% | 80.42% | 178M |
| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | 56.92% | 53.25% | 76.39% | 178M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 52.79% | 52.00% | 72.35% | 279M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | 49.18% | 50.31% | 69.73% | 560M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 47.28% | 48.88% | 73.46% | **22M** |
### Swedish Evaluation
We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Swedish performance of the models.
We acknowledge that not evaluating on a gold standard dataset is not ideal, but unfortunately we are not aware of any NLI datasets in Swedish.
| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | **76.69%** | **84.47%** | **84.38%** | 354M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | 75.35% | 83.42% | 83.55% | 560M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 73.84% | 82.46% | 82.58% | 279M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 73.32% | 82.15% | 82.08% | 279M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | 72.29% | 81.37% | 81.51% | 178M |
| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | 64.69% | 76.40% | 76.47% | 178M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 62.35% | 74.79% | 74.93% | **22M** |
### Norwegian Evaluation
We use the test split of the machine translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Norwegian performance of the models.
We acknowledge that not evaluating on a gold standard dataset is not ideal, but unfortunately we are not aware of any NLI datasets in Norwegian.
| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
| :-------- | :------------ | :--------- | :----------- | :----------- |
| `alexandrainst/scandi-nli-large` (this) | **70.61%** | **80.43%** | **80.36%** | 354M |
| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | 67.99% | 78.68% | 78.60% | 560M |
| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | 67.53% | 78.24% | 78.33% | 178M |
| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 65.33% | 76.73% | 76.65% | 279M |
| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 65.18% | 76.76% | 76.77% | 279M |
| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | 63.51% | 75.42% | 75.39% | 178M |
| [`alexandrainst/scandi-nli-small`](https://huggingface.co/alexandrainst/scandi-nli-small) | 58.42% | 72.22% | 72.30% | **22M** |
## Training procedure
It has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.
The training split of DanFEVER is generated using [this gist](https://gist.github.com/saattrupdan/1cb8379232fdec6e943dc84595a85e7c).
The three languages are sampled equally during training, and they're validated on validation splits of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) and machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) for Swedish and Norwegian Bokmål, sampled equally.
Check out the [Github repository](https://github.com/alexandrainst/ScandiNLI) for the code used to train the ScandiNLI models, and the full training logs can be found in [this Weights and Biases report](https://wandb.ai/saattrupdan/huggingface/reports/ScandiNLI--VmlldzozMDQyOTk1?accessToken=r9crgxqvvigy2hatdjeobzwipz7f3id5vqg8ooksljhfw6wl0hv1b05asypsfj9v).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 4242
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- max_steps: 50,000 |