---
license: mit
base_model: indobenchmark/indobert-lite-base-p1
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
- precision
- recall
language:
- ind
datasets:
- indonli
- MoritzLaurer/multilingual-NLI-26lang-2mil7
- LazarusNLP/multilingual-NLI-26lang-2mil7-id
widget:
- text: Andi tersenyum karena mendapat hasil baik. </s></s> Andi sedih.
model-index:
- name: indobert-lite-base-p1-indonli-multilingual-nli-distil-mdeberta
  results: []
---

# IndoBERT Lite Base IndoNLI Multilingual NLI Distil mDeBERTa

IndoBERT Lite Base IndoNLI Multilingual NLI Distil mDeBERTa is a natural language inference (NLI) model based on the [ALBERT](https://arxiv.org/abs/1909.11942) architecture. It starts from the pre-trained [indobenchmark/indobert-lite-base-p1](https://huggingface.co/indobenchmark/indobert-lite-base-p1) checkpoint, which was fine-tuned on [`IndoNLI`](https://github.com/ir-nlp-csui/indonli) and the [Indonesian subsets](https://huggingface.co/datasets/LazarusNLP/multilingual-NLI-26lang-2mil7-id) of [MoritzLaurer/multilingual-NLI-26lang-2mil7](https://huggingface.co/datasets/MoritzLaurer/multilingual-NLI-26lang-2mil7) while being distilled from [MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7).

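The card does not state which distillation objective, temperature, or loss weighting was used. As a rough illustration only, the sketch below shows one common formulation (soft-target distillation), in which the student's loss mixes cross-entropy on the gold label with a temperature-scaled KL divergence against the teacher's distribution. The names `distillation_loss`, `temperature`, and `alpha`, along with their default values, are assumptions and not settings taken from this model's training run.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical soft-target distillation loss for a single example.

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between teacher and student at the given temperature.
    Both defaults are illustrative assumptions.
    """
    # Cross-entropy of the student against the gold label (temperature 1).
    hard_loss = -math.log(softmax(student_logits)[label])

    # KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(
        t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0
    )

    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss


# Illustrative three-class logits (made-up values, not model outputs).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, -0.5]
loss = distillation_loss(student, teacher, label=0)
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label term remains.
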
## Evaluation Results

| Dataset   | `dev` Acc. | `test_lay` Acc. | `test_expert` Acc. |
| --------- | :--------: | :-------------: | :----------------: |
| `IndoNLI` |   78.60    |      74.69      |       65.55        |

## Model

| Model                                                            | #params | Arch.       | Training/Validation data (text)    |
| ---------------------------------------------------------------- | ------- | ----------- | ---------------------------------- |
| `indobert-lite-base-p1-indonli-multilingual-nli-distil-mdeberta` | 11.7M   | ALBERT Base | `IndoNLI`, Multilingual NLI (`id`) |

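A minimal inference sketch with the `transformers` library follows. The Hub repository ID in `MODEL_ID` is an assumption inferred from the model name in this card and the `LazarusNLP` dataset namespace; adjust it if the checkpoint lives elsewhere. The helper name `predict_nli` is hypothetical, and the label names come from whatever `id2label` mapping is stored in the checkpoint's config.

```python
# Assumption: repository ID inferred from the model name in this card.
MODEL_ID = "LazarusNLP/indobert-lite-base-p1-indonli-multilingual-nli-distil-mdeberta"


def predict_nli(premise, hypothesis, model_id=MODEL_ID):
    """Return a {label: probability} dict for one premise/hypothesis pair."""
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies until a prediction is actually requested.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Passing two strings lets the tokenizer insert its own separator tokens.
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1)[0]
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}
```

For example, `predict_nli("Andi tersenyum karena mendapat hasil baik.", "Andi sedih.")` scores the premise and hypothesis from the widget example; the first call downloads the checkpoint from the Hub.
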
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- `learning_rate`: `2e-05`
- `train_batch_size`: `64`
- `eval_batch_size`: `64`
- `seed`: `42`
- `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
- `lr_scheduler_type`: `linear`
- `num_epochs`: `5`

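To make the schedule concrete, the sketch below reproduces a linear decay from the listed `learning_rate` over the total number of optimizer steps. The figure of 1803 steps per epoch is taken from the training results table in this card; zero warmup steps is an assumption, since the card lists no warmup setting. The function name `linear_lr` is illustrative.

```python
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 64
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 1803  # step count after epoch 1 in the training results table
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 9015, the final step in the table


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
              warmup_steps=0):
    """Learning rate at a given step under linear warmup then linear decay.

    warmup_steps=0 is an assumption; the card does not list a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Upper bound on training examples seen per epoch, assuming a single device
# and no gradient accumulation (the last batch may be smaller than 64).
max_examples_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

# Learning rate at the end of each epoch under these assumptions.
lr_by_epoch = {
    epoch: linear_lr(epoch * STEPS_PER_EPOCH)
    for epoch in range(1, NUM_EPOCHS + 1)
}
```

Under these assumptions the rate falls by one fifth of its initial value each epoch and reaches zero at step 9015.
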
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |   F1   | Precision | Recall |
| :-----------: | :---: | :---: | :-------------: | :------: | :----: | :-------: | :----: |
|    0.4808     |  1.0  | 1803  |     0.4418      |  0.7683  | 0.7593 |  0.7904   | 0.7554 |
|    0.4529     |  2.0  | 3606  |     0.4343      |  0.7738  | 0.7648 |  0.7893   | 0.7619 |
|    0.4263     |  3.0  | 5409  |     0.4383      |  0.7861  | 0.7828 |  0.7874   | 0.7807 |
|     0.398     |  4.0  | 7212  |     0.4456      |  0.7792  | 0.7767 |  0.7792   | 0.7756 |
|    0.3772     |  5.0  | 9015  |     0.4499      |  0.7711  | 0.7674 |  0.7700   | 0.7661 |

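One way to read the table is to compare checkpoints by different criteria. The sketch below transcribes the rows above and picks the best epoch by validation accuracy, F1, and validation loss. The variable names are illustrative, and the card does not say which criterion was used to select the released checkpoint.

```python
# Rows transcribed from the training results table in this card.
RESULTS = [
    {"epoch": 1, "train_loss": 0.4808, "val_loss": 0.4418, "accuracy": 0.7683, "f1": 0.7593},
    {"epoch": 2, "train_loss": 0.4529, "val_loss": 0.4343, "accuracy": 0.7738, "f1": 0.7648},
    {"epoch": 3, "train_loss": 0.4263, "val_loss": 0.4383, "accuracy": 0.7861, "f1": 0.7828},
    {"epoch": 4, "train_loss": 0.3980, "val_loss": 0.4456, "accuracy": 0.7792, "f1": 0.7767},
    {"epoch": 5, "train_loss": 0.3772, "val_loss": 0.4499, "accuracy": 0.7711, "f1": 0.7674},
]

# Higher is better for accuracy and F1; lower is better for validation loss.
best_by_accuracy = max(RESULTS, key=lambda row: row["accuracy"])
best_by_f1 = max(RESULTS, key=lambda row: row["f1"])
best_by_val_loss = min(RESULTS, key=lambda row: row["val_loss"])

for name, row in [
    ("accuracy", best_by_accuracy),
    ("f1", best_by_f1),
    ("val_loss", best_by_val_loss),
]:
    print(f"best by {name}: epoch {row['epoch']}")
```

On these numbers, accuracy and F1 peak at epoch 3 while validation loss is lowest at epoch 2, and training loss keeps falling afterwards, a pattern often associated with mild overfitting. The epoch 3 accuracy (0.7861) is close to the `dev` accuracy reported under Evaluation Results.
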
### Framework versions

- Transformers 4.36.2
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0

## References

[1] Mahendra, R., Aji, A. F., Louvan, S., Rahman, F., & Vania, C. (2021, November). [IndoNLI: A Natural Language Inference Dataset for Indonesian](https://arxiv.org/abs/2110.14566). _Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing_. Association for Computational Linguistics.