---
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
language:
- en
- ru
- uk
- es
- de
- am
- ar
- zh
- hi
metrics:
- f1
---
|
This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) fine-tuned on a binary toxicity classification task using our compiled dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
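The model can be loaded with the standard `transformers` text-classification pipeline. A minimal sketch (the helper name is ours, and the checkpoint id must be replaced with this repository's actual model id):

```python
from transformers import pipeline


def load_toxicity_classifier(model_id: str):
    """Return a binary toxicity classification pipeline for the given checkpoint."""
    return pipeline("text-classification", model=model_id)


# Usage (substitute the model id of this repository's checkpoint):
# clf = load_toxicity_classifier("<model-id>")
# clf(["This is fine.", "Some toxic sentence."])
```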
|
|
|
First, we held out a balanced 20% test set to check the model's adequacy. The model was then fine-tuned on the full data. The results on the test set are as follows:
|
|
|
| Language | Precision | Recall | F1     |
|----------|-----------|--------|--------|
| all_lang | 0.8713    | 0.8710 | 0.8710 |
| en       | 0.9650    | 0.9650 | 0.9650 |
| ru       | 0.9791    | 0.9790 | 0.9790 |
| uk       | 0.9267    | 0.9250 | 0.9251 |
| de       | 0.8791    | 0.8760 | 0.8758 |
| es       | 0.8700    | 0.8700 | 0.8700 |
| ar       | 0.7787    | 0.7780 | 0.7780 |
| am       | 0.7781    | 0.7780 | 0.7780 |
| hi       | 0.9360    | 0.9360 | 0.9360 |
| zh       | 0.7318    | 0.7320 | 0.7315 |
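For reference, the scores above follow the standard binary precision/recall/F1 definitions (the averaging scheme over the two classes is not stated here, so treat this as an illustration of the single-class case, with toxic = 1):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (toxic) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

In practice the same numbers come out of `sklearn.metrics.precision_recall_fscore_support` applied per language and to the pooled `all_lang` split.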