---
license: openrail++
datasets:
- textdetox/multilingual_toxicity_dataset
language:
- en
- ru
- uk
- es
- de
- am
- ar
- zh
- hi
metrics:
- f1
---
This is an instance of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) fine-tuned on a binary toxicity classification task using our compiled dataset [textdetox/multilingual_toxicity_dataset](https://huggingface.co/datasets/textdetox/multilingual_toxicity_dataset).
First, we held out a balanced 20% test set to check the model's adequacy. The model was then fine-tuned on the full data. The results on the test set are as follows:
| Language | Precision | Recall | F1 |
|----------|-----------|--------|-------|
| all_lang | 0.8713 | 0.8710 | 0.8710|
| en | 0.9650 | 0.9650 | 0.9650|
| ru | 0.9791 | 0.9790 | 0.9790|
| uk | 0.9267 | 0.9250 | 0.9251|
| de | 0.8791 | 0.8760 | 0.8758|
| es | 0.8700 | 0.8700 | 0.8700|
| ar | 0.7787 | 0.7780 | 0.7780|
| am | 0.7781 | 0.7780 | 0.7780|
| hi | 0.9360 | 0.9360 | 0.9360|
| zh | 0.7318 | 0.7320 | 0.7315|
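
The model can be used with the standard `transformers` text-classification pipeline. Below is a minimal usage sketch; the model id `textdetox/xlmr-large-toxicity-classifier` is an assumption for illustration, so substitute this repository's actual id if it differs.

```python
# Minimal usage sketch. The model id below is an assumption; replace it
# with this repository's actual id on the Hugging Face Hub.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="textdetox/xlmr-large-toxicity-classifier",
)

# Returns one label per input; the label names come from the
# model's id2label config.
print(classifier(["You are amazing!", "I hate you!"]))
```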