Andrazp/multilingual-hate-speech-robacofi


	---
	widget:

	- text: "My name is Mark and I live in London. I am a postgraduate student at Queen Mary University."
	language:
	- en
	license: mit
	---

	# Multilingual Hate Speech Classifier for Social Media Content

	A multilingual model for classifying hate speech in social media content. The model builds on the pre-trained multilingual representations of XLM-T (https://arxiv.org/abs/2104.12250) and was fine-tuned jointly on five languages: Arabic, Croatian, English, German and Slovenian. Test F1 scores for each of the five languages are as follows:

	| Language  |   F1   |
	|-----------|:------:|
	| Arabic    | 0.8704 |
	| Croatian  | 0.7226 |
	| English   | 0.7851 |
	| German    | 0.7826 |
	| Slovenian | 0.7596 |

	## Tokenizer

	During training, the text was preprocessed with the original XLM-T tokenizer. The pretrained tokenizer files are included in this repository, and we recommend using the same tokenizer for inference.
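
As a minimal sketch of loading the bundled tokenizer together with the model, the snippet below uses the Hugging Face `transformers` Auto classes. It assumes that the repository ID is `Andrazp/multilingual-hate-speech-robacofi` and that the checkpoint loads as a sequence classification model; the helper names `load_tokenizer_and_model` and `encode` are illustrative and not part of this repository.

```python
# Assumption: repository ID as shown on the model page.
MODEL_ID = "Andrazp/multilingual-hate-speech-robacofi"


def load_tokenizer_and_model(model_id: str = MODEL_ID):
    """Load the tokenizer and classifier from the same repository.

    Using one ID for both keeps inference on the tokenizer files
    that ship with the model.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def encode(tokenizer, texts):
    """Tokenize a batch of strings into padded, truncated PyTorch tensors."""
    return tokenizer(
        list(texts),
        padding=True,        # pad to the longest text in the batch
        truncation=True,     # cut texts that exceed the model's maximum length
        return_tensors="pt",
    )
```

A typical call would be `tokenizer, model = load_tokenizer_and_model()` followed by `batch = encode(tokenizer, ["some text"])` and `logits = model(**batch).logits`. This downloads the files from the Hub on first use and requires PyTorch.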

	## Model output

	The model classifies each input into one of two distinct classes:
	* 0 - not-offensive
	* 1 - offensive
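
To illustrate how raw model outputs might be mapped onto the two classes above, here is a small dependency-free sketch. It assumes the model emits one logit per class in the order listed (index 0, then index 1); the names `LABELS`, `softmax` and `decode` are hypothetical helpers, not part of this repository.

```python
import math

# Class indices as documented in this model card.
LABELS = {0: "not-offensive", 1: "offensive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (class_id, label, probability) for one input's logits."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, LABELS[class_id], probs[class_id]
```

With a PyTorch output tensor, a single row of logits can be passed as `decode(logits[0].tolist())`.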