Model Card for German Hate Speech Classifier

Model Details

Introduction

This model was developed to explore how well German language models perform multi-class classification of hate speech in German online journals. It is a fine-tuned version of GBERT (Chan, Schweter, and Möller, 2020).

Dataset

The dataset used for training is a consolidation of three pre-existing German hate speech datasets:

RP (Assenmacher et al., 2021)
DeTox (Demus et al., 2022)
Twitter dataset (Glasenbach, 2022)

The combined dataset underwent cleaning to minimize biases and remove redundant data.
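The card does not describe the cleaning procedure, so the following is only a minimal sketch of how three corpora could be merged while dropping redundant comments. It assumes each dataset has already been converted to hypothetical `(text, label)` pairs in the five target classes. The helpers `normalize` and `consolidate` are illustrative names, not part of the released code. Duplicates are detected after Unicode, case and whitespace normalization, and the first occurrence wins.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize Unicode form, case and whitespace so trivially different copies match."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.casefold().split())


def consolidate(*datasets):
    """Merge several lists of (text, label) pairs, dropping empty and duplicate texts.

    Hypothetical helper: the first occurrence of each normalized text is kept,
    so label conflicts between sources are resolved by dataset order.
    """
    seen = set()
    merged = []
    for dataset in datasets:
        for text, label in dataset:
            key = normalize(text)
            if not key or key in seen:
                continue  # skip empty or redundant comments
            seen.add(key)
            merged.append((text, label))
    return merged


# Toy stand-ins for the three sources (not real data).
rp = [("Das ist ein Test.", "No Hate Speech")]
detox = [("das ist ein  Test.", "No Hate Speech"), ("   ", "No Hate Speech")]
twitter = [("Noch ein Kommentar", "No Hate Speech")]
merged = consolidate(rp, detox, twitter)
```

Keeping the first occurrence is the simplest policy; a real pipeline might instead flag texts whose labels disagree across sources for manual review.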

Performance

In our experiments, the model showed promising results in classifying comments into the following five classes:

No Hate Speech
Other Hate Speech (Threat, Insult, Profanity)
Political Hate Speech
Racist Hate Speech
Sexist Hate Speech
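As an illustration of how the five classes map to model outputs, the sketch below defines an `id2label`/`label2id` pair and decodes one comment's logits to a label. The index order is an assumption; the released model's configuration defines the actual mapping. If the model was trained with Hugging Face Transformers (the card does not say), dictionaries of this shape are what a model config stores under `id2label` and `label2id`.

```python
# Assumed index order; check the released model's config for the real one.
LABELS = [
    "No Hate Speech",
    "Other Hate Speech",  # Threat, Insult, Profanity
    "Political Hate Speech",
    "Racist Hate Speech",
    "Sexist Hate Speech",
]
id2label = dict(enumerate(LABELS))
label2id = {label: i for i, label in id2label.items()}


def decode(logits):
    """Return the label whose logit is highest for a single comment."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]
```

For example, `decode([0.1, 0.2, 3.0, 0.0, -1.0])` picks index 2, i.e. "Political Hate Speech" under this assumed ordering.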

The model achieved a macro F1-score of 0.775. Misclassifications remain, however, and further improvements are needed to reduce them. In particular, short comments are disproportionately classified as Sexist Hate Speech.
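Macro F1 is the unweighted mean of the per-class F1 scores, so each of the five classes counts equally regardless of its size. The sketch below computes it from scratch and adds a simple diagnostic for the short-comment effect: the share of predictions labeled Sexist Hate Speech for short versus longer comments. The whitespace tokenization and the 5-token threshold for "short" are hypothetical choices, not the authors' evaluation setup.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # 0.0 for absent classes
    return sum(scores) / len(scores)


def sexist_share_by_length(texts, y_pred, max_short_tokens=5):
    """Fraction of predictions labeled Sexist Hate Speech, per length bucket.

    Hypothetical diagnostic: comments with at most `max_short_tokens`
    whitespace-separated tokens count as short.
    """
    counts = {"short": Counter(), "long": Counter()}
    for text, pred in zip(texts, y_pred):
        bucket = "short" if len(text.split()) <= max_short_tokens else "long"
        counts[bucket][pred == "Sexist Hate Speech"] += 1
    return {
        bucket: c[True] / (c[True] + c[False]) if c else 0.0
        for bucket, c in counts.items()
    }
```

Here class A has F1 = 2/3 and class B has F1 = 4/5 in a toy example with `y_true = ["A", "A", "B", "B"]` and `y_pred = ["A", "B", "B", "B"]`, giving a macro F1 of about 0.733. A large gap between the "short" and "long" shares on held-out data would reflect the length bias described above.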