Fine-tuned model for detecting instances of abusive language in Dutch tweets. The model has been trained with DALC v2.0.

Abusive language is defined as "Impolite, harsh, or hurtful language (that may contain profanities or vulgar language) that results in a debasement, harassment, threat, or aggression of an individual or a (social) group, but not necessarily of an entity, an institution, an organisation, or a concept." (Ruitenbeek et al., 2022)
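
A minimal usage sketch with the `transformers` pipeline API is shown below. The repository id is a placeholder (the actual model id is not stated in this section) and the example tweets are illustrative, not taken from DALC.

```python
# Minimal usage sketch. NOTE: "your-org/dalc-abusive-language-model" is a
# placeholder repository id (assumption); replace it with the actual model id.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-org/dalc-abusive-language-model",  # placeholder id
)

tweets = [
    "Wat een prachtige dag in Amsterdam!",            # "What a beautiful day in Amsterdam!"
    "Voorbeeld van een mogelijk beledigende tweet.",  # "Example of a possibly offensive tweet."
]

# Print the predicted label and confidence score for each tweet
for tweet, prediction in zip(tweets, classifier(tweets)):
    print(f"{prediction['label']} ({prediction['score']:.3f})  {tweet}")
```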

The model achieves the following results on multiple test sets (see the evaluation sketch after this list):

  • DALC held-out test set: macro F1: 72.23; F1 Abusive: 51.60
  • HateCheck-NL (functional benchmark for hate speech): Accuracy: 60.19; Accuracy non-hateful tests: 57.38; Accuracy hateful tests: 59.58
  • OP-NL (dynamic benchmark for offensive language): macro F1: 57.57
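
The sketch below shows how metrics of this kind (macro F1 and F1 for the Abusive class) can be computed with scikit-learn. It assumes a binary label scheme with 0 = not abusive and 1 = abusive; the labels and predictions are illustrative, not the actual DALC evaluation data.

```python
# Sketch of computing macro F1 and per-class F1 with scikit-learn.
# The binary label encoding (0 = not abusive, 1 = abusive) is an assumption.
from sklearn.metrics import f1_score

gold = [0, 0, 1, 1, 0, 1]        # illustrative gold labels
predicted = [0, 1, 1, 1, 0, 0]   # illustrative model predictions

macro_f1 = f1_score(gold, predicted, average="macro")
f1_abusive = f1_score(gold, predicted, pos_label=1)  # F1 for the Abusive class

print(f"macro F1: {macro_f1:.2%}  F1 Abusive: {f1_abusive:.2%}")
```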

More details on the training settings and pre-processing are available here.
