distilbert-base-dutch-toxic-comments
Model description:
This model was created to detect toxic or potentially harmful comments.
For this model, we fine-tuned the multilingual DistilBERT model distilbert-base-multilingual-cased on the translated Jigsaw Toxicity dataset.
The original dataset was translated using the appropriate MarianMT model.
The model was trained for 2 epochs, on 90% of the dataset, with the following arguments:
training_args = TrainingArguments(
    learning_rate=3e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    load_best_model_at_end=True,
    metric_for_best_model="recall",
    num_train_epochs=2,
    evaluation_strategy="steps",
    save_strategy="steps",
    save_total_limit=10,
    logging_steps=100,
    eval_steps=250,
    save_steps=250,
    weight_decay=0.001,
    report_to="wandb")
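Because metric_for_best_model is set to "recall", the Trainer needs a compute_metrics function whose result contains a "recall" key. The function used for this model is not published, so the following is only a minimal sketch of what such a function could look like. It assumes a binary classifier where label 1 is the toxic class (an assumption, not confirmed by this card), and it uses plain Python so that it runs without extra dependencies.

```python
def compute_metrics(eval_pred, positive_label=1):
    """Return accuracy, precision, recall and F1 for a binary classifier.

    `eval_pred` unpacks into (logits, labels), as the Trainer provides it.
    `positive_label=1` (toxic) is an assumption made for this sketch.
    """
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]
    pairs = list(zip(preds, labels))

    tp = sum(p == positive_label and y == positive_label for p, y in pairs)
    fp = sum(p == positive_label and y != positive_label for p, y in pairs)
    fn = sum(p != positive_label and y == positive_label for p, y in pairs)
    correct = sum(p == y for p, y in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example with made-up logits: two of four predictions are correct.
toy_logits = [[2.0, -1.0], [0.1, 0.9], [0.3, 0.7], [1.5, 0.2]]
toy_labels = [0, 1, 0, 1]
toy_metrics = compute_metrics((toy_logits, toy_labels))
print(toy_metrics)
```

A function like this would be passed to the Trainer as compute_metrics=compute_metrics. The Trainer then reports the value as eval_recall and uses it to pick the best checkpoint.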
Model Performance:
The model was evaluated on 10% of the dataset, which was held out as the test set.
| Accuracy | F1 Score | Recall | Precision |
|---|---|---|---|
| 95.75 | 78.88 | 77.23 | 80.61 |
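As a quick sanity check on the table, the F1 score should equal the harmonic mean of precision and recall. The short sketch below recomputes it from the reported values. The helper name f1_from is introduced here for illustration only.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above.
reported = {"precision": 80.61, "recall": 77.23, "f1": 78.88}

recomputed_f1 = f1_from(reported["precision"], reported["recall"])
print(round(recomputed_f1, 2))  # 78.88, matching the reported F1 score
```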
Dataset:
Unfortunately we cannot open-source the dataset, since we are bound by the underlying Jigsaw license.