
Binary toxicity classifier for Ukrainian.

This is an instance of "distilbert-base-multilingual-cased" fine-tuned on the downstream task.

The evaluation metrics for binary toxicity classification are:

Precision: 0.9310, Recall: 0.9300, F1: 0.9300

The training and evaluation data will be clarified later.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained('dardem/mdistilbert-base-cased-uk-toxicity')
model = AutoModelForSequenceClassification.from_pretrained('dardem/mdistilbert-base-cased-uk-toxicity')

# prepare the input
batch = tokenizer.encode('Ти неймовірна!', return_tensors='pt')

# inference: the returned output object exposes the raw class scores as `.logits`
outputs = model(batch)
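
The model call above returns raw logits rather than a label. A minimal sketch of turning such logits into a predicted label and its probability follows. The helper name `logits_to_prediction`, the example logits, and the label names `LABEL_0`/`LABEL_1` are illustrative assumptions, not values taken from this model; the real mapping should be read from `model.config.id2label`.

```python
import math

def logits_to_prediction(logits, id2label):
    """Apply softmax to a list of logits and return (label, probability) of the top class."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical logits and label names, for illustration only.
example = logits_to_prediction([2.0, -1.0], {0: "LABEL_0", 1: "LABEL_1"})
print(example)
```

With the real model, the inputs would presumably be `model(batch).logits[0].tolist()` and `model.config.id2label`; which index corresponds to the toxic class is not stated here and should be checked in the model config.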

Citation

@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}

Model size: 135M params (Safetensors, tensor type F32)