Toxic language classification model for Bulgarian, based on the bert-base-bg model.
The model classifies text into four classes: Toxic, MedicalTerminology, NonToxic, MinorityGroup.
Classification report:
| Accuracy | Precision | Recall | F1 Score | Loss |
|---|---|---|---|---|
| 0.85 | 0.86 | 0.85 | 0.85 | 0.43 |
More information is available in the paper.
Code and usage
For training files and information on how to use the model, refer to the project's GitHub repository; a minimal usage sketch is shown below.
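The following is a minimal, untested sketch of how the model could be queried with the Hugging Face transformers library. It assumes the sofia-uni/toxic-bert-bg checkpoint is compatible with the standard text-classification pipeline and that the label names in the model config correspond to the four classes listed above; consult the GitHub repository for the officially supported usage.

```python
# Minimal usage sketch (assumption: the checkpoint works with the standard
# transformers text-classification pipeline and exposes the four labels
# Toxic, MedicalTerminology, NonToxic, MinorityGroup in its config).
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sofia-uni/toxic-bert-bg",
)

# Example Bulgarian sentence ("This is a wonderful day.").
text = "Това е чудесен ден."
result = classifier(text)

# The pipeline returns a list of dicts with the predicted label and score,
# e.g. [{'label': 'NonToxic', 'score': 0.97}] (illustrative output only).
print(result)
```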
Reference
If you use the pipeline in your academic project, please cite it as:
```bibtex
@article{berbatova2025detecting,
  title={Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text},
  author={Berbatova, Melania and Vasev, Tsvetoslav},
  year={2025}
}
```
Model tree for sofia-uni/toxic-bert-bg
Base model: rmihaylov/bert-base-bg