apanc/russian-inappropriate-messages

Text Classification

toxic comments classification


NiGuLa committed on Mar 19, 2021

Commit b52f3a9 • 1 Parent(s): 9fcc197

Create README.md

Files changed (1)

README.md +48 -0

README.md ADDED

	@@ -0,0 +1,48 @@

+---
+language:
+- ru
+tags:
+- toxic comments classification
+licenses:
+- cc-by-nc-sa
+---
+## General concept of the model
+This model is trained on a dataset of inappropriate messages in Russian. The concept of inappropriateness is described [in this article](https://arxiv.org/abs/2103.05345), presented at the Workshop on Balto-Slavic NLP at the EACL-2021 conference. Please note that the article describes the first version of the dataset, while the model is trained on the extended version of the dataset, open-sourced on our [GitHub](https://github.com/skoltech-nlp/inappropriate-sensitive-topics/blob/main/Version2/appropriateness/Appropriateness.csv). The properties of the extended dataset are the same as those described in the article; the only difference is the size.
+The model was trained, validated, and tested only on the samples labeled with 100% confidence, which yielded the following metrics on the test set:
+|              | precision | recall | f1-score | support |
+|--------------|----------|--------|----------|---------|
+| 0            | 0.92     | 0.93   | 0.93     | 7839    |
+| 1            | 0.80     | 0.76   | 0.78     | 2726    |
+| accuracy     |          |        | 0.89     | 10565   |
+| macro avg    | 0.86     | 0.85   | 0.85     | 10565   |
+| weighted avg | 0.89     | 0.89   | 0.89     | 10565   |
+## Licensing Information
+[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
+[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
+[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
+[cc-by-nc-sa-image]: https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png
+## Citation
+If you find this repository helpful, feel free to cite our publication:
+```
+@inproceedings{babakov-etal-2021-bsnlp,
+    title = "Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation",
+    author = "Babakov, Nikolay and Logacheva, Varvara and Kozlova, Olga and Semenov, Nikita and Panchenko, Alexander",
+    booktitle = "To appear in the Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
+    month = apr,
+    year = "2021",
+    address = "Kyiv, Ukraine"
+}
+```
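
As a reading aid for the metrics table in the README above, here is a minimal sketch of how the `macro avg` and `weighted avg` rows relate to the per-class rows. It assumes the table follows the usual classification-report convention (macro = unweighted mean over classes, weighted = mean weighted by support); the README does not say which tool produced it. The helper names are hypothetical, and because the inputs are the already rounded table values, the results only match the table to within rounding.

```python
# Per-class values copied from the README table (already rounded to two decimals).
per_class = {
    "0": {"precision": 0.92, "recall": 0.93, "f1": 0.93, "support": 7839},
    "1": {"precision": 0.80, "recall": 0.76, "f1": 0.78, "support": 2726},
}
METRICS = ("precision", "recall", "f1")


def macro_average(rows, metric):
    """Unweighted mean of a metric over classes (hypothetical helper)."""
    return sum(row[metric] for row in rows.values()) / len(rows)


def weighted_average(rows, metric):
    """Mean of a metric over classes, weighted by class support (hypothetical helper)."""
    total = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total


total_support = sum(row["support"] for row in per_class.values())
macro = {m: macro_average(per_class, m) for m in METRICS}
weighted = {m: weighted_average(per_class, m) for m in METRICS}

# Accuracy equals the support-weighted recall: each class contributes
# (recall * support) correctly classified samples.
accuracy_from_recall = weighted["recall"]

if __name__ == "__main__":
    print("total support:", total_support)
    for m in METRICS:
        print(f"{m}: macro={macro[m]:.3f} weighted={weighted[m]:.3f}")
    print(f"accuracy (from weighted recall): {accuracy_from_recall:.3f}")
```

The gap between the macro and weighted rows comes from class imbalance: class `0` has roughly three times the support of class `1`, so the weighted averages sit closer to the class `0` scores.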