HannahRoseKirk/Hatemoji

Text Classification

hate-speech-detection


HannahRoseKirk committed on Apr 20, 2022

Commit 3c511aa · 1 Parent(s): 7a291d7

Update README.md

Files changed (1)

README.md +17 -0

@@ -1,3 +1,20 @@
 ---
 license: cc-by-4.0
 ---

+# Hatemoji Model
+## Model description
+This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base) and is cased. It was trained over iterative rounds of adversarial data generation using a human-and-model-in-the-loop process. Each round of data contains emoji-containing statements labelled as either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
+- **Data Repository:** https://github.com/HannahKirk/Hatemoji
+- **Paper:** https://arxiv.org/abs/2108.05921
+- **Point of Contact:** hannah.kirk@oii.ox.ac.uk
+## Intended uses & limitations
+The model is intended to classify short-form, English-language text containing emoji as a binary task: non-hateful vs hateful. It has demonstrated strengths compared to commercial and academic models on classifying emoji-based hate, and is also a strong classifier of text-only hate. Because the model was trained on synthetic, adversarially generated data, it may be weaker on empirical emoji-based hate 'in the wild'.
+## How to use
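A minimal sketch of running the classifier, assuming the checkpoint is published on the Hub under the repository name shown on this page (`HannahRoseKirk/Hatemoji`) and loads with the standard `transformers` text-classification pipeline. Neither point is confirmed by this commit. The `LABEL_0`/`LABEL_1` output names, and the helper names `readable` and `classify`, are illustrative assumptions rather than part of the model card.

```python
from typing import Dict, List

# Assumed mapping: the card describes LABEL 0.0 as non-hateful and LABEL 1.0 as
# hateful; the pipeline is assumed to emit these as "LABEL_0" and "LABEL_1".
LABEL_NAMES = {"LABEL_0": "non-hateful", "LABEL_1": "hateful"}


def readable(prediction: Dict) -> Dict:
    """Replace a raw pipeline label with a human-readable class name."""
    raw = prediction["label"]
    # Unknown labels are passed through unchanged rather than guessed at.
    return {"label": LABEL_NAMES.get(raw, raw), "score": prediction["score"]}


def classify(texts: List[str], model_id: str = "HannahRoseKirk/Hatemoji") -> List[Dict]:
    """Classify each text as hateful or non-hateful (hypothetical helper)."""
    # Imported lazily so that `readable` can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

Calling `classify(["some short emoji-containing text"])` downloads the checkpoint on first use and returns one `{"label", "score"}` dictionary per input string.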
+## Training data
+The model was trained on [HatemojiBuild](https://huggingface.co/datasets/HannahRoseKirk/HatemojiBuild), alongside the four rounds of text-only adversarial data from Vidgen, B., Thrush, T., Waseem, Z., & Kiela, D. (2020). Learning from the worst: Dynamically generated datasets to improve online hate detection. arXiv