HannahRoseKirk committed · Commit 1e97121 · 1 Parent(s): 1355716

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -6,7 +6,7 @@ license: cc-by-4.0
 
## Model description
 
- This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base). This model is cased. The model was trained on iterative rounds of adversarial data generation with human-and-model-in-the-loop. Each round of data has emoji-containing statements which are either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
+ This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base). This model is cased. The model was trained on iterative rounds of adversarial data generation with human-and-model-in-the-loop. In each round, annotators are tasked with tricking the model-in-the-loop with emoji-containing statements that it will misclassify. Between each round, the model is retrained. This is the final model from the iterative process, referred to as R8-T in our paper. The intended task is to classify an emoji-containing statement as either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
- **Data Repository:** https://github.com/HannahKirk/Hatemoji
- **Paper:** https://arxiv.org/abs/2108.05921
- **Point of Contact:** hannah.kirk@oii.ox.ac.uk
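Note on usage: the new description defines a binary classification task, so a minimal inference sketch follows, assuming the standard `transformers` text-classification pipeline. The model ID below is a placeholder for this repository's actual path, and the returned label names depend on the model's config.

```python
# Minimal inference sketch for the model described above. The model ID is a
# placeholder for this repository's path; the label names in the output
# (assumed to map onto non-hateful = 0.0 and hateful = 1.0) come from the
# model's config and may differ.
from transformers import pipeline

classifier = pipeline("text-classification", model="HannahRoseKirk/Hatemoji")  # placeholder ID

for statement in ["Sending you all my 💛", "You make me 🤢"]:
    print(statement, "->", classifier(statement))
```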
@@ -49,10 +49,17 @@ python3 transformers/examples/pytorch/text-classification/run_glue.py \
```
 
We experimented with upsampling the train split of each round to improve performance with increments of [1, 5, 10, 100], with the optimum upsampling taken
- forward to all subsequent rounds. The optimal upsampling ratios for R1-R4 (text rounds from Vidgen et al.,) are carried forward. This model is trained on upsampling ratios of `{'R0': 1, 'R1':, 'R2':, 'R3':, 'R4': , 'R5':, 'R6':, 'R7':}.
+ forward to all subsequent rounds. The optimal upsampling ratios for R1-R4 (the text-only rounds from Vidgen et al.) are carried forward. This model is trained on upsampling ratios of `{'R0': 1, 'R1': 5, 'R2': 100, 'R3': 1, 'R4': 1, 'R5': 100, 'R6': 1, 'R7': 5}`.
 
## Variable and metrics
+ We evaluate the model based on:
+ * [HatemojiCheck](https://huggingface.co/datasets/HannahRoseKirk/HatemojiCheck), an evaluation checklist with 7 functionalities of emoji-based hate and contrast sets.
+ * [HateCheck](https://huggingface.co/datasets/Paul/hatecheck), an evaluation checklist containing 29 functional tests for hate speech and contrast sets.
+ * The held-out test sets from the three rounds of adversarially-generated data collection with emoji-containing examples (R5-R7).
+ * The held-out test sets from the four rounds of adversarially-generated data collection with text-only examples (R1-R4, from Vidgen et al.).
 
- ## Evaluation results
+ For the round-specific test sets, we used a weighted F1-score across them to choose the final model in each round. For more details, see our [paper](https://arxiv.org/abs/2108.05921).
 
+ ## Evaluation results
+ For full evaluation of the model, see our [paper](https://arxiv.org/abs/2108.05921).
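For readers unfamiliar with the upsampling scheme in the hunk above: a minimal sketch of what the per-round ratios mean in practice, under the assumption that upsampling simply repeats a round's train split before the rounds are concatenated and shuffled. The `round_splits` contents are hypothetical stand-ins, not the repository's actual data loading.

```python
# Illustrative sketch of round-level upsampling (not the actual training
# code): each round's train split is repeated by its ratio, then all
# rounds are concatenated and shuffled.
import random

# Hypothetical stand-in splits; the real ones come from the data repository
# (https://github.com/HannahKirk/Hatemoji).
round_splits = {
    "R0": ["r0 example 1", "r0 example 2"],
    "R1": ["r1 example 1"],
    "R5": ["r5 example 1"],
}
upsampling = {"R0": 1, "R1": 5, "R2": 100, "R3": 1,
              "R4": 1, "R5": 100, "R6": 1, "R7": 5}

train_set = []
for name, examples in round_splits.items():
    train_set.extend(examples * upsampling[name])  # repeat split ratio times

random.shuffle(train_set)
print(len(train_set))  # 2*1 + 1*5 + 1*100 = 107
```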
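On the selection criterion mentioned in the diff: a minimal sketch of a weighted F1 across round-specific test sets, assuming weighting by test-set size (the paper defines the exact procedure). The gold labels and predictions below are hypothetical placeholders.

```python
# Illustrative size-weighted F1 across round-specific test sets, using
# scikit-learn's f1_score. Assumes weighting by test-set size; see the
# paper for the exact selection procedure.
from sklearn.metrics import f1_score

# Hypothetical (gold, predicted) label pairs per round;
# 0 = non-hateful, 1 = hateful. The real labels come from the
# held-out R1-R7 test sets.
rounds = {
    "R5": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "R6": ([0, 0, 1], [0, 1, 1]),
    "R7": ([1, 1, 0, 0], [1, 1, 0, 0]),
}

total = sum(len(gold) for gold, _ in rounds.values())
weighted_f1 = sum(
    f1_score(gold, pred) * len(gold) / total
    for gold, pred in rounds.values()
)
print(round(weighted_f1, 3))
```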