HannahRoseKirk committed · Commit 1e97121 · 1 Parent(s): 1355716

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -6,7 +6,7 @@ license: cc-by-4.0
 
## Model description
 
- This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base). This model is cased. The model was trained on iterative rounds of adversarial data generation with human-and-model-in-the-loop. Each round of data has emoji-containing statements which are either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
+ This model is a fine-tuned version of the [DeBERTa base model](https://huggingface.co/microsoft/deberta-base). This model is cased. The model was trained on iterative rounds of adversarial data generation with human-and-model-in-the-loop. In each round, annotators are tasked with tricking the model-in-the-loop with emoji-containing statements that it will misclassify. Between each round, the model is retrained. This is the final model from the iterative process, referred to as R8-T in our paper. The intended task is to classify an emoji-containing statement as either non-hateful (LABEL 0.0) or hateful (LABEL 1.0).
- **Data Repository:** https://github.com/HannahKirk/Hatemoji
- **Paper:** https://arxiv.org/abs/2108.05921
- **Point of Contact:** hannah.kirk@oii.ox.ac.uk
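Note on usage: the new description defines a binary classification task, so a minimal inference sketch follows, assuming the standard `transformers` text-classification pipeline. The model ID below is a placeholder for this repository's actual path, and the returned label names depend on the model's config.

```python
# Minimal inference sketch for the model described above. The model ID is a
# placeholder for this repository's path; the label names in the output
# (assumed to map onto non-hateful = 0.0 and hateful = 1.0) come from the
# model's config and may differ.
from transformers import pipeline

classifier = pipeline("text-classification", model="HannahRoseKirk/Hatemoji")  # placeholder ID

for statement in ["Sending you all my 💛", "You make me 🤢"]:
    print(statement, "->", classifier(statement))
```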
@@ -49,10 +49,17 @@ python3 transformers/examples/pytorch/text-classification/run_glue.py \
```
 
We experimented with upsampling the train split of each round to improve performance with increments of [1, 5, 10, 100], with the optimum upsampling taken
- forward to all subsequent rounds. The optimal upsampling ratios for R1-R4 (text rounds from Vidgen et al.,) are carried forward. This model is trained on upsampling ratios of `{'R0': 1, 'R1':, 'R2':, 'R3':, 'R4': , 'R5':, 'R6':, 'R7':}.
+ forward to all subsequent rounds. The optimal upsampling ratios for R1-R4 (the text-only rounds from Vidgen et al.) are carried forward. This model is trained on upsampling ratios of `{'R0': 1, 'R1': 5, 'R2': 100, 'R3': 1, 'R4': 1, 'R5': 100, 'R6': 1, 'R7': 5}`.
 
## Variable and metrics
+ We evaluate the model based on:
+ * [HatemojiCheck](https://huggingface.co/datasets/HannahRoseKirk/HatemojiCheck), an evaluation checklist with 7 functionalities of emoji-based hate and contrast sets.
+ * [HateCheck](https://huggingface.co/datasets/Paul/hatecheck), an evaluation checklist containing 29 functional tests for hate speech and contrast sets.
+ * The held-out test sets from the three rounds of adversarially-generated data collection with emoji-containing examples (R5-R7).
+ * The held-out test sets from the four rounds of adversarially-generated data collection with text-only examples (R1-R4, from Vidgen et al.).
 
- ## Evaluation results
+ For the round-specific test sets, we used a weighted F1-score across them to choose the final model in each round. For more details, see our [paper](https://arxiv.org/abs/2108.05921).
 
+ ## Evaluation results
+ For full evaluation of the model, see our [paper](https://arxiv.org/abs/2108.05921).
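For readers unfamiliar with the upsampling scheme in the hunk above: a minimal sketch of what the per-round ratios mean in practice, under the assumption that upsampling simply repeats a round's train split before the rounds are concatenated and shuffled. The `round_splits` contents are hypothetical stand-ins, not the repository's actual data loading.

```python
# Illustrative sketch of round-level upsampling (not the actual training
# code): each round's train split is repeated by its ratio, then all
# rounds are concatenated and shuffled.
import random

# Hypothetical stand-in splits; the real ones come from the data repository
# (https://github.com/HannahKirk/Hatemoji).
round_splits = {
    "R0": ["r0 example 1", "r0 example 2"],
    "R1": ["r1 example 1"],
    "R5": ["r5 example 1"],
}
upsampling = {"R0": 1, "R1": 5, "R2": 100, "R3": 1,
              "R4": 1, "R5": 100, "R6": 1, "R7": 5}

train_set = []
for name, examples in round_splits.items():
    train_set.extend(examples * upsampling[name])  # repeat split ratio times

random.shuffle(train_set)
print(len(train_set))  # 2*1 + 1*5 + 1*100 = 107
```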
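On the selection criterion mentioned in the diff: a minimal sketch of a weighted F1 across round-specific test sets, assuming weighting by test-set size (the paper defines the exact procedure). The gold labels and predictions below are hypothetical placeholders.

```python
# Illustrative size-weighted F1 across round-specific test sets, using
# scikit-learn's f1_score. Assumes weighting by test-set size; see the
# paper for the exact selection procedure.
from sklearn.metrics import f1_score

# Hypothetical (gold, predicted) label pairs per round;
# 0 = non-hateful, 1 = hateful. The real labels come from the
# held-out R1-R7 test sets.
rounds = {
    "R5": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "R6": ([0, 0, 1], [0, 1, 1]),
    "R7": ([1, 1, 0, 0], [1, 1, 0, 0]),
}

total = sum(len(gold) for gold, _ in rounds.values())
weighted_f1 = sum(
    f1_score(gold, pred) * len(gold) / total
    for gold, pred in rounds.values()
)
print(round(weighted_f1, 3))
```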