Update README.md

README.md
---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: text-generation
tags:
- counter speech
---

# Target-Aware Counter-Speech Generation

<!-- Provide a quick summary of what the model is/does. -->

The target-aware counter-speech generation model is an autoregressive language model, based on [gpt2-medium](https://huggingface.co/gpt2-medium), fine-tuned on hate- and counter-speech pairs from the [CONAN](https://github.com/marcoguerini/CONAN) datasets to generate more contextually relevant counter-speech.
The model uses special tokens that embed target-demographic information to guide generation towards relevant responses and away from off-topic, generic ones. It is trained on 8 target demographics: Migrants, People of Color (POC), LGBT+, Muslims, Women, Jews, Disabled, and Other.

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
The model is intended to generate a counter-speech response for a given hate-speech sequence, with the prompt prefixed by the special token of the target demographic, as sketched below.
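A minimal prompt-construction sketch, based on the prompt format used in the full example below; the `build_prompt` helper is illustrative, and every token string other than `<other>` is assumed to follow the same `<TARGET>` pattern, so verify them against the tokenizer's vocabulary:

```python
# Illustrative sketch: assemble a prompt from a hate-speech string and a target token.
# The "<TARGET>" spelling of the special tokens (other than "<other>") is an assumption;
# check tokenizer.additional_special_tokens for the exact strings.
TARGETS = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]

def build_prompt(hate_speech: str, target: str) -> str:
    if target not in TARGETS:
        raise ValueError(f"unknown target demographic: {target}")
    return f"<|endoftext|> <{target}> Hate-speech: {hate_speech} Counter-speech: "

print(build_prompt("Human are not created equal, some are born lesser.", "other"))
```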


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

We observed negative effects such as content hallucination and toxic response generation. Although the intended use is to generate counter-speech for combating online hatred, usage should be monitored carefully with human post-editing or an approval system to ensure a safe and inclusive online environment.
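One way to apply such monitoring is to keep a human in the loop before anything is published; the sketch below is purely illustrative, and `review` and `publish` are hypothetical callbacks standing in for whatever moderation tooling is actually in place:

```python
# Purely illustrative: every generated counter-speech candidate is routed through a
# human reviewer before release. `review` and `publish` are hypothetical callbacks.
def release_with_human_approval(candidates, review, publish):
    for text in candidates:
        decision = review(text)   # human approves, edits, or rejects the draft; None means reject
        if decision is not None:
            publish(decision)     # publish the (possibly post-edited) response
```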


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# A list of all available target-demographic tokens
types = ["MIGRANTS", "POC", "LGBT+", "MUSLIMS", "WOMEN", "JEWS", "other", "DISABLED"]

model = AutoModelForCausalLM.from_pretrained("tum-nlp/gpt-2-medium-target-aware-counterspeech-generation")
tokenizer = AutoTokenizer.from_pretrained("tum-nlp/gpt-2-medium-target-aware-counterspeech-generation")
tokenizer.padding_side = "left"

prompt = "<|endoftext|> <other> Hate-speech: Human are not created equal, some are born lesser. Counter-speech: "
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
output_sequences = model.generate(
    input_ids=inputs['input_ids'].to(model.device),
    attention_mask=inputs['attention_mask'].to(model.device),
    pad_token_id=tokenizer.eos_token_id,
    max_length=128,
    num_beams=3,
    no_repeat_ngram_size=3,
    num_return_sequences=1,
    early_stopping=True
)
result = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
```
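The decoded string echoes the prompt followed by the generated continuation; assuming the prompt format above, the counter-speech part can be split off like this:

```python
# Keep only the text generated after the "Counter-speech:" marker in the prompt.
counter_speech = result.split("Counter-speech:", 1)[-1].strip()
print(counter_speech)
```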


#### Training Hyperparameters

The model was fine-tuned with the following `TrainingArguments`:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    num_train_epochs=20,
    learning_rate=3.800568576836524e-05,
    weight_decay=0.050977894796868116,
    warmup_ratio=0.10816909354342182,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=3,
    load_best_model_at_end=True,
    auto_find_batch_size=True,
)
```
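These arguments would typically be handed to a `Trainer` together with the tokenized hate-/counter-speech pairs; the sketch below is illustrative only, and `train_dataset` and `eval_dataset` are placeholders for preprocessing that is not part of this card:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# Illustrative only: `train_dataset` and `eval_dataset` are assumed to be tokenized
# hate-/counter-speech pairs in the prompt format shown above; they are not provided here.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
```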


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

The model's performance is evaluated on three test sets: two are subsets of the [CONAN](https://github.com/marcoguerini/CONAN) dataset and one is the sexist portion of the [EDOS](https://github.com/rewire-online/edos) dataset.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

The model is evaluated with a custom pipeline for counter-speech generation that reports CoLA (linguistic acceptability), Toxicity (TOX), Hatefulness (Hate), Offensiveness (OFF), Label Similarity (L Sim), Context Similarity (C Sim), Validity as Counter-Speech (VaCS), Repetition Rate (RR), target-demographic F1, and an overall Arithmetic Mean (AM).
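The AM column in the tables below is an arithmetic mean of metric scores; exactly which scores enter the average is not spelled out here, so the sketch below simply averages whatever scores it is given:

```python
# Minimal sketch: average a dictionary of per-metric scores into one aggregate value.
# Which metrics belong in the average is an assumption left to the caller.
def arithmetic_mean(scores: dict) -> float:
    return sum(scores.values()) / len(scores)

print(arithmetic_mean({"CoLA": 0.9, "TOX": 0.9, "VaCS": 0.9}))  # hypothetical scores
```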


### Results

CONAN
| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human | 0.937 | 0.955 | 1.000 | 0.997 | - | 0.751 | 0.980 | 0.861 | 0.885 | 0.929 |
| target-aware gpt2-medium | 0.958 | 0.946 | 1.000 | 0.996 | 0.706 | 0.784 | 0.946 | 0.419 | 0.880 | 0.848 |

CONAN SMALL
| Model Name | CoLA | TOX | Hate | OFF | L Sim | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ----- | ---- | -- | -- | -- |
| Human | 0.963 | 0.956 | 1.000 | 1.000 | 1.000 | 0.768 | 0.988 | 0.995 | 0.868 | 0.949 |
| target-aware gpt2-medium | 0.975 | 0.931 | 1.000 | 1.000 | 0.728 | 0.783 | 0.888 | 0.911 | 0.792 | 0.890 |

EDOS
| Model Name | CoLA | TOX | Hate | OFF | C Sim | VaCS | RR | F1 | AM |
| ---------- | ---- | --- | ---- | --- | ----- | ---- | -- | -- | -- |
| target-aware gpt2-medium | 0.930 | 0.815 | 0.999 | 0.975 | 0.689 | 0.857 | 0.518 | 0.747 | 0.816 |