|
--- |
|
license: mit |
|
datasets: |
|
- skg/toxigen-data |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for ToxiGen-ConPrompt |
|
|
|
**ToxiGen-ConPrompt** is a pre-trained language model for implicit hate speech detection. |
|
The model is pre-trained on a machine-generated dataset for implicit hate speech detection (i.e., *ToxiGen*) using our proposed pre-training approach (i.e., *ConPrompt*).
|
|
|
|
|
|
## Model Details |
|
|
|
|
|
|
- **Base Model:** BERT-base-uncased |
|
- **Pre-training Source:** ToxiGen (https://aclanthology.org/2022.acl-long.234/) |
|
- **Pre-training Approach:** ConPrompt |
|
|
- **Paper:** https://aclanthology.org/2023.findings-emnlp.731/ |
|
- **Repository:** https://github.com/youngwook06/ConPrompt |
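
The model can be loaded with the `transformers` library. Below is a minimal usage sketch; the Hub model ID shown is an assumption based on this card's naming and may differ from the actual ID.

```python
# Minimal usage sketch. The model ID below is hypothetical; replace it with
# the actual Hub ID for this model card.
from transformers import AutoModel, AutoTokenizer

model_id = "youngwook06/ToxiGen-ConPrompt"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("an example statement to encode", return_tensors="pt")
outputs = model(**inputs)

# The [CLS] representation can serve as a sentence-level feature for
# fine-tuning on implicit hate speech detection.
cls_embedding = outputs.last_hidden_state[:, 0]
```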
|
|
|
|
|
## Ethical Considerations |
|
### Privacy Issue |
|
Before pre-training, we found that the machine-generated statements in ToxiGen contain private information such as URLs.

We anonymized such private information before pre-training to prevent potential harm.

The anonymization code we used is available in preprocess_toxigen.ipynb, and we strongly recommend anonymizing private information before using machine-generated data for pre-training.
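
For illustration only, the sketch below shows one simple way to mask URLs with a regular expression; the exact logic we used lives in preprocess_toxigen.ipynb, and the pattern and placeholder token here are assumptions.

```python
import re

# Illustrative sketch only, not the exact code from preprocess_toxigen.ipynb.
# The regex and the "[URL]" placeholder are assumptions.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

def anonymize_urls(text: str) -> str:
    """Replace URLs in a statement with a placeholder token."""
    return URL_PATTERN.sub("[URL]", text)

print(anonymize_urls("see https://example.com for details"))  # see [URL] for details
```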
|
|
|
### Potential Misuse |
|
The pre-training source of ToxiGen-ConPrompt includes toxic statements.

While we use these statements deliberately to pre-train a better model for implicit hate speech detection, the pre-trained model requires careful handling.

Below, we describe behaviors that could lead to misuse so that our model is used for social good rather than misused, whether unintentionally or maliciously.
|
|
|
- As our model was trained with the MLM objective, it might generate toxic statements with its MLM head.

- As our model learned representations of implicit hate speech, it might retrieve toxic statements similar to a given toxic statement.
|
|
|
While these behaviors can serve the social good (e.g., constructing training data for hate speech classifiers), they can also be misused.
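
As a concrete picture of the benign use above, the sketch below ranks candidate statements by cosine similarity of [CLS] embeddings; the model ID is hypothetical and this is not the exact code we used.

```python
# Hedged sketch of the benign retrieval use case: rank candidates by cosine
# similarity of [CLS] embeddings. The model ID is hypothetical.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "youngwook06/ToxiGen-ConPrompt"  # hypothetical ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    """Encode texts into L2-normalized [CLS] embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = model(**batch).last_hidden_state[:, 0]
    return torch.nn.functional.normalize(cls, dim=-1)

query = embed(["a query statement"])
corpus = embed(["candidate one", "candidate two"])
scores = (query @ corpus.T).squeeze(0)  # cosine similarities
print(scores.argsort(descending=True))  # indices of most similar candidates
```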
|
|
|
**We strongly emphasize the need for careful handling to prevent unintentional misuse and warn against malicious exploitation of such behaviors.** |
|
|
|
|
|
## Acknowledgements |
|
- We use the [ToxiGen](https://huggingface.co/datasets/skg/toxigen-data) dataset as a pre-training source to pre-train our model. You can refer to the paper [here](https://aclanthology.org/2022.acl-long.234/). |
|
- We anonymize private information in the pre-training source following the code from https://github.com/dhfbk/hate-speech-artifacts. |
|
- Our pre-training code is based on the code from https://github.com/princeton-nlp/SimCSE with some modifications. |
|
- We use the code from https://github.com/youngwook06/ImpCon to fine-tune and evaluate our model. |
|
|
|
|
|
|