Avijit Ghosh committed
Commit e089bfc · 1 Parent(s): b0965ee
Files changed (1):
  1. configs/crowspairs.yaml +8 -4
configs/crowspairs.yaml CHANGED
@@ -1,5 +1,8 @@
 Abstract: "Pretrained language models, especially masked language models (MLMs) have seen success across many NLP tasks. However, there is ample evidence that they use the cultural biases that are undoubtedly present in the corpora they are trained on, implicitly creating harm with biased representations. To measure some forms of social bias in language models against protected demographic groups in the US, we introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs). CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age. In CrowS-Pairs a model is presented with two sentences: one that is more stereotyping and another that is less stereotyping. The data focuses on stereotypes about historically disadvantaged groups and contrasts them with advantaged groups. We find that all three of the widely-used MLMs we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs. As work on building less biased models advances, this dataset can be used as a benchmark to evaluate progress."
-Applicable Models: .nan
+Applicable Models:
+- BERT-base (Opensource access)
+- RoBERTa-large (Opensource access)
+- ALBERT-xxlv2 (Opensource access)
 Authors: Nikita Nangia, Clara Vania, Rasika Bhalerao, Samuel R. Bowman
 Considerations: Automating stereotype detection makes distinguishing harmful stereotypes
   difficult. It also raises many false positives and can flag relatively neutral associations
@@ -17,6 +20,7 @@ Suggested Evaluation: Crow-S Pairs
 Level: Dataset
 URL: https://arxiv.org/abs/2010.00133
 What it is evaluating: Protected class stereotypes
-Metrics: .nan
-Affiliations: .nan
-Methodology: .nan
+Metrics:
+- Pseudo Log-Likelihood Masked LM Scoring
+Affiliations: New York University
+Methodology: Pairs of sentences with different stereotypical names and gender markers are presented to the model. The model is tasked with predicting the masked token in the sentence. This task is repeated for each token masked in the sentence, and the log-likelihoods are accumulated in a sum.
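The Methodology field added above describes pseudo-log-likelihood (PLL) scoring. The following is a minimal illustrative sketch of that scoring loop, assuming the Hugging Face transformers library and a placeholder bert-base-uncased checkpoint; the sentences and the helper name pseudo_log_likelihood are hypothetical, and the official CrowS-Pairs implementation differs in detail (for example, in which tokens of each paired sentence it scores).

# Sketch only: pseudo-log-likelihood (PLL) scoring of a sentence with a masked LM.
# Assumes the Hugging Face `transformers` library; model name and sentences are
# placeholders for illustration, not values taken from the config above.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumed checkpoint, any MLM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()


def pseudo_log_likelihood(sentence: str) -> float:
    """Mask each token in turn and sum the log-probability of the true token."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (first position) and [SEP] (last position).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total += log_probs[input_ids[i]].item()
    return total


# A pair counts toward the bias score if the more stereotyping sentence
# receives the higher pseudo-log-likelihood than the less stereotyping one.
more_stereo = "Example sentence A."  # placeholder pair, not from the dataset
less_stereo = "Example sentence B."
print(pseudo_log_likelihood(more_stereo) > pseudo_log_likelihood(less_stereo))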