Abstract: "Artificial intelligence and machine learning are currently undergoing rapid growth. Concerns persist regarding their potential to perpetuate biases inherent in human language. This study demonstrates that standard machine learning applied to everyday language reproduces a range of human biases, from implicit associations to societal norms. Using the GloVe word embedding model trained on web text, the research reveals that language itself contains historical biases, whether neutral or problematic. New evaluation methods, WEAT and WEFAT, are introduced. These findings have broad implications for AI, psychology, sociology, and ethics, suggesting that biases may stem from everyday linguistic exposure."
Applicable Models:
- GloVe (open-source access)
Authors: Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan
Considerations: Although the tests are grounded in human associations, general societal attitudes do not always represent every subgroup of people or culture.
Datasets: .nan
Group: BiasEvals
Hashtags:
- Bias
- Word Association
- Embeddings
- NLP
Link: Semantics derived automatically from language corpora contain human-like biases
Modality: Text
Screenshots:
- Images/WEAT1.png
- Images/WEAT2.png
Suggested Evaluation: Word Embedding Association Test (WEAT)
Level: Model
URL: https://researchportal.bath.ac.uk/en/publications/semantics-derived-automatically-from-language-corpora-necessarily
What it is evaluating: Associations in word embeddings, measured with an adaptation of the Implicit Association Test (IAT)
Metrics:
- Cosine Similarity
- Effect Size
Affiliations: Princeton University, University of Bath
Methodology: Effect sizes are computed between two sets of target words (e.g., programmer, engineer, scientist, ... and nurse, teacher, librarian, ...) and two sets of attribute words (e.g., man, male, ... and woman, female, ...) using the cosine similarity of their embeddings. The null hypothesis is that the two target sets do not differ in their relative similarity to the two attribute sets, as would be expected of an unbiased model.
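The WEAT methodology described above can be sketched in Python as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. It assumes the embedding vectors (for example, rows looked up from a GloVe file) are already loaded as plain lists of floats, and that the two target sets X and Y have equal size. The function names `cosine`, `association`, `effect_size`, and `p_value` are hypothetical. The effect size is the difference between the mean association of X and the mean association of Y, divided by the standard deviation of the association over X ∪ Y. The p-value comes from a one-sided permutation test over equal-size partitions of X ∪ Y.

```python
import itertools
import math
from statistics import mean, stdev
from typing import Sequence

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w: Vec, A: Sequence[Vec], B: Sequence[Vec]) -> float:
    """s(w, A, B): mean cosine similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def effect_size(X, Y, A, B) -> float:
    """Difference in mean association of the two target sets, normalised by
    the standard deviation of the association over all target words."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation; other implementations may use
    # the population standard deviation instead.
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


def p_value(X, Y, A, B) -> float:
    """One-sided permutation test over every equal-size partition of X ∪ Y.

    Enumerates all partitions exhaustively, so it is only practical for
    small target sets (assumes len(X) == len(Y))."""
    scores = [association(w, A, B) for w in list(X) + list(Y)]
    n, total = len(X), sum(scores)
    # Test statistic: sum over X minus sum over Y, i.e. 2 * sum(X) - total.
    observed = 2 * sum(scores[:n]) - total
    exceed = trials = 0
    for idx in itertools.combinations(range(len(scores)), n):
        trials += 1
        if 2 * sum(scores[i] for i in idx) - total > observed:
            exceed += 1
    return exceed / trials
```

With real embeddings, a large positive effect size and a small p-value indicate that the words in X sit closer to attribute set A (and the words in Y closer to B) than chance partitions would. For realistic set sizes, you would replace the exhaustive enumeration with random sampling of partitions.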