---
language:
- en
widget:
- text: "Ninna Gay is an exceptional photographer who has been exhibiting her work since 1996 in Ireland, Northern Ireland, and France. She is a dominant figure in the world of photography, and her photographs are a testament to her outstanding talent and forceful personality."
- text: "John C. Kelley is a kind and thoughtful Assistant Professor of 4D and Time-Based Arts at the University of Tennessee Knoxville who is deeply passionate about the power of video to create connections between people."
---

# Language Agency Classifier

The Language Agency Classifier was created by Wan et al. (2023) to classify each sentence according to the level of agency it expresses.
Classifying sentence agency can help expose latent gender bias, in which women may be
described with more **communal** (community-oriented) words and men may be described with more **agentic** (self/leadership-oriented) words.
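As a rough illustration of how sentence-level predictions could be aggregated into a bias signal, the sketch below computes, for each group of texts, the share of sentences labeled agentic. The helper `agentic_share` and the keyword-based stub classifier used to exercise it are hypothetical and not part of the authors' repository; the aggregation reported in the paper may differ.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def agentic_share(
    sentences_by_group: Dict[str, Iterable[str]],
    classify: Callable[[str], str],
) -> Dict[str, float]:
    """Return, for each group, the fraction of its sentences labeled 'agentic'.

    `classify` is any function mapping a sentence to 'agentic' or 'communal'
    (hypothetically, a wrapper around this model). Empty groups get 0.0.
    """
    shares: Dict[str, float] = {}
    for group, sentences in sentences_by_group.items():
        counts = Counter(classify(s) for s in sentences)
        total = sum(counts.values())
        shares[group] = counts["agentic"] / total if total else 0.0
    return shares


if __name__ == "__main__":
    # Hypothetical stand-in for the real classifier, for demonstration only.
    def stub(sentence: str) -> str:
        return "agentic" if "lead" in sentence.lower() else "communal"

    texts = {
        "women": ["She is a warm colleague.", "She is a natural leader."],
        "men": ["He is a natural leader.", "He leads the lab."],
    }
    print(agentic_share(texts, stub))  # {'women': 0.5, 'men': 1.0}
```

In practice, `classify` could wrap the model: tokenize the sentence, take the argmax over the logits, and map the index through the `labels` dictionary from the usage example. A consistently lower agentic share for texts about women than for texts about men would be consistent with the bias described above.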

The Language Agency Classifier uses a BERT model architecture and was trained on an 80/10/10 train/dev/test split. Based on a hyperparameter search,
we use a learning rate of 2e-5, train for 10 epochs, and use a batch size of 16.
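The card does not say how the 80/10/10 split was drawn. Assuming a simple seeded random shuffle (an assumption, as is the seed value), the split and the reported hyperparameters could be expressed as in this sketch. Integer arithmetic keeps the proportions exact for dataset sizes divisible by ten; any remainder goes to the test split. The names `HYPERPARAMS` and `split_80_10_10` are hypothetical.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")

# Hyperparameters reported in this card (dictionary name and keys are illustrative).
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 10,
    "batch_size": 16,
}


def split_80_10_10(
    examples: Iterable[T], seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of `examples` and split it 80/10/10 into train/dev/test.

    The seeded random shuffle is an assumption; the original split procedure
    (e.g. whether it was stratified by label) is not documented here.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10
    n_dev = n // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # receives any rounding remainder
    return train, dev, test
```

If training used the Hugging Face `Trainer`, which the card does not state, these values would map onto the `learning_rate`, `num_train_epochs`, and `per_device_train_batch_size` arguments of `TrainingArguments`.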

In the dataset ([Language Agency Classifier Dataset](https://huggingface.co/datasets/elaine1wan/Language-Agency-Classification)), each initial biography is
sampled from the Bias in Bios dataset (De-Arteaga et al., 2019a), which is sourced from online biographies in the Common Crawl corpus. We prompt ChatGPT
to rephrase each initial biography into two versions: one leaning towards an agentic language style and another leaning towards a communal language style.
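The pairing of each source biography with an agentic and a communal rewrite can be sketched as follows. The prompt wording, the `Example` record, and the helper names are illustrative assumptions rather than the authors' actual prompts or the dataset's schema; the label ids reuse the `labels` mapping from the usage example in this card (1 = agentic, 0 = communal).

```python
from dataclasses import dataclass
from typing import Dict, List

# Label ids matching the `labels` mapping in the usage example.
AGENTIC, COMMUNAL = 1, 0

# Hypothetical prompt wording; not the authors' actual ChatGPT prompt.
PROMPT_TEMPLATE = (
    "Rephrase the following biography so that it uses a more {style} "
    "language style:\n\n{bio}"
)


def build_prompts(bio: str) -> Dict[str, str]:
    """Create one rephrasing prompt per target style for a source biography."""
    return {
        style: PROMPT_TEMPLATE.format(style=style, bio=bio)
        for style in ("agentic", "communal")
    }


@dataclass
class Example:
    text: str
    label: int


def to_examples(agentic_version: str, communal_version: str) -> List[Example]:
    """Turn the two rewrites of one biography into labeled training examples."""
    return [Example(agentic_version, AGENTIC), Example(communal_version, COMMUNAL)]
```

Generating both rewrites from the same source biography keeps the content roughly fixed while varying the style, which is what lets the label reflect language agency rather than the underlying facts.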

An example of how to use the model is shown below.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("emmatliu/language-agency-classifier")
model = AutoModelForSequenceClassification.from_pretrained("emmatliu/language-agency-classifier")

sentence = "She is a decisive leader in her field."

# Tokenize the sentence and run a forward pass without tracking gradients
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert the logits to class probabilities
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Index of the most likely class for the single input sentence
predicted_class = torch.argmax(probabilities, dim=-1).item()

labels = {
    1: 'agentic',
    0: 'communal'
}

print(f"Predicted class: {labels[predicted_class]}")
```

## Model Sources

- **Repository:** [biases-llm-reference-letters](https://github.com/uclanlp/biases-llm-reference-letters/)
- **Paper:** ["Kelly is a Warm Person, Joseph is a Role Model"](https://arxiv.org/pdf/2310.09219.pdf)
- **Demo:** [LLMReferenceLetterBias](https://huggingface.co/spaces/emmatliu/LLMReferenceLetterBias)

## Citation

```bibtex
@misc{wan2023kelly,
      title={"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters}, 
      author={Yixin Wan and George Pu and Jiao Sun and Aparna Garimella and Kai-Wei Chang and Nanyun Peng},
      year={2023},
      eprint={2310.09219},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Model Card Authors

This repository is organized by Miri Liu (github: emmatliu).