---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
model-index:
- name: PolicyBERTa-7d
  results: []
widget:
- text: "Russia must end the war."
- text: "Democratic institutions must be supported."
- text: "The state must fight political corruption."
- text: "Our energy economy must be nationalised."
- text: "We must increase social spending."
---

# PolicyBERTa-7d

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on data from the [Manifesto Project](https://manifesto-project.wzb.eu/). It was inspired by the model from [Laurer (2020)](https://huggingface.co/MoritzLaurer/policy-distilbert-7d).

It achieves the following results on the validation set after the fifth and final training epoch:

- Loss: 0.8549
- Accuracy: 0.7059
- F1-micro: 0.7059
- F1-macro: 0.6683
- F1-weighted: 0.7033
- Precision: 0.7059
- Recall: 0.7059

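
The identical values for Accuracy, F1-micro, Precision and Recall are what one would expect if precision and recall are micro-averaged. This card does not state the averaging mode, so that reading is an assumption. In single-label classification every wrong prediction counts as exactly one false positive and one false negative, which makes all four numbers coincide. A minimal sketch on toy labels (not model output) illustrates this:

```python
from collections import Counter


def micro_scores(y_true, y_pred):
    """Return accuracy and micro-averaged precision, recall and F1."""
    tp = Counter()  # true positives per label
    fp = Counter()  # false positives per label
    fn = Counter()  # false negatives per label
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted label gains a false positive
            fn[t] += 1  # the true label gains a false negative
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = total_tp / len(y_true)
    return accuracy, precision, recall, f1


# Toy labels in the range 0-6 (illustrative only).
y_true = [0, 3, 4, 4, 2, 6, 5, 1]
y_pred = [0, 3, 4, 3, 2, 4, 5, 2]
accuracy, precision, recall, f1 = micro_scores(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

F1-macro and F1-weighted differ from these values because they average per-class scores instead of pooling all predictions.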

## Model description

This model was trained on 115,943 manually annotated sentences to classify text into one of seven political categories: "external relations", "freedom and democracy", "political system", "economy", "welfare and quality of life", "fabric of society" and "social groups".

## Intended uses & limitations

Like any supervised machine learning model, this model reproduces the limitations of its training data in terms of country coverage, time span, domain definitions and potential annotator biases. Applying the model to other kinds of data (other text types, countries, etc.) is likely to reduce performance.

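```python
import pandas as pd
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline(
    task="text-classification",
    model="niksmer/PolicyBERTa-7d")

# Load the text data you want to classify.
# Assumes text.csv has a column named "text" with one sentence per row.
texts = pd.read_csv("text.csv")["text"].astype(str).tolist()

# Inference: the pipeline expects a string or a list of strings, not a DataFrame
output = classifier(texts)

# Print the first predictions (one label and score per sentence)
print(pd.DataFrame(output).head())
```
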
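
Depending on how the model configuration names its classes, the pipeline may return generic labels such as `LABEL_3` rather than domain names. Whether it does so for this model is not stated in this card. The sketch below maps such labels to the domain names from the label tables in this card. The helper `to_domain`, the `LABEL_<id>` pattern and the sample output are illustrative assumptions.

```python
# Domain names by label id, copied from the label tables in this card.
ID2DOMAIN = {
    0: "external relations",
    1: "freedom and democracy",
    2: "political system",
    3: "economy",
    4: "welfare and quality of life",
    5: "fabric of society",
    6: "social groups",
}


def to_domain(label):
    """Map a pipeline label to a domain name.

    Assumes generic labels look like "LABEL_3"; any other string
    (for example an already readable domain name) is returned unchanged.
    """
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        return ID2DOMAIN.get(int(suffix), label)
    return label


# Hypothetical pipeline output, used so the sketch runs without the model.
sample_output = [
    {"label": "LABEL_3", "score": 0.91},
    {"label": "economy", "score": 0.88},
]
domains = [to_domain(row["label"]) for row in sample_output]
print(domains)
```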

## Training and evaluation data

PolicyBERTa-7d was trained on the English-language subset of the [Manifesto Project Dataset (MPDS2020a)](https://manifesto-project.wzb.eu/datasets). The model was trained on 115,943 sentences from 163 political manifestos in 7 English-speaking countries (Australia, Canada, Ireland, New Zealand, South Africa, United Kingdom, United States). The manifestos were published between 1992 and 2020.

The table below covers all splits (training, validation and test): 173 manifestos and 144,734 sentences in total.

| Country        | Manifestos | Sentences | Time span        |
|----------------|------------|-----------|------------------|
| Australia      | 18         | 14,887    | 2010-2016        |
| Ireland        | 23         | 24,966    | 2007-2016        |
| Canada         | 14         | 12,344    | 2004-2008 & 2015 |
| New Zealand    | 46         | 35,079    | 1993-2017        |
| South Africa   | 29         | 13,334    | 1994-2019        |
| USA            | 9          | 13,188    | 1992 & 2004-2020 |
| United Kingdom | 34         | 30,936    | 1997-2019        |

Canadian manifestos between 2004 and 2008 are used as test data.

The Manifesto Project manually annotates individual sentences from political party manifestos, assigning each to one of 7 main political domains: 'Economy', 'External Relations', 'Fabric of Society', 'Freedom and Democracy', 'Political System', 'Welfare and Quality of Life' and 'Social Groups'. See the [codebook](https://manifesto-project.wzb.eu/down/papers/handbook_2021_version_5.pdf) for the exact definitions of each domain.

### Train data

The training data was highly imbalanced.

| Label | Description                 | Count  |
|-------|-----------------------------|--------|
| 0     | external relations          | 7,640  |
| 1     | freedom and democracy       | 5,880  |
| 2     | political system            | 11,234 |
| 3     | economy                     | 29,218 |
| 4     | welfare and quality of life | 37,200 |
| 5     | fabric of society           | 13,594 |
| 6     | social groups               | 11,177 |

Overall count: 115,943

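
To put a number on the imbalance, the sketch below computes each class's share of the training data and the ratio between the largest and smallest class. It also derives inverse-frequency class weights, a common way to counter imbalance. This card does not say that class weights were used during training, so the weights are shown purely as an illustration.

```python
# Training sentence counts per domain, copied from the table in this card.
train_counts = {
    "external relations": 7_640,
    "freedom and democracy": 5_880,
    "political system": 11_234,
    "economy": 29_218,
    "welfare and quality of life": 37_200,
    "fabric of society": 13_594,
    "social groups": 11_177,
}

total = sum(train_counts.values())
n_classes = len(train_counts)

# Share of the training data held by each class.
shares = {name: count / total for name, count in train_counts.items()}

# Inverse-frequency ("balanced") weights: total / (n_classes * count).
# Illustration only; not stated as part of the training setup.
weights = {name: total / (n_classes * count) for name, count in train_counts.items()}

imbalance_ratio = max(train_counts.values()) / min(train_counts.values())

for name in train_counts:
    print(f"{name:<28} share={shares[name]:.3f} weight={weights[name]:.2f}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

The largest class (welfare and quality of life) holds roughly a third of the training sentences and is more than six times the size of the smallest (freedom and democracy).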

### Validation data

The validation set was created by random sampling.

| Label | Description                 | Count |
|-------|-----------------------------|-------|
| 0     | external relations          | 1,345 |
| 1     | freedom and democracy       | 1,043 |
| 2     | political system            | 2,038 |
| 3     | economy                     | 5,140 |
| 4     | welfare and quality of life | 6,554 |
| 5     | fabric of society           | 2,384 |
| 6     | social groups               | 1,957 |

Overall count: 20,461

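
The reported sizes are consistent with holding out 15% of the pooled 136,404 training and validation sentences, rounded up. The card does not state the split ratio, the random seed, or whether the split was stratified. The fraction, the seed and the helper `random_split` below are therefore assumptions in a minimal sketch of a random split, not the author's procedure.

```python
import math
import random


def random_split(items, val_fraction, seed):
    """Shuffle items and hold out ceil(val_fraction * n) of them for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = math.ceil(val_fraction * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


n_train_reported = 115_943  # training sentences reported in this card
n_val_reported = 20_461     # validation sentences reported in this card

# Integer indices stand in for the annotated sentences.
pool = range(n_train_reported + n_val_reported)

# val_fraction=0.15 and seed=42 are assumptions, not values from the card.
train_idx, val_idx = random_split(pool, val_fraction=0.15, seed=42)
print(len(train_idx), len(val_idx))  # 115943 20461
```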

### Test data

The test dataset contains ten Canadian manifestos published between 2004 and 2008.

| Label | Description                 | Count |
|-------|-----------------------------|-------|
| 0     | external relations          | 824   |
| 1     | freedom and democracy       | 296   |
| 2     | political system            | 1,041 |
| 3     | economy                     | 2,188 |
| 4     | welfare and quality of life | 2,654 |
| 5     | fabric of society           | 940   |
| 6     | social groups               | 387   |

Overall count: 8,330

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="PolicyBERTa-7d",  # placeholder: the output directory is not stated in this card
    warmup_steps=0,
    weight_decay=0.1,
    learning_rate=1e-05,
    fp16=True,
    evaluation_strategy="epoch",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    overwrite_output_dir=True,
    per_device_eval_batch_size=16,
    save_strategy="no",
    logging_dir="logs",
    logging_strategy="steps",
    logging_steps=10,
    push_to_hub=True,
    hub_strategy="end")
```

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1-micro | F1-macro | F1-weighted | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|:-----------:|:---------:|:------:|
| 0.9154        | 1.0   | 1812 | 0.8984          | 0.6785   | 0.6785   | 0.6383   | 0.6772      | 0.6785    | 0.6785 |
| 0.8374        | 2.0   | 3624 | 0.8569          | 0.6957   | 0.6957   | 0.6529   | 0.6914      | 0.6957    | 0.6957 |
| 0.7053        | 3.0   | 5436 | 0.8582          | 0.7019   | 0.7019   | 0.6594   | 0.6967      | 0.7019    | 0.7019 |
| 0.7178        | 4.0   | 7248 | 0.8488          | 0.7030   | 0.7030   | 0.6662   | 0.7011      | 0.7030    | 0.7030 |
| 0.6688        | 5.0   | 9060 | 0.8549          | 0.7059   | 0.7059   | 0.6683   | 0.7033      | 0.7059    | 0.7059 |

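
The Step column can be related to the batch size with a little arithmetic. With 115,943 training sentences and a per-device batch size of 16, a single device would need 7,247 steps per epoch, whereas the table logs 1,812. That count matches an effective batch size of 64, for example four devices with 16 sentences each. The number of devices is not stated in this card, so `n_devices=4` below is an inference. The helper `steps_per_epoch` is a simplified model that keeps the last partial batch.

```python
import math


def steps_per_epoch(n_examples, per_device_batch_size, n_devices=1, grad_accum_steps=1):
    """Optimizer steps per epoch when the last partial batch is kept."""
    effective_batch = per_device_batch_size * n_devices * grad_accum_steps
    return math.ceil(n_examples / effective_batch)


n_train = 115_943      # training sentences reported in this card
per_device_batch = 16  # per_device_train_batch_size from the hyperparameters
num_epochs = 5         # num_train_epochs from the hyperparameters

single_device = steps_per_epoch(n_train, per_device_batch)
# n_devices=4 is an assumption chosen because it matches the logged step counts.
four_devices = steps_per_epoch(n_train, per_device_batch, n_devices=4)

print(single_device, four_devices, four_devices * num_epochs)  # 7247 1812 9060
```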

### Validation evaluation

| Model          | Micro F1-Score | Macro F1-Score | Weighted F1-Score |
|----------------|----------------|----------------|-------------------|
| PolicyBERTa-7d | 0.71           | 0.67           | 0.70              |

### Test evaluation

| Model          | Micro F1-Score | Macro F1-Score | Weighted F1-Score |
|----------------|----------------|----------------|-------------------|
| PolicyBERTa-7d | 0.65           | 0.60           | 0.65              |

### Evaluation per category

| Label                       | Validation F1-Score | Test F1-Score |
|-----------------------------|---------------------|---------------|
| external relations          | 0.76                | 0.70          |
| freedom and democracy       | 0.61                | 0.55          |
| political system            | 0.55                | 0.55          |
| economy                     | 0.74                | 0.67          |
| welfare and quality of life | 0.77                | 0.72          |
| fabric of society           | 0.67                | 0.60          |
| social groups               | 0.58                | 0.41          |

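
The per-category scores tie back to the aggregate scores reported in this card. Macro F1 is the unweighted mean of the per-class F1 scores. Weighted F1 is the mean weighted by the number of sentences per class. The sketch below recomputes both from the rounded table values and the split sizes given in this card, so agreement is only expected to two decimals.

```python
# Per-category values copied from this card:
# (validation F1, test F1, validation sentences, test sentences)
per_category = {
    "external relations": (0.76, 0.70, 1_345, 824),
    "freedom and democracy": (0.61, 0.55, 1_043, 296),
    "political system": (0.55, 0.55, 2_038, 1_041),
    "economy": (0.74, 0.67, 5_140, 2_188),
    "welfare and quality of life": (0.77, 0.72, 6_554, 2_654),
    "fabric of society": (0.67, 0.60, 2_384, 940),
    "social groups": (0.58, 0.41, 1_957, 387),
}


def macro_f1(scores):
    """Unweighted mean of per-class F1 scores."""
    return sum(scores) / len(scores)


def weighted_f1(scores, counts):
    """Mean of per-class F1 scores weighted by class size."""
    return sum(s * c for s, c in zip(scores, counts)) / sum(counts)


val_f1 = [row[0] for row in per_category.values()]
test_f1 = [row[1] for row in per_category.values()]
val_n = [row[2] for row in per_category.values()]
test_n = [row[3] for row in per_category.values()]

macro_val, macro_test = macro_f1(val_f1), macro_f1(test_f1)
weighted_val, weighted_test = weighted_f1(val_f1, val_n), weighted_f1(test_f1, test_n)

print(f"validation: macro={macro_val:.2f} weighted={weighted_val:.2f}")
print(f"test:       macro={macro_test:.2f} weighted={weighted_test:.2f}")
```

The gap between macro and weighted F1 reflects the weaker scores on smaller classes such as social groups, which count equally in the macro average but less in the weighted one.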

### Framework versions

- Transformers 4.16.2
- PyTorch 1.9.0+cu102
- Datasets 1.8.0
- Tokenizers 0.10.3