---
license: bigscience-openrail-m
widget:
- text: >-
    We will restore funding to the Global Environment Facility and the
    Intergovernmental Panel on Climate Change.
---

## Model description

An xlm-roberta-large model fine-tuned on ~1.7 million annotated statements from the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2024a).

The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).

It works for all languages that the xlm-roberta model was pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)). Note, however, that it will perform best for the 38 languages contained in the Manifesto Corpus:

| | | | | |
|------|------|------|------|------|
|armenian|bosnian|bulgarian|catalan|croatian|
|czech|danish|dutch|english|estonian|
|finnish|french|galician|georgian|german|
|greek|hebrew|hungarian|icelandic|italian|
|japanese|korean|latvian|lithuanian|macedonian|
|montenegrin|norwegian|polish|portuguese|romanian|
|russian|serbian|slovak|slovenian|spanish|
|swedish|turkish|ukrainian| | |

## How to use

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and the tokenizer of its base model
model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

sentence = "We will restore funding to the Global Environment Facility and the Intergovernmental Panel on Climate Change, to support critical climate science research around the world"

inputs = tokenizer(sentence,
                   return_tensors="pt",
                   max_length=200,  # we limited the input to 200 tokens during finetuning
                   padding="max_length",
                   truncation=True
                   )

# Inference only, so gradients are not needed
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to percentages, keyed by category label and sorted in descending order
probabilities = torch.softmax(logits, dim=1).tolist()[0]
probabilities = {model.config.id2label[index]: round(probability * 100, 2) for index, probability in enumerate(probabilities)}
probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))
print(probabilities)
# {'501 - Environmental Protection: Positive': 67.56, '411 - Technology and Infrastructure': 14.03, '107 - Internationalism: Positive': 13.58, '416 - Anti-Growth Economy: Positive': 2.24...

# The single most likely category
predicted_class = model.config.id2label[logits.argmax().item()]
print(predicted_class)
# 501 - Environmental Protection: Positive
```
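The sorted `probabilities` dictionary can be post-processed further, for example to keep only the most likely categories and to separate the numeric code from the category name. The following is a minimal sketch, assuming that labels follow the `<code> - <name>` pattern seen in the example output. `split_label` and `top_categories` are hypothetical helper names, not part of the model or of `transformers`, and the sample values are copied from the output shown above.

```python
def split_label(label):
    """Split a label such as '501 - Environmental Protection: Positive' into (code, name).

    Assumption: labels use ' - ' between code and name, as in the example output.
    """
    code, _, name = label.partition(" - ")
    return code.strip(), name.strip()


def top_categories(probabilities, k=3):
    """Return the k most likely categories as (code, name, percent) tuples."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [(*split_label(label), percent) for label, percent in ranked[:k]]


# Sample values copied from the example output (truncated to four categories)
example = {
    "501 - Environmental Protection: Positive": 67.56,
    "411 - Technology and Infrastructure": 14.03,
    "107 - Internationalism: Positive": 13.58,
    "416 - Anti-Growth Economy: Positive": 2.24,
}

for code, name, percent in top_categories(example, k=2):
    print(f"{code}\t{name}\t{percent:.2f}%")
```

Keeping the code as a separate field makes it easier to join predictions with other data that is keyed by Manifesto category codes.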

## Model Performance

The model was evaluated on a test set of 200,920 annotated manifesto statements.

### Overall

| Model | Accuracy | Top2_Acc | Top3_Acc | Precision | Recall | F1_Macro | MCC | Cross-Entropy |
|-------|:--------:|:--------:|:--------:|:---------:|:------:|:--------:|:---:|:-------------:|
| [Sentence Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2024-1-1) | 0.57 | 0.73 | 0.81 | 0.48 | 0.43 | 0.45 | 0.55 | 1.47 |
| [Context Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2024-1-1) | 0.64 | 0.81 | 0.88 | 0.55 | 0.52 | 0.53 | 0.63 | 1.15 |
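
The Top2_Acc and Top3_Acc columns are not defined in this card. A common reading, assumed here, is top-k accuracy: a prediction counts as correct when the annotated category is among the k highest-scoring categories. The following is a minimal sketch of that calculation in plain Python. `top_k_accuracy` is a hypothetical helper, the scores and labels are toy values, and this is not the evaluation code used to produce the table.

```python
def top_k_accuracy(scores, gold, k=1):
    """Share of rows whose gold class index is among the k highest scores.

    scores: list of per-class score lists (logits or probabilities)
    gold:   list of gold class indices, one per row
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    hits = 0
    for row, target in zip(scores, gold):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(gold)


# Toy data: 4 statements, 3 classes (illustrative values only)
toy_scores = [
    [0.7, 0.2, 0.1],  # gold 0: correct at k=1
    [0.4, 0.5, 0.1],  # gold 0: correct from k=2
    [0.1, 0.3, 0.6],  # gold 0: correct only at k=3
    [0.2, 0.2, 0.6],  # gold 2: correct at k=1
]
toy_gold = [0, 0, 0, 2]

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_scores, toy_gold, k))
```

Under this reading the value cannot decrease as k grows, which is consistent with the pattern in the table (0.57, 0.73, 0.81 for the sentence model).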

### Citation

Please cite the model as follows:

Burst, Tobias / Lehmann, Pola / Franzmann, Simon / Al-Gaddooa, Denise / Ivanusch, Christoph / Regel, Sven / Riethmüller, Felicia / Weßels, Bernhard / Zehnter, Lisa (2024): manifestoberta. Version 56topics.sentence.2024.1.1. Berlin: Wissenschaftszentrum Berlin für Sozialforschung (WZB) / Göttingen: Institut für Demokratieforschung (IfDem). https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1

```bib
@misc{Burst:2024,
  Address = {Berlin / Göttingen},
  Author = {Burst, Tobias AND Lehmann, Pola AND Franzmann, Simon AND Al-Gaddooa, Denise AND Ivanusch, Christoph AND Regel, Sven AND Riethmüller, Felicia AND Weßels, Bernhard AND Zehnter, Lisa},
  Publisher = {Wissenschaftszentrum Berlin für Sozialforschung / Göttinger Institut für Demokratieforschung},
  Title = {manifestoberta. Version 56topics.sentence.2024.1.1},
  doi = {10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
  url = {https://doi.org/10.25522/manifesto.manifestoberta.56topics.sentence.2024.1.1},
  Year = {2024},
}
```