---
license: bigscience-openrail-m
widget:
- text: >-
    We will restore funding to the Global Environment Facility and the Intergovernmental Panel on Climate Change.
---

## Model description

An xlm-roberta-large model fine-tuned on ~1.6 million annotated statements contained in the [Manifesto Corpus](https://manifesto-project.wzb.eu/information/documents/corpus) (version 2023a).
The model can be used to categorize any type of text into 56 different political topics according to the Manifesto Project's coding scheme ([Handbook 4](https://manifesto-project.wzb.eu/coding_schemes/mp_v4)).
It works for all languages the xlm-roberta model is pretrained on ([overview](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr#introduction)), but it will perform best for the 38 languages contained in the Manifesto Corpus:

||||||
|------|------|------|------|------|
|armenian|bosnian|bulgarian|catalan|croatian|
|czech|danish|dutch|english|estonian|
|finnish|french|galician|georgian|german|
|greek|hebrew|hungarian|icelandic|italian|
|japanese|korean|latvian|lithuanian|macedonian|
|montenegrin|norwegian|polish|portuguese|romanian|
|russian|serbian|slovak|slovenian|spanish|
|swedish|turkish|ukrainian| | |

## How to use

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2023-1-1")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

sentence = "We will restore funding to the Global Environment Facility and the Intergovernmental Panel on Climate Change, to support critical climate science research around the world"

inputs = tokenizer(sentence,
                   return_tensors="pt",
                   max_length=200,  # we limited the input to 200 tokens during finetuning
                   padding="max_length",
                   truncation=True
                   )

logits = model(**inputs).logits

# Convert logits to percentages and map each class index to its label
probabilities = torch.softmax(logits, dim=1).tolist()[0]
probabilities = {model.config.id2label[index]: round(probability * 100, 2) for index, probability in enumerate(probabilities)}
# Sort categories from most to least likely
probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))
print(probabilities)
# {'501 - Environmental Protection: Positive': 67.28, '411 - Technology and Infrastructure': 15.19, '107 - Internationalism: Positive': 13.63, '416 - Anti-Growth Economy: Positive': 2.02...

predicted_class = model.config.id2label[logits.argmax().item()]
print(predicted_class)
# 501 - Environmental Protection: Positive
```

## Model Performance

The model was evaluated on a test set of 199,046 annotated manifesto statements.

### Overall

| | Accuracy | Top2_Acc | Top3_Acc | Precision | Recall | F1_Macro | MCC | Cross-Entropy |
|---|:--------:|:--------:|:--------:|:---------:|:------:|:--------:|:---:|:-------------:|
| [Sentence Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-sentence-2023-1-1) | 0.57 | 0.73 | 0.81 | 0.49 | 0.43 | 0.45 | 0.55 | 1.5 |
| [Context Model](https://huggingface.co/manifesto-project/manifestoberta-xlm-roberta-56policy-topics-context-2023-1-1) | 0.64 | 0.81 | 0.88 | 0.54 | 0.52 | 0.53 | 0.62 | 1.15 |
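
The card does not define the Top2_Acc and Top3_Acc columns. Assuming they follow the usual top-k definition (a prediction counts as correct when the annotated category is among the k highest-scoring categories), the metric could be sketched as below. The function name `top_k_accuracy` and the toy scores are hypothetical and only illustrate the calculation; they are not the evaluation code or data behind the table.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   gold: Sequence[int],
                   k: int) -> float:
    """Share of rows whose gold label index is among the k highest scores.

    Assumption: this is the standard top-k accuracy definition, not
    necessarily the exact procedure used to produce the reported numbers.
    """
    if len(scores) != len(gold) or not gold:
        raise ValueError("scores and gold must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, gold):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(gold)


# Toy example: 3 statements, 4 categories (made-up numbers)
toy_scores = [
    [0.70, 0.20, 0.05, 0.05],  # gold label 0 is ranked first
    [0.10, 0.50, 0.30, 0.10],  # gold label 2 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # gold label 3 is ranked last
]
toy_gold = [0, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_gold, k=1)
top2 = top_k_accuracy(toy_scores, toy_gold, k=2)
top3 = top_k_accuracy(toy_scores, toy_gold, k=3)
print(top1, top2, top3)
```

With real model output, each row of `scores` would be the softmax probabilities (or raw logits, since the ranking is the same) for one statement.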