---
license: apache-2.0
language:
- bs
- hr
- sr
- sl
- sk
- cs
- en
tags:
- sentiment-analysis
- text-regression
- text-classification
- sentiment-regression
- sentiment-classification
- parliament
inference: false
---

# Multilingual parliament sentiment regression model XLM-R-Parla-Sent

This model is based on [xlm-r-parla](https://huggingface.co/classla/xlm-r-parla), an XLM-R-large model additionally pre-trained on parliamentary proceedings. The model was fine-tuned on the [ParlaSent dataset](http://hdl.handle.net/11356/1868), a manually annotated selection of sentences from the parliamentary proceedings of Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom.

Both the additionally pre-trained model and the training dataset are results of the [ParlaMint project](https://www.clarin.eu/parlamint). Details on the models and the dataset are described in the following publication (to be published soon):

Michal Mochtak, Peter Rupnik, Nikola Ljubešić: The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.

## Annotation schema

The discrete labels present in the original dataset were mapped to numeric values as follows:

```
"Negative": 0.0,
"M_Negative": 1.0,
"N_Neutral": 2.0,
"P_Neutral": 3.0,
"M_Positive": 4.0,
"Positive": 5.0,
```

The model was then fine-tuned on these numeric labels and set up as a regressor.
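
As an illustration only, the mapping can be written as a Python dictionary, together with a hypothetical helper `nearest_label` (not part of the model or of any library) that converts a regression score back to the closest discrete label. Choosing the nearest label is an assumption made for this sketch, not a procedure described by the authors.

```python
# Label-to-score mapping, copied from the annotation schema.
LABEL_TO_SCORE = {
    "Negative": 0.0,
    "M_Negative": 1.0,
    "N_Neutral": 2.0,
    "P_Neutral": 3.0,
    "M_Positive": 4.0,
    "Positive": 5.0,
}


def nearest_label(score: float) -> str:
    """Return the discrete label whose numeric value is closest to `score`.

    Hypothetical helper: scores outside the 0-5 range resolve to the
    label at the nearest end of the scale.
    """
    return min(LABEL_TO_SCORE, key=lambda label: abs(LABEL_TO_SCORE[label] - score))


# Example scores, including one above the top of the scale.
for score in (0.12, 3.64, 5.31):
    print(f"{score:.2f} -> {nearest_label(score)}")
```

Collapsing a continuous score to a label discards information, so a step like this is probably only useful when a downstream task needs categories.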

## Fine-tuning procedure

The fine-tuning procedure is described in the pending paper. The presumed optimal hyperparameters used are:

```
num_train_epochs=4,
train_batch_size=32,
learning_rate=8e-6,
regression=True
```

## Results

The reported results were obtained from 5 fine-tuning runs.

test dataset | R^2 | MAE
--- | --- | ---
BCS | 0.6146 ± 0.0104 | 0.7050 ± 0.0089
EN | 0.6722 ± 0.0100 | 0.6755 ± 0.0076
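
For readers unfamiliar with the two metrics, the following is a minimal sketch of how R^2 and MAE can be computed and then aggregated over several runs. The gold labels and predictions are made-up toy numbers, and reading the ± values as a standard deviation across runs is an assumption; the card does not state how the spread was computed.

```python
from statistics import mean, stdev


def mae(y_true, y_pred):
    """Mean absolute error between gold and predicted scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy gold labels and predictions for two hypothetical runs (made-up numbers).
gold = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
runs = [
    [0.5, 1.0, 2.5, 3.0, 3.5, 5.0],
    [0.0, 1.0, 2.0, 3.0, 4.0, 4.0],
]

# One metric value per run, then mean and sample standard deviation.
maes = [mae(gold, preds) for preds in runs]
r2s = [r2(gold, preds) for preds in runs]

print(f"MAE: {mean(maes):.4f} ± {stdev(maes):.4f}")
print(f"R^2: {mean(r2s):.4f} ± {stdev(r2s):.4f}")
```

On the 0 to 5 scale used here, an MAE of about 0.7 means predictions are on average less than one label step away from the annotation.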

## Usage Example

With `simpletransformers==0.64.3`:

```python
import torch
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Configure the model as a regressor (single continuous output).
model_args = ClassificationArgs(
    regression=True,
)
model = ClassificationModel(
    model_type="xlmroberta", model_name="classla/xlm-r-parlasent",
    use_cuda=torch.cuda.is_available(), num_labels=1, args=model_args,
)
# predict() returns a tuple: (predictions, raw model outputs).
model.predict(["I fully disagree with this argument.", "The ministers are entering the chamber.", "Things can always be improved in the future.", "These are great news."])
```

Output:

```python
(
    array([0.11633301, 3.63671875, 4.203125 , 5.30859375]),
    array([0.11633301, 3.63671875, 4.203125 , 5.30859375])
)
```

Predictions are on the continuous 0 to 5 scale defined in the annotation schema. Because the model is a regressor, it can return values slightly outside this range, as for the last sentence.
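
The returned tuple can be unpacked and paired with the input sentences. The sketch below copies the scores from the output into a plain list so that it runs without downloading the model; `report` is a hypothetical helper, and clipping to the 0 to 5 range is an optional choice made for this example.

```python
sentences = [
    "I fully disagree with this argument.",
    "The ministers are entering the chamber.",
    "Things can always be improved in the future.",
    "These are great news.",
]

# Scores copied from the example output; in practice they would come from
# `predictions, raw_outputs = model.predict(sentences)`.
predictions = [0.11633301, 3.63671875, 4.203125, 5.30859375]


def report(sentences, predictions, low=0.0, high=5.0):
    """Pair each sentence with its score, clipped to the annotation range."""
    return [
        (sentence, min(max(float(score), low), high))
        for sentence, score in zip(sentences, predictions)
    ]


for sentence, score in report(sentences, predictions):
    print(f"{score:.2f}  {sentence}")
```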