---
license: apache-2.0
language:
- bs
- hr
- sr
- sl
- sk
- cs
- en
tags:
- sentiment-analysis
- text-regression
- text-classification
- sentiment-regression
- sentiment-classification
- parliament
inference: false
---

# Multilingual parliament sentiment regression model XLM-R-Parla-Sent

This model is based on [xlm-r-parla](https://huggingface.co/classla/xlm-r-parla), an XLM-R-large model additionally pre-trained on parliamentary proceedings. It was then fine-tuned on manually annotated sentiment datasets from Bosnia and Herzegovina, Croatia, Czechia, Serbia, Slovakia, Slovenia, and the United Kingdom.

Both the additionally pre-trained model and the training dataset are results of the [ParlaMint project](https://www.clarin.eu/parlamint). The model and the dataset are described in detail in the following publication (to be published soon):

Michal Mochtak, Peter Rupnik, Nikola Ljubešić: The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.

## Annotation schema

The discrete labels in the original dataset were mapped to integers as follows:

```
"Negative": 0.0,
"M_Negative": 1.0,
"N_Neutral": 2.0,
"P_Neutral": 3.0,
"M_Positive": 4.0,
"Positive": 5.0,
```

The model was then fine-tuned on these numeric labels and set up as a regressor.

## Fine-tuning procedure

The fine-tuning procedure is described in the pending paper. The presumed optimal hyperparameters were:

```
num_train_epochs=4,
train_batch_size=32,
learning_rate=8e-6,
regression=True
```

## Results

The results reported below were obtained from 5 fine-tuning runs.

| Test dataset | R² | MAE |
| --- | --- | --- |
| BCS | 0.6146 ± 0.0104 | 0.7050 ± 0.0089 |
| EN | 0.6722 ± 0.0100 | 0.6755 ± 0.0076 |

## Usage Example

With `simpletransformers==0.64.3`:
```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import torch

# Configure the model as a regressor with a single output value.
model_args = ClassificationArgs(
    regression=True,
)

model = ClassificationModel(
    model_type="xlmroberta",
    model_name="classla/xlm-r-parlasent",
    use_cuda=torch.cuda.is_available(),  # use a GPU if one is available
    num_labels=1,
    args=model_args,
)

# predict() returns a tuple: (predictions, raw model outputs).
model.predict(["""Poštovani potpredsjedničke Vlade i ministre hrvatskih branitelja, mislite li da ste zapravo iznevjerili svoje suborce s kojima ste 555 dana prosvjedovali u šatoru protiv tadašnjih dužnosnika jer ste zapravo donijeli zakon koji je neprovediv, a birali ste si suradnike koji nemaju etički integritet."""])
```

Output:

```
(array(-0.0847168), array(-0.0847168))
```
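The regressor returns a continuous score rather than one of the six labels. The sketch below is one possible way to interpret that score and the metrics in the Results table. It assumes, without confirmation from the card, that a score is decoded by clipping it to the schema's range [0, 5] and rounding to the nearest label. It also assumes R² and MAE follow their standard definitions, as in scikit-learn's `r2_score` and `mean_absolute_error`. The helper names `to_label`, `mean_absolute_error`, and `r_squared` are hypothetical and are not part of `simpletransformers`.

```python
# Hypothetical post-processing helpers; not part of simpletransformers or of
# the original model card. Label order follows the annotation schema (0 to 5).
LABELS = ["Negative", "M_Negative", "N_Neutral", "P_Neutral", "M_Positive", "Positive"]


def to_label(score) -> str:
    """Map a continuous regression score to the nearest discrete label.

    Assumption: scores are clipped to [0, 5] and rounded. float() also accepts
    NumPy scalars and 0-d arrays like those returned by model.predict().
    Python's round() rounds exact halves to the nearest even integer.
    """
    index = min(max(round(float(score)), 0), len(LABELS) - 1)
    return LABELS[index]


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between gold and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Raw score from the usage example: close to 0, i.e. the "Negative" end.
    print(to_label(-0.0847168))
    # Toy gold/predicted scores, for illustration only.
    gold, pred = [0, 2, 4], [1, 2, 3]
    print(mean_absolute_error(gold, pred), r_squared(gold, pred))
```

Rounding the score is only one option. Because the model was trained as a regressor, the raw score can also be used directly, for example to rank speeches from most negative to most positive.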