---
license: apache-2.0
datasets:
- ltg/norec
language:
- 'no'
pipeline_tag: token-classification

model-index:
- name: SSA-Perin
  results:
  - task:
      type: structured sentiment analysis
    dataset:
      name: NoReC
      type: NoReC
    metrics:
    - name: Unlabeled sentiment tuple F1
      type: Unlabeled sentiment tuple F1
      value: 44.12%
    - name: Target F1
      type: Target F1
      value: 56.44%
    - name: Relative polarity precision
      type: Relative polarity precision
      value: 93.19%
---

# Model Card for SSA-PERIN for Norwegian

## Model Details

We release a pretrained model (and an easy-to-run wrapper) for structured sentiment analysis (SSA) of Norwegian text, trained on the [NoReC_fine](https://github.com/ltgoslo/norec_fine) dataset. It implements the method described in the paper [Direct parsing to sentiment graphs](https://aclanthology.org/2022.acl-short.51/) by Samuel et al. (2022), which demonstrated how a graph-based semantic parser (PERIN) can be applied to structured sentiment analysis, directly predicting sentiment graphs from text.

### Model Description

- **Developed by:** The [SANT](https://www.mn.uio.no/ifi/english/research/projects/sant/) project (Sentiment Analysis for Norwegian Text) at [the Language Technology Group](https://www.mn.uio.no/ifi/english/research/groups/ltg/) (LTG) at the University of Oslo.
- **Funded by:** [SANT](https://www.mn.uio.no/ifi/english/research/projects/sant/) is funded by the Research Council of Norway.
- **Language(s):** Norwegian (Bokmål/Nynorsk)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Model Sources

- **Paper:** [Direct parsing to sentiment graphs](https://aclanthology.org/2022.acl-short.51/) by Samuel et al., published at ACL 2022.
- **Repository:** The training scripts are available in the [GitHub](https://github.com/jerbarnes/direct_parsing_to_sent_graph) repository accompanying the paper by Samuel et al. (2022).
- **Demo:** To see the model in action, try it in our [Hugging Face Space](https://huggingface.co/spaces/ltg/ssa-perin).
- **Limitations:** The training data consists of professional reviews covering multiple domains, so the model may not generalize to other text types or domains.

## How to Get Started with the Model

For each sentence it deems to be sentiment-bearing, the model attempts to identify the following components: _source expressions_ (the opinion holder), _target expressions_ (what the opinion is directed towards), _polar expressions_ (the part of the text indicating that an opinion is expressed), and finally the _polarity_ (positive or negative). For more information about how these categories are defined in the training data, please see the paper [A Fine-grained Sentiment Dataset for Norwegian](https://aclanthology.org/2020.lrec-1.618/) by Øvrelid et al. (2020). For each identified expression, the character offsets in the text are also provided.

Here is an example showing how to use the model to predict such sentiment tuples:

```python
>>> import model_wrapper
>>> model = model_wrapper.PredictionModel()
>>> model.predict(['vi liker svart kaffe'])
[{'sent_id': '0',
  'text': 'vi liker svart kaffe',
  'opinions': [{'Source': [['vi'], ['0:2']],
     'Target': [['svart', 'kaffe'], ['9:14', '15:20']],
     'Polar_expression': [['liker'], ['3:8']],
     'Polarity': 'Positive'}]}]
```
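
The character offsets in the output can be used to recover each expression directly from the input text. The following is a minimal sketch of that post-processing step, not part of the released wrapper: the helper names `parse_offsets`, `expression_from_offsets`, and `summarize` are hypothetical, and the code assumes that offsets are `start:end` strings with an exclusive end (as in the example above) and that a missing role is represented by empty lists.

```python
# Sample output copied from the example above; in practice this would be
# one element of the list returned by model.predict([...]).
prediction = {
    "sent_id": "0",
    "text": "vi liker svart kaffe",
    "opinions": [
        {
            "Source": [["vi"], ["0:2"]],
            "Target": [["svart", "kaffe"], ["9:14", "15:20"]],
            "Polar_expression": [["liker"], ["3:8"]],
            "Polarity": "Positive",
        }
    ],
}


def parse_offsets(offsets):
    """Turn strings such as '9:14' into (start, end) integer pairs."""
    return [tuple(int(i) for i in span.split(":")) for span in offsets]


def expression_from_offsets(text, offsets):
    """Slice the text at each offset pair and join the pieces with spaces."""
    return " ".join(text[start:end] for start, end in parse_offsets(offsets))


def summarize(prediction):
    """Hypothetical helper: flatten each opinion into a dict of plain strings."""
    text = prediction["text"]
    rows = []
    for opinion in prediction["opinions"]:
        row = {"polarity": opinion["Polarity"]}
        for key in ("Source", "Target", "Polar_expression"):
            _tokens, offsets = opinion[key]
            row[key.lower()] = expression_from_offsets(text, offsets)
        rows.append(row)
    return rows


print(summarize(prediction))
```

Slicing by offsets rather than joining the token lists keeps the result tied to the original text, which is useful when highlighting spans in a user interface.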

## Training Details

### Training Data

The model is trained on [NoReC_fine](https://github.com/ltgoslo/norec_fine), a dataset for fine-grained sentiment analysis in Norwegian. It is based on a subset of documents from the [Norwegian Review Corpus](https://huggingface.co/datasets/ltg/norec) (NoReC), which consists of professionally authored reviews from multiple news sources and across a wide variety of domains, including literature, games, music, products, movies and more.

- **Paper:** [A Fine-grained Sentiment Dataset for Norwegian](https://aclanthology.org/2020.lrec-1.618/) by L. Øvrelid, P. Mæhlum, J. Barnes, and E. Velldal, in the Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, 2020
- **Repository:** [https://github.com/ltgoslo/norec_fine](https://github.com/ltgoslo/norec_fine)

### Model Configuration and Training Hyperparameters

Samuel et al. (2022) propose three different ways to encode sentiment graphs: "node-centric", "labeled-edge", and "opinion-tuple".

The model released here uses the following configuration:

- "labeled-edge" graph encoding,
- no character-level embeddings,
- all other hyperparameters set to their [default values](https://github.com/jerbarnes/direct_parsing_to_sent_graph/blob/main/perin/config/edge_norec.yaml),
- trained on top of the underlying masked language model [NorBERT 2](https://huggingface.co/ltg/norbert2).

## Evaluation

The model achieves the following results on the held-out test set of NoReC_fine (see the paper for a description of the metrics):

- Unlabeled sentiment tuple F1: 0.434
- Target F1: 0.541
- Relative polarity precision: 0.926
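
To give a rough intuition for an F1 score over predicted spans, the sketch below computes F1 from sets of character-offset strings. It is a simplified illustration only, not the official evaluation script: the function name `span_f1` is hypothetical, each offset string is treated as one unit, and the metrics reported above follow the definitions in the paper, which may differ in detail.

```python
def span_f1(gold_spans, predicted_spans):
    """Simplified F1 over spans, counting a span as correct only on an exact match."""
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Gold target tokens "svart" and "kaffe"; a hypothetical prediction finds only "kaffe".
gold = {"9:14", "15:20"}
predicted = {"15:20"}
print(round(span_f1(gold, predicted), 3))
```

In this toy case precision is 1.0 and recall is 0.5, which illustrates how F1 penalizes a prediction that is correct but incomplete.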

## Citation

If you use this model in your academic work, please cite the following paper:

```bibtex
@inproceedings{samuel2022,
  title = {Direct parsing to sentiment graphs},
  author = {David Samuel and Jeremy Barnes and Robin Kurtz and
            Stephan Oepen and Lilja Øvrelid and Erik Velldal},
  year = {2022},
  booktitle = {Proceedings of the 60th Annual Meeting of
               the Association for Computational Linguistics},
  address = {Dublin, Ireland}
}
```

## Model Card Authors

Erik Velldal and Larisa Kolesnichenko