metadata
language:
  - en
  - el
  - multilingual
tags:
  - text-classification
  - fact-or-opinion
  - transformers
widget:
  - text: Ξεχωρίζει η καθηλωτική ερμηνεία του πρωταγωνιστή.
  - text: Η Ελλάδα είναι χώρα της Ευρώπης.
  - text: Tolkien was an English writer
  - text: Tolkien is my favorite writer.
pipeline_tag: text-classification
license: apache-2.0

Fact vs. opinion binary classifier, trained on a mixed EN-EL annotated corpus.

By the Hellenic Army Academy (SSE) and the Technical University of Crete (TUC)

This is an XLM-RoBERTa-base model with a binary classification head. Given a sentence, it classifies it as either a fact or an opinion based on its content.
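A binary classification head outputs two raw scores (logits) per sentence. The sketch below shows how such scores are typically turned into probabilities with a softmax and reduced to a single label id with argmax. The logit values are hypothetical, and the code illustrates the general technique rather than this model's internal implementation.

```python
import math
from typing import List, Tuple


def classify_logits(logits: List[float]) -> Tuple[int, List[float]]:
    """Convert the raw scores of a classification head into softmax
    probabilities and return the index of the most likely class."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Argmax over the probabilities gives the predicted label id.
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, probs


# Hypothetical logits for one sentence: [score for label 0, score for label 1].
label_id, probs = classify_logits([-1.2, 2.3])
print(label_id)  # 1
```

Under the label legend of this model card, label id 1 corresponds to Fact/Objective.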

You can use this model for the same task in any of the languages supported by XLM-R, taking advantage of its zero-shot cross-lingual transfer capabilities. However, the model was trained only on English and Greek sentences.

Legend of Hugging Face API labels:

  • Label 0: Opinion/Subjective sentence
  • Label 1: Fact/Objective sentence
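The Hugging Face `transformers` text-classification pipeline returns one dict per input with `label` and `score` keys. When a model config defines no custom `id2label` mapping, the label strings default to `LABEL_0` and `LABEL_1`; whether this model uses those defaults is an assumption here. A minimal helper that maps such predictions to the legend above might look like this (the prediction dicts and scores are hypothetical):

```python
from typing import Dict

# Label ids from the legend: 0 = opinion/subjective, 1 = fact/objective.
ID_TO_NAME: Dict[int, str] = {0: "opinion", 1: "fact"}


def to_readable(prediction: dict) -> str:
    """Map a text-classification prediction to 'fact' or 'opinion'.

    Accepts labels such as "LABEL_1" (the transformers default when no
    custom id2label is configured) as well as a bare "1".
    """
    label = str(prediction["label"])
    label_id = int(label.rsplit("_", 1)[-1])
    return ID_TO_NAME[label_id]


# Hypothetical pipeline outputs for two of the widget sentences.
predictions = [
    {"label": "LABEL_1", "score": 0.98},  # "Tolkien was an English writer"
    {"label": "LABEL_0", "score": 0.91},  # "Tolkien is my favorite writer."
]
print([to_readable(p) for p in predictions])  # ['fact', 'opinion']
```

If the model's config uses custom label names instead, the `label` field can be read directly and this helper is unnecessary.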

Training data

The original dataset (available here: https://github.com/1024er/cbert_aug/tree/crayon/datasets/subj) contained approximately 9,000 sentences annotated as subjective or objective. It was translated into Greek using Google Translate, and the Greek version was then concatenated with the original English one to create the mixed EN-EL dataset.
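The concatenation step can be sketched as follows: each English sentence is translated, keeps its label, and the translated copies are appended after the English originals. The `translate` callable is a hypothetical stand-in for Google Translate, and any shuffling or train/test splitting the authors applied is not described here, so it is omitted.

```python
from typing import Callable, List, Tuple

# (sentence, label) with 0 = opinion/subjective, 1 = fact/objective.
Example = Tuple[str, int]


def build_mixed_dataset(
    english: List[Example], translate: Callable[[str], str]
) -> List[Example]:
    """Translate every English example (keeping its label) and
    concatenate the translated copies after the English originals."""
    greek = [(translate(sentence), label) for sentence, label in english]
    return english + greek


english = [
    ("Tolkien is my favorite writer.", 0),
    ("Tolkien was an English writer", 1),
]
# Stub translator used only for illustration; it just tags the text.
mixed = build_mixed_dataset(english, translate=lambda s: "[el] " + s)
print(len(mixed))  # 4
```

The mixed corpus therefore holds roughly twice as many sentences as the English original, with identical label proportions in both halves.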

The model was trained for 5 epochs with a batch size of 8. Detailed metrics and hyperparameters are available on the "Metrics" tab.

Evaluation results on the test set

Accuracy   Precision   Recall   F1
0.952      0.945       0.960    0.952
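As a quick consistency check, the F1 score is the harmonic mean of precision and recall. Plugging in the reported precision and recall reproduces the reported F1 to three decimals:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported test-set precision and recall from the table above.
f1 = f1_score(0.945, 0.960)
print(round(f1, 3))  # 0.952
```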

Acknowledgement

This research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the HFRI PhD Fellowship grant (Fellowship Number: 50, 2nd call).