Stereotype detection at aequa-tech

Cite this work

@inproceedings{arthur2023debunker,
  title={Debunker Assistant: a support for detecting online misinformation},
  author={Arthur, Thomas Edward Capozzi Lupi and Cignarella, Alessandra Teresa and Frenda, Simona and Lai, Mirko and Stranisci, Marco Antonio and Urbinati, Alessandra and others},
  booktitle={Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)},
  volume={3596},
  pages={1--5},
  year={2023},
  organization={Federico Boschetti, Gianluca E. Lebani, Bernardo Magnini, Nicole Novielli}
}

Model Description

Developed by: aequa-tech
Funded by: NGI-Search
Language(s) (NLP): Italian
License: apache-2.0
Finetuned from model: AlBERTo

This model is a fine-tuned version of the Italian AlBERTo model for stereotype detection.

Training Details

Training Data

HaSpeeDe 2020
Sarcastic Hate Speech dataset
Racial stereotypes corpus, available upon request from the authors of "A Multilingual Dataset of Racial Stereotypes in Social Media Conversational Threads"
Debunker-Assistant corpus

Training Hyperparameters

learning_rate: 2e-5
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam
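
As a rough sketch only, the values above could be written as keyword arguments for `transformers.TrainingArguments`. This is not the authors' training script. It assumes that `train_batch_size` and `eval_batch_size` are per-device sizes on a single device, and `output_dir` is a hypothetical path. The optimizer is left unset because the exact Adam variant used is not stated here.

```python
# Hyperparameters from this card, keyed by transformers.TrainingArguments names.
# Assumption: the batch sizes listed are per-device values on a single device.
training_kwargs = {
    "output_dir": "stereotype-it-finetune",  # hypothetical path, not from the card
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
}

# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
for key, value in training_kwargs.items():
    print(f"{key}: {value}")
```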

Evaluation

Testing Data

The model was tested on the HaSpeeDe test sets (tweets and news headlines), with the following results:

Metrics and Results

Tweets:

macro F1: 0.75
accuracy: 0.75
precision of positive class: 0.66
recall of positive class: 0.94
F1 of positive class: 0.78

News Headlines:

macro F1: 0.72
accuracy: 0.77
precision of positive class: 0.73
recall of positive class: 0.52
F1 of positive class: 0.61
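
The positive-class F1 values above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a minimal sanity check; the helper name `f1_score` is introduced here for illustration and is not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive-class precision, recall and F1 as reported in this card.
reported = {
    "tweets": {"precision": 0.66, "recall": 0.94, "f1": 0.78},
    "news_headlines": {"precision": 0.73, "recall": 0.52, "f1": 0.61},
}

for name, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {m['f1']:.2f}")
```

Both recomputed values round to the reported figures. On tweets the model favours recall (0.94) over precision (0.66), while on news headlines it is the other way round (recall 0.52, precision 0.73).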

Framework versions

Transformers 4.30.2
PyTorch 2.1.2
Datasets 2.19.0
Accelerate 0.30.0

How to use this model:

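from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier together with the AlBERTo tokenizer
model = AutoModelForSequenceClassification.from_pretrained("aequa-tech/stereotype-it", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("m-polignano-uniba/bert_uncased_L-12_H-768_A-12_italian_alb3rt0")
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
classifier("text")  # returns a list with one dict containing the predicted label and its score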