Model: covid-19-vaccination-tweet-stance

Overview

This model is a text classifier trained to determine the stance of a tweet toward COVID-19 vaccination. It classifies tweets into three categories: in-favor, against, and neutral-or-unclear. Note that this classifier only works for tweets that are related to COVID-19 vaccination. To classify whether a tweet is related to COVID-19 vaccination in the first place, please refer to covid-19-vaccination-tweet-relevance.
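Because the stance classifier assumes its input is already about COVID-19 vaccination, one way to apply it to unfiltered tweets is to chain it behind a relevance check. The following is a minimal sketch of that two-stage flow. `is_relevant` and `classify_stance` are hypothetical callables standing in for wrappers around covid-19-vaccination-tweet-relevance and this model; neither name comes from the model cards.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def two_stage_stance(
    tweets: Iterable[str],
    is_relevant: Callable[[str], bool],     # hypothetical wrapper around the relevance model
    classify_stance: Callable[[str], str],  # hypothetical wrapper around this stance model
) -> List[Tuple[str, Optional[str]]]:
    """Return (tweet, stance) pairs; stance is None for tweets judged irrelevant."""
    results: List[Tuple[str, Optional[str]]] = []
    for tweet in tweets:
        if is_relevant(tweet):
            results.append((tweet, classify_stance(tweet)))
        else:
            # The stance model was only trained on vaccination-related tweets,
            # so off-topic tweets are passed through without a stance label.
            results.append((tweet, None))
    return results
```

Returning `None` for filtered tweets, rather than dropping them, keeps the output aligned with the input order; this is a design choice of the sketch, not something the model requires.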

Usage

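from transformers import AutoModelForSequenceClassification, AutoTokenizer
# AutoModelForSequenceClassification keeps the classification head; plain AutoModel would load only the encoder.
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-stance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-stance")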

Training corpus

The training corpus consists of 5,000 tweets, randomly sampled daily from December 2020 to June 2022 and labeled by domain experts. All of these tweets are related to COVID-19 vaccination.

We have separately trained another model for classifying whether a tweet is related to COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-relevance for more information.

Output Label Index

  • LABEL_0: "neutral-or-unclear"
  • LABEL_1: "in-favor"
  • LABEL_2: "against"
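The label index above can be applied to raw model outputs with a small helper. This is a minimal, dependency-free sketch: it assumes you have already obtained the three logits for a tweet as a plain list of floats (for example by converting the model's output tensor to a list), and that generic names such as `LABEL_2` are what your inference code returns. The function names are illustrative only.

```python
import math
from typing import List, Tuple

# Mapping taken from the label index above.
ID2LABEL = {0: "neutral-or-unclear", 1: "in-favor", 2: "against"}


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the readable stance label and its probability for one tweet."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def rename_label(generic_label: str) -> str:
    """Map a generic name such as 'LABEL_2' to its readable stance label."""
    index = int(generic_label.rsplit("_", 1)[1])
    return ID2LABEL[index]
```

For instance, `rename_label("LABEL_1")` yields `"in-favor"`, and `decode` returns the class with the highest logit together with its softmax probability.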

Performance Metrics

The model's performance metrics on the test set are as follows:

  • Accuracy: 0.7747
  • Macro-average metrics (across "in-favor" and "against" classes):
    • F1-score: 0.8288
    • Recall: 0.8000
    • Precision: 0.8614
  • Macro-average metrics (across all 3 classes):
    • F1-score: 0.7408
    • Recall: 0.7568
    • Precision: 0.7369
  • Class-wise metrics:
    • For class "in-favor":
      • F1-score: 0.8423
      • Precision: 0.9022
      • Recall: 0.7899
    • For class "against":
      • F1-score: 0.8153
      • Precision: 0.8205
      • Recall: 0.8101
    • For class "neutral-or-unclear":
      • F1-score: 0.5648
      • Precision: 0.4880
      • Recall: 0.6703

These metrics are based on a test set with a total size of 506 samples.

Note: Because performance on the "neutral-or-unclear" class is substantially worse than on the other two classes, we recommend that users exercise caution when interpreting predictions of this class. If you are only interested in either the "in-favor" or the "against" class, you can frame this as a binary classification problem by combining the "neutral-or-unclear" class with either the "in-favor" or the "against" class.
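One possible reading of the binary framing is a one-vs-rest collapse: pick the stance you care about and merge "neutral-or-unclear" with the remaining stance. The sketch below illustrates that under this assumption; the `to_binary` helper and the `not-<target>` naming are hypothetical and not part of the model.

```python
LABELS = ("neutral-or-unclear", "in-favor", "against")


def to_binary(label: str, target: str = "against") -> str:
    """Collapse a three-way stance label into `target` vs. 'not-<target>'.

    With target="against", both "in-favor" and "neutral-or-unclear"
    become "not-against"; with target="in-favor", the mirror case applies.
    """
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if target not in ("in-favor", "against"):
        raise ValueError("target must be 'in-favor' or 'against'")
    return target if label == target else f"not-{target}"


# Example: collapse a batch of predicted labels around the "against" stance.
predictions = ["against", "neutral-or-unclear", "in-favor"]
binary = [to_binary(p, target="against") for p in predictions]
```

The collapse is applied to predicted labels after inference, so the model itself is left unchanged.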

Confusion Matrix

The confusion matrix of predictions on the test set is as follows:

                           Predicted: neutral-or-unclear   Predicted: in-favor   Predicted: against
True: neutral-or-unclear   61                              16                    14
True: in-favor             40                              203                   14
True: against              24                              6                     128
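The reported metrics can be recomputed from the confusion matrix above. The sketch below uses plain Python with illustrative function names; rows are true classes and columns are predicted classes, in the order neutral-or-unclear, in-favor, against. Run on these counts, it should agree with the reported figures to four decimal places (for example, accuracy 0.7747 and three-class macro F1 0.7408).

```python
from typing import Dict, List, Sequence

# Rows: true class; columns: predicted class.
# Order: neutral-or-unclear, in-favor, against (counts from the confusion matrix above).
CONFUSION = [
    [61, 16, 14],
    [40, 203, 14],
    [24, 6, 128],
]


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Compute precision, recall, and F1 for each class of a square confusion matrix."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))  # column sum: everything predicted as k
        actual = sum(cm[k])                          # row sum: everything truly k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def macro_average(metrics: List[Dict[str, float]], key: str, class_ids: Sequence[int]) -> float:
    """Unweighted mean of one metric over the selected classes."""
    return sum(metrics[k][key] for k in class_ids) / len(class_ids)


metrics = per_class_metrics(CONFUSION)
acc = accuracy(CONFUSION)
macro_f1_all = macro_average(metrics, "f1", [0, 1, 2])    # all three classes
macro_f1_stance = macro_average(metrics, "f1", [1, 2])    # "in-favor" and "against" only
```

Passing `[1, 2]` as `class_ids` corresponds to the macro-average restricted to the "in-favor" and "against" classes.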

Model Architecture

The model is fine-tuned from COVID-Twitter-BERT v2.

Contact

Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)
