Model: covid-19-vaccination-tweet-stance

Overview

This model is a text classifier trained to determine the stance of a tweet toward COVID-19 vaccination. It classifies tweets into three categories: in-favor, against, and neutral-or-unclear. Note that this classifier is intended only for tweets that are related to COVID-19 vaccination. To determine whether a tweet is related to COVID-19 vaccination in the first place, please refer to covid-19-vaccination-tweet-relevance.

Usage

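```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-stance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-stance")
```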

Training corpus

The training corpus consists of 5,000 tweets related to COVID-19 vaccination, randomly sampled on a daily basis from December 2020 to June 2022 and labeled by domain experts.

We have separately trained another model to classify whether a tweet is related to COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-relevance for more information.
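Because the stance model is meant only for vaccination-related tweets, one way to combine the two models is a two-stage filter: run the relevance check first and request a stance only for tweets that pass. The sketch below is a minimal illustration under stated assumptions: `is_relevant` and `classify_stance` are hypothetical callables you would write around the relevance model and this stance model, and the toy stand-ins exist only to make the example runnable.

```python
from typing import Callable, Optional


def stance_if_relevant(
    tweet: str,
    is_relevant: Callable[[str], bool],
    classify_stance: Callable[[str], str],
) -> Optional[str]:
    """Two-stage filter: run the stance step only on relevant tweets.

    `is_relevant` and `classify_stance` are hypothetical wrappers around the
    relevance model and this stance model; they are not part of either model.
    """
    if not is_relevant(tweet):
        # Stance predictions are not meaningful for unrelated tweets.
        return None
    return classify_stance(tweet)


# Toy stand-ins for the two models, for illustration only.
def keyword_relevance(tweet: str) -> bool:
    return "vaccine" in tweet.lower()


def constant_stance(tweet: str) -> str:
    return "in-favor"


print(stance_if_relevant("Got my vaccine today!", keyword_relevance, constant_stance))  # in-favor
print(stance_if_relevant("Lovely weather.", keyword_relevance, constant_stance))  # None
```

Returning `None` rather than a stance label keeps "not about vaccination" distinct from "neutral-or-unclear", which are different judgments.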

Output Label Index

LABEL_0: "neutral-or-unclear"
LABEL_1: "in-favor"
LABEL_2: "against"
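The model emits one score (logit) per label index, so predictions must be mapped back to stance names using the table above. The sketch below assumes the checkpoint is loaded with `AutoModelForSequenceClassification` and that you pass in a single row of logits, for example from `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`. The helper names `softmax` and `decode` are illustrative, not part of the model or library.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Index-to-name mapping from the "Output Label Index" list above.
ID2LABEL: Dict[int, str] = {
    0: "neutral-or-unclear",
    1: "in-favor",
    2: "against",
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> Tuple[str, float]:
    """Return the highest-probability stance name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


# Example with made-up logits (not real model output).
label, prob = decode([0.1, 2.3, -1.0])
print(label)  # in-favor
```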

Performance Metrics

The model's performance metrics on the test set are as follows:

Accuracy: 0.7747
Macro-average metrics (across "in-favor" and "against" classes):
- F1-score: 0.8288
- Recall: 0.8000
- Precision: 0.8614
Macro-average metrics (across all 3 classes):
- F1-score: 0.7408
- Recall: 0.7568
- Precision: 0.7369
Class-wise metrics:
- For class "in-favor":
  - F1-score: 0.8423
  - Precision: 0.9022
  - Recall: 0.7899
- For class "against":
  - F1-score: 0.8153
  - Precision: 0.8205
  - Recall: 0.8101
- For class "neutral-or-unclear":
  - F1-score: 0.5648
  - Precision: 0.4880
  - Recall: 0.6703

These metrics are computed on a test set of 506 tweets.

Note: Because performance on the "neutral-or-unclear" class is substantially worse than on the other two classes (F1-score of 0.5648, versus 0.8423 for "in-favor" and 0.8153 for "against"), we recommend that users exercise caution when interpreting predictions of this class. If you are interested only in the "in-favor" or the "against" class, you can frame the task as binary classification by merging "neutral-or-unclear" with the other stance class (for example, "against" versus everything else).
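The binary framing described in the note can be applied after prediction by relabeling. The sketch below is one possible implementation under stated assumptions: `to_binary` is a hypothetical helper, and the negative label name `not-<target>` (e.g. `not-against`) is an illustrative choice, not something the model outputs.

```python
from typing import Iterable, List

# Label names follow the "Output Label Index" section.
STANCES = {"neutral-or-unclear", "in-favor", "against"}


def to_binary(labels: Iterable[str], target: str) -> List[str]:
    """Collapse 3-way stance predictions into `target` vs. `not-<target>`.

    `target` must be "in-favor" or "against"; "neutral-or-unclear" and the
    other stance class are both mapped to the negative label.
    """
    if target not in ("in-favor", "against"):
        raise ValueError(f"target must be 'in-favor' or 'against', got {target!r}")
    negative = f"not-{target}"
    out: List[str] = []
    for label in labels:
        if label not in STANCES:
            raise ValueError(f"unknown label: {label!r}")
        out.append(target if label == target else negative)
    return out


print(to_binary(["in-favor", "neutral-or-unclear", "against"], target="against"))
# ['not-against', 'not-against', 'against']
```

Relabeling does not change the target class's own precision and recall (the predictions of that class and their true labels are untouched); it simply removes the need to interpret the weaker "neutral-or-unclear" label.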

Confusion Matrix

The confusion matrix of predictions on the test set is as follows:

| True (rows) / Predicted (columns) | neutral-or-unclear | in-favor | against |
|---|---|---|---|
| neutral-or-unclear | 61 | 16 | 14 |
| in-favor | 40 | 203 | 14 |
| against | 24 | 6 | 128 |
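All of the metrics in the Performance Metrics section can be derived from this confusion matrix: precision divides the diagonal entry by its column sum, recall divides it by its row sum, and the macro averages are unweighted means over the listed classes. The sketch below recomputes them in plain Python as a consistency check; the function names are illustrative, and the matrix values are copied from the table above.

```python
from typing import Dict, List, Sequence

LABELS = ["neutral-or-unclear", "in-favor", "against"]

# Rows are true classes, columns are predicted classes (both in LABELS order).
CONFUSION = [
    [61, 16, 14],
    [40, 203, 14],
    [24, 6, 128],
]


def per_class_metrics(
    cm: Sequence[Sequence[int]], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class precision, recall, and F1 from a square confusion matrix."""
    n = len(cm)
    out: Dict[str, Dict[str, float]] = {}
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        out[labels[i]] = {"precision": precision, "recall": recall, "f1": f1}
    return out


def accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


def macro(metrics: Dict[str, Dict[str, float]], classes: List[str], key: str) -> float:
    """Unweighted mean of one metric over the given classes."""
    return sum(metrics[c][key] for c in classes) / len(classes)


m = per_class_metrics(CONFUSION, LABELS)
print(f"accuracy: {accuracy(CONFUSION):.4f}")  # accuracy: 0.7747
print(f"macro F1 (in-favor, against): {macro(m, ['in-favor', 'against'], 'f1'):.4f}")  # macro F1 (in-favor, against): 0.8288
print(f"macro F1 (all 3): {macro(m, LABELS, 'f1'):.4f}")  # macro F1 (all 3): 0.7408
```

Computing from raw counts, rather than averaging already-rounded per-class values, explains small last-digit differences such as 0.8614 versus 0.86135 for the two-class macro precision.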

Model Architecture

The model is fine-tuned from COVID-Twitter-BERT v2.

Contact

Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)