Model: covid-19-vaccination-tweet-relevance

Overview

This model is a text classifier trained to determine whether a tweet is related to COVID-19 vaccination or not.

Usage

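from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the tokenizer and the classifier (including its classification head).
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")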
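
The loading calls above only fetch the weights. A minimal sketch of running a prediction follows, assuming the hosted checkpoint includes a two-class classification head and that torch is installed. The names pick_label and predict_relevance are hypothetical helpers introduced for illustration, not part of the released model.

```python
MODEL_NAME = "seantw/covid-19-vaccination-tweet-relevance"
LABELS = ["irrelevance", "relevance"]  # index 0 -> LABEL_0, index 1 -> LABEL_1


def pick_label(logits_row):
    """Return the name of the class with the highest logit."""
    best = max(range(len(logits_row)), key=lambda i: logits_row[i])
    return LABELS[best]


def predict_relevance(texts, model_name=MODEL_NAME):
    """Classify a list of tweets; downloads the model on first use."""
    # Imported here so pick_label can be used without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, number of classes)
    return [pick_label(row) for row in logits.tolist()]
```

Calling predict_relevance(["Got my second dose today."]) would return a one-element list containing either "irrelevance" or "relevance".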

Training corpus

The training corpus comprises 9373 tweets, drawn as a daily random sample of tweets dated from December 2020 to June 2022. These tweets were labeled by domain experts.

We have separately trained another model for classifying the stance of a tweet towards COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-stance for more information.

Output Label Index

LABEL_0: "irrelevance"
LABEL_1: "relevance"
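
As an illustration of the index above, the sketch below renames the generic labels in prediction dictionaries of the form returned by the transformers text-classification pipeline. The helper name readable and the scores in raw are made up for the example.

```python
# Mapping taken from the label index in this card.
LABEL_NAMES = {"LABEL_0": "irrelevance", "LABEL_1": "relevance"}


def readable(prediction):
    """Replace the generic label in one prediction dict with its class name."""
    return {"label": LABEL_NAMES[prediction["label"]], "score": prediction["score"]}


# Hypothetical pipeline output; the scores are placeholders, not real results.
raw = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
]
named = [readable(p) for p in raw]
print(named)
```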

Performance Metrics

The model's performance metrics on the test set are as follows:

Accuracy: 0.9386
Macro-average metrics:
- F1-score: 0.9339
- Recall: 0.9277
- Precision: 0.9418
Class-wise metrics:
- For class "relevance":
  - F1-score: 0.9161
  - Precision: 0.9523
  - Recall: 0.8825
- For class "irrelevance":
  - F1-score: 0.9516
  - Precision: 0.9312
  - Recall: 0.9730

These metrics are based on a test set with a total size of 3699 samples.

Confusion Matrix

The confusion matrix of predictions on the test set is as follows:

	Predicted: irrelevance	Predicted: relevance
True: irrelevance	1239	165
True: relevance	62	2233
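
The reported metrics can be recomputed from this matrix. The sketch below does so in plain Python, indexing classes by row order (row 0 is the first row of the table, row 1 the second). The results agree with the accuracy and macro-averaged figures in this card. Note, however, that the per-class values for row 0 (labeled "irrelevance" here) match those listed under "relevance" in the class-wise metrics, and vice versa, so one of the two labelings may be swapped; this card alone does not settle which.

```python
# Rows are true classes, columns are predicted classes, in the table's order.
matrix = [
    [1239, 165],
    [62, 2233],
]


def per_class(matrix, k):
    """Return (precision, recall, f1) for the class at index k."""
    tp = matrix[k][k]
    actual = sum(matrix[k])                     # row sum: samples truly in class k
    predicted = sum(row[k] for row in matrix)   # column sum: samples predicted as k
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
scores = [per_class(matrix, k) for k in range(len(matrix))]
# Macro average: unweighted mean of each metric over the classes.
macro = [sum(s[i] for s in scores) / len(scores) for i in range(3)]

print("samples:", total, "accuracy:", round(accuracy, 4))
for k, (p, r, f) in enumerate(scores):
    print("row", k, "precision", round(p, 4), "recall", round(r, 4), "f1", round(f, 4))
print("macro precision/recall/f1:", [round(m, 4) for m in macro])
```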

Model Architecture

The model is fine-tuned from COVID-Twitter-BERT v2.

Contact

Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)