Model: covid-19-vaccination-tweet-relevance

Overview

This model is a text classifier that determines whether a tweet is related to COVID-19 vaccination.

Usage

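from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Load the tokenizer and the fine-tuned classifier, including its classification head.
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")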

Training corpus

The training corpus comprises 9373 tweets, drawn as a daily random sample of tweets dated from December 2020 to June 2022. The tweets were labeled by domain experts.

We have separately trained another model that classifies the stance of a tweet towards COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-stance for more information.

Output Label Index

  • LABEL_0: "irrelevance"
  • LABEL_1: "relevance"
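
The label indices above can be turned into readable predictions with a small helper. The following is a minimal sketch, not the author's own inference code: it assumes the checkpoint loads with `AutoModelForSequenceClassification` and returns two logits per tweet in the order listed above. The names `ID2LABEL`, `softmax`, `decode`, and `classify` are hypothetical and introduced only for illustration.

```python
import math

MODEL_ID = "seantw/covid-19-vaccination-tweet-relevance"

# Mapping taken from the "Output Label Index" list (LABEL_0, LABEL_1).
ID2LABEL = {0: "irrelevance", 1: "relevance"}


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def classify(texts):
    """Classify a list of tweets; downloads the model on first use."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Pad to the longest tweet in the batch and truncate overlong inputs.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [decode(row) for row in logits.tolist()]
```

Calling `classify(["Got my second dose today!"])` (an invented example tweet) would return a list with one `(label, probability)` pair per input.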

Performance Metrics

The model's performance metrics on the test set are as follows:

  • Accuracy: 0.9386
  • Macro-average metrics:
    • F1-score: 0.9339
    • Recall: 0.9277
    • Precision: 0.9418
  • Class-wise metrics:
    • For class "relevance":
      • F1-score: 0.9161
      • Precision: 0.9523
      • Recall: 0.8825
    • For class "irrelevance":
      • F1-score: 0.9516
      • Precision: 0.9312
      • Recall: 0.9730

These metrics are based on a test set with a total size of 3699 samples.
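
For readers unfamiliar with macro-averaging: each macro-average figure is the unweighted mean of the two class-wise values, regardless of class size. The short sketch below recomputes the macro-averages from the class-wise numbers listed above; `CLASS_METRICS` and `macro_average` are illustrative names. The results agree with the reported values to within rounding of the four-decimal inputs.

```python
# Class-wise values copied from the list above: (precision, recall, F1).
CLASS_METRICS = {
    "relevance": (0.9523, 0.8825, 0.9161),
    "irrelevance": (0.9312, 0.9730, 0.9516),
}


def macro_average(class_metrics):
    """Unweighted mean of each metric across classes."""
    n = len(class_metrics)
    sums = [sum(values[i] for values in class_metrics.values()) for i in range(3)]
    return tuple(s / n for s in sums)


macro_precision, macro_recall, macro_f1 = macro_average(CLASS_METRICS)
print(f"precision={macro_precision:.5f} recall={macro_recall:.5f} f1={macro_f1:.5f}")
```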

Confusion Matrix

The confusion matrix of predictions on the test set is as follows:

                     Predicted: irrelevance   Predicted: relevance
True: irrelevance    1239                     165
True: relevance      62                       2233
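
Accuracy and the per-class metrics can be recomputed from these four counts. The sketch below does so, taking the row and column labels exactly as printed; `LABELS`, `MATRIX`, `accuracy`, and `per_class_metrics` are illustrative names. Read this way, the values 0.9523 / 0.8825 / 0.9161 (precision / recall / F1) come out for the row labeled "irrelevance", whereas the class-wise list reports them under "relevance". The sketch takes the matrix labels at face value and does not settle which labeling is intended.

```python
# Rows are true classes, columns are predicted classes, in the printed order.
LABELS = ["irrelevance", "relevance"]
MATRIX = [
    [1239, 165],
    [62, 2233],
]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_metrics(matrix, labels):
    """Precision, recall and F1 for each class of a square confusion matrix."""
    results = {}
    for i, name in enumerate(labels):
        true_positive = matrix[i][i]
        actual = sum(matrix[i])                    # row total: samples of this class
        predicted = sum(row[i] for row in matrix)  # column total: predictions of it
        precision = true_positive / predicted
        recall = true_positive / actual
        f1 = 2 * precision * recall / (precision + recall)
        results[name] = {"precision": precision, "recall": recall, "f1": f1}
    return results


metrics = per_class_metrics(MATRIX, LABELS)
print(f"accuracy={accuracy(MATRIX):.4f}")
for name, values in metrics.items():
    print(name, {key: round(value, 4) for key, value in values.items()})
```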

Model Architecture

The model is fine-tuned from COVID-Twitter-BERT v2.

Contact

Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)
