Model: covid-19-vaccination-tweet-relevance
Overview
This model is a text classifier trained to determine whether a tweet is related to COVID-19 vaccination or not.
Usage
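from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load with the sequence-classification head so the model returns one logit per label.
tokenizer = AutoTokenizer.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")
model = AutoModelForSequenceClassification.from_pretrained("seantw/covid-19-vaccination-tweet-relevance")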
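To go from a loaded model to a label, something along the following lines should work. It is a minimal sketch, not the author's reference code: it assumes PyTorch is installed, that the checkpoint ships a two-way sequence-classification head, and that output index 0 and 1 correspond to LABEL_0 ("irrelevance") and LABEL_1 ("relevance") as listed under Output Label Index. The helper names `logits_to_prediction` and `classify` are hypothetical.

```python
import math

# Index-to-name mapping taken from the "Output Label Index" section.
ID2LABEL = {0: "irrelevance", 1: "relevance"}


def logits_to_prediction(logits):
    """Apply a softmax to a list of logits and return (label, probability)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


def classify(texts, model_name="seantw/covid-19-vaccination-tweet-relevance"):
    """Classify a list of tweets; downloads the model on first use."""
    # Imported lazily so the pure-Python helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return [logits_to_prediction(row.tolist()) for row in logits]
```

Calling `classify(["Got my second dose today!"])` would return a list with one `(label, probability)` pair per input tweet.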
Training corpus
The training corpus comprises 9373 tweets, drawn as a daily random sample of tweets dated from December 2020 to June 2022. These tweets were labeled by domain experts.
We have separately trained another model that classifies the stance of a tweet towards COVID-19 vaccination. Please refer to covid-19-vaccination-tweet-stance for more information.
Output Label Index
- LABEL_0: "irrelevance"
- LABEL_1: "relevance"
Performance Metrics
The model's performance metrics on the test set are as follows:
- Accuracy: 0.9386
- Macro-average metrics:
  - F1-score: 0.9339
  - Recall: 0.9277
  - Precision: 0.9418
- Class-wise metrics:
  - For class "relevance":
    - F1-score: 0.9161
    - Precision: 0.9523
    - Recall: 0.8825
  - For class "irrelevance":
    - F1-score: 0.9516
    - Precision: 0.9312
    - Recall: 0.973
These metrics are based on a test set with a total size of 3699 samples.
Confusion Matrix
The confusion matrix of predictions on the test set is as follows:
| | Predicted: irrelevance | Predicted: relevance |
|---|---|---|
| True: irrelevance | 1239 | 165 |
| True: relevance | 62 | 2233 |
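The reported metrics can be recomputed from the four counts in the confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, in the order the table gives them; the function name `metrics_from_confusion` is hypothetical. Classes are indexed by row position rather than by name, because the first row reproduces the values listed under class "relevance" in Performance Metrics (for example recall 0.8825) and the second row those listed under "irrelevance", so the table's row labels and the class-wise labels may not line up.

```python
# Rows: true class, columns: predicted class, in the order given in the table.
CONFUSION = [
    [1239, 165],
    [62, 2233],
]


def metrics_from_confusion(cm):
    """Return (accuracy, per-class metrics, macro-averaged metrics)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    per_class = []
    for i in range(n):
        tp = cm[i][i]
        actual = sum(cm[i])                       # all samples whose true class is i
        predicted = sum(cm[r][i] for r in range(n))  # all samples predicted as i
        precision = tp / predicted
        recall = tp / actual
        f1 = 2 * precision * recall / (precision + recall)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    macro = {
        key: sum(c[key] for c in per_class) / n
        for key in ("precision", "recall", "f1")
    }
    return accuracy, per_class, macro


accuracy, per_class, macro = metrics_from_confusion(CONFUSION)
print(f"accuracy={accuracy:.4f}")
for i, c in enumerate(per_class):
    print(f"row {i}: " + ", ".join(f"{k}={v:.4f}" for k, v in c.items()))
print("macro: " + ", ".join(f"{k}={v:.4f}" for k, v in macro.items()))
```

The four counts sum to 3699, matching the stated test set size, and the accuracy and macro averages computed this way agree with the figures listed under Performance Metrics to four decimal places.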
Model Architecture
The model is fine-tuned from COVID-Twitter-BERT v2.
Contact
Sean Yun-Shiuan Chuang (yunshiuan.chuang@wisc.edu)