Utilising Weak Supervision to Create S3D: A Sarcasm Annotated Dataset
This is the repository for the S3D dataset published at EMNLP 2022. The dataset can help build sarcasm detection models.
bertweet-base-finetuned-SARC-combined-DS
This model is a fine-tuned version of vinai/bertweet-base on our combined sarcasm dataset. On the evaluation set it achieves the following results, which match the last logged checkpoint (epoch 28.0) in the training results table:
- Loss: 1.4624
- Accuracy: 0.7611
- Precision: 0.7611
- Recall: 0.7611
- F1: 0.7611
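For readers who want to try the checkpoint, the following is a minimal sketch of loading it through the Transformers `pipeline` API. It assumes the `transformers` package is installed and that the repository id shown on the model page is reachable. The helper names `build_classifier` and `best_prediction` are illustrative and are not part of this repository.

```python
from typing import List

# Repository id as listed on the model page.
MODEL_ID = "surrey-nlp/bertweet-base-finetuned-SARC-combined-DS"


def build_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline (downloads weights on first call)."""
    # Imported lazily so the helper below also works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def best_prediction(scores: List[dict]) -> dict:
    """Return the highest-scoring {'label': ..., 'score': ...} entry for one input."""
    return max(scores, key=lambda entry: entry["score"])
```

Calling `build_classifier()` and passing it a tweet should return a list of label/score dictionaries. The label names depend on the checkpoint's configuration and are not documented here, so `best_prediction` treats them as opaque strings.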
Model description
VinAI describes BERTweet as follows:
BERTweet is the first public large-scale language model pre-trained for English Tweets. BERTweet is trained based on the RoBERTa pre-training procedure. The corpus used to pre-train BERTweet consists of 850M English Tweets (16B word tokens ~ 80GB), containing 845M Tweets streamed from 01/2012 to 08/2019 and 5M Tweets related to the COVID-19 pandemic.
Training and evaluation data
This vinai/bertweet-base model was fine-tuned on our combined sarcasm dataset, which was created to aid the building of sarcasm detection models.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 64
- eval_batch_size: 16
- seed: 43
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 30
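The hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`. This is an illustration under stated assumptions, not the authors' training script. The `output_dir` value is hypothetical, the listed batch sizes are assumed to be per device, the Adam settings are assumed to map onto the `adam_*` arguments, and the warmup length is assumed to be zero because none is listed. `linear_lr` is an illustrative helper showing how a linear schedule scales the learning rate. Its `total_steps` argument must be supplied by the caller.

```python
# Keyword arguments for transformers.TrainingArguments mirroring the list above.
TRAINING_KWARGS = {
    "output_dir": "bertweet-sarcasm-run",  # hypothetical path, not from the source
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 64,  # assumes the listed size is per device
    "per_device_eval_batch_size": 16,  # same assumption
    "seed": 43,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 30,
}


def make_training_args(**overrides):
    """Build TrainingArguments from the settings above, with optional overrides."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})


def linear_lr(step: int, total_steps: int, base_lr: float = 1e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule: optional warmup, then decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With zero warmup, the rate starts at 1e-05 and falls in a straight line to zero at the final step, so halfway through training it is about half the initial value.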
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|---|
0.4319 | 4.0 | 44819 | 0.5049 | 0.7790 | 0.7796 | 0.7789 | 0.7789 |
0.2835 | 8.0 | 89638 | 0.6475 | 0.7663 | 0.7664 | 0.7663 | 0.7663 |
0.1797 | 12.0 | 134457 | 0.8746 | 0.7638 | 0.7639 | 0.7637 | 0.7637 |
0.1219 | 16.0 | 179276 | 1.0595 | 0.7585 | 0.7597 | 0.7587 | 0.7583 |
0.0905 | 20.0 | 224095 | 1.2115 | 0.7611 | 0.7612 | 0.7612 | 0.7611 |
0.0728 | 24.0 | 268914 | 1.3644 | 0.7628 | 0.7629 | 0.7627 | 0.7627 |
0.0612 | 28.0 | 313733 | 1.4624 | 0.7611 | 0.7611 | 0.7611 | 0.7611 |
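In the table above, training loss keeps falling while validation loss rises after epoch 4.0, a pattern usually associated with overfitting. The short sketch below copies the logged rows and picks the checkpoint with the lowest validation loss and the one with the highest F1. The `Row` type and variable names are illustrative only.

```python
from collections import namedtuple

Row = namedtuple(
    "Row", "train_loss epoch step val_loss accuracy precision recall f1"
)

# Values copied from the training results table.
ROWS = [
    Row(0.4319, 4.0, 44819, 0.5049, 0.7790, 0.7796, 0.7789, 0.7789),
    Row(0.2835, 8.0, 89638, 0.6475, 0.7663, 0.7664, 0.7663, 0.7663),
    Row(0.1797, 12.0, 134457, 0.8746, 0.7638, 0.7639, 0.7637, 0.7637),
    Row(0.1219, 16.0, 179276, 1.0595, 0.7585, 0.7597, 0.7587, 0.7583),
    Row(0.0905, 20.0, 224095, 1.2115, 0.7611, 0.7612, 0.7612, 0.7611),
    Row(0.0728, 24.0, 268914, 1.3644, 0.7628, 0.7629, 0.7627, 0.7627),
    Row(0.0612, 28.0, 313733, 1.4624, 0.7611, 0.7611, 0.7611, 0.7611),
]

# Checkpoint selection by two common criteria.
best_by_val_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_f1 = max(ROWS, key=lambda r: r.f1)

# Gap between validation and training loss at each logged epoch.
loss_gaps = [round(r.val_loss - r.train_loss, 4) for r in ROWS]

print("lowest validation loss at epoch", best_by_val_loss.epoch)
print("highest F1 at epoch", best_by_f1.epoch)
print("validation minus training loss:", loss_gaps)
```

Both criteria select the epoch 4.0 row, and the loss gap widens at every later logged epoch.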
Framework versions
- Transformers 4.20.1
- Pytorch 1.10.1+cu111
- Datasets 2.3.2
- Tokenizers 0.12.1
Model tree for surrey-nlp/bertweet-base-finetuned-SARC-combined-DS
Base model
vinai/bertweet-base