---
datasets:
- cardiffnlp/tweet_topic_multi
metrics:
- f1
- accuracy
model-index:
- name: cardiffnlp/roberta-large-tweet-topic-multi-2020
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: cardiffnlp/tweet_topic_multi
      type: cardiffnlp/tweet_topic_multi
      args: cardiffnlp/tweet_topic_multi
      split: test_2021
    metrics:
    - name: F1
      type: f1
      value: 0.7323655694132079
    - name: F1 (macro)
      type: f1_macro
      value: 0.5794562917377284
    - name: Accuracy
      type: accuracy
      value: 0.4937462775461584
pipeline_tag: text-classification
widget:
- text: "I'm sure the {@Tampa Bay Lightning@} would’ve rather faced the Flyers but man does their experience versus the Blue Jackets this year and last help them a lot versus this Islanders team. Another meat grinder upcoming for the good guys"
  example_title: "Example 1"
- text: "Love to take night time bike rides at the jersey shore. Seaside Heights boardwalk. Beautiful weather. Wishing everyone a safe Labor Day weekend in the US."
  example_title: "Example 2"
---
# cardiffnlp/roberta-large-tweet-topic-multi-2020
This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on the [tweet_topic_multi](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi) dataset. It is fine-tuned on the `train_2020` split and validated on the `test_2021` split of tweet_topic.
The fine-tuning script can be found [here](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi/blob/main/lm_finetuning.py). It achieves the following results on the `test_2021` set:
- F1 (micro): 0.7323655694132079
- F1 (macro): 0.5794562917377284
- Accuracy: 0.4937462775461584
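The card does not spell out how these scores are computed. The sketch below assumes the usual multi-label definitions: each tweet has a binary vector with one slot per topic, micro F1 pools true/false positives and false negatives over all topics, macro F1 averages per-topic F1, and "Accuracy" is read as exact-match (subset) accuracy. That last reading is an assumption, though it would be consistent with accuracy sitting well below micro F1. Per-topic F1 is taken as 0.0 when it is undefined. The toy labels are made up for illustration.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # rows = tweets, columns = topics (0/1)

def _counts(y_true: Matrix, y_pred: Matrix, k: int) -> Tuple[int, int, int]:
    # True positives, false positives and false negatives for topic column k
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
    return tp, fp, fn

def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); assumed 0.0 when the denominator is zero
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    # Pool the counts over all topics, then compute a single F1
    tp = fp = fn = 0
    for k in range(len(y_true[0])):
        a, b, c = _counts(y_true, y_pred, k)
        tp, fp, fn = tp + a, fp + b, fn + c
    return _f1(tp, fp, fn)

def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    # Unweighted mean of per-topic F1, so rare topics weigh as much as frequent ones
    n_topics = len(y_true[0])
    return sum(_f1(*_counts(y_true, y_pred, k)) for k in range(n_topics)) / n_topics

def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    # A tweet counts as correct only if its whole predicted topic set matches exactly
    return sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p)) / len(y_true)

# Made-up labels for three tweets and three topics
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(micro_f1(y_true, y_pred), macro_f1(y_true, y_pred), subset_accuracy(y_true, y_pred))
```

On this toy data, subset accuracy (1/3) is well below micro F1 (0.75) because missing a single topic makes the whole tweet count as wrong, while micro F1 still credits the topics that were predicted correctly.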
### Usage
```python
import math
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def sigmoid(x):
    # Logistic function: maps a raw logit to a probability in (0, 1)
    return 1 / (1 + math.exp(-x))

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020")
model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/roberta-large-tweet-topic-multi-2020", problem_type="multi_label_classification")
model.eval()
class_mapping = model.config.id2label

with torch.no_grad():
    text = "#NewVideo Cray Dollas- Water- Ft. Charlie Rose- (Official Music Video)- {{URL}} via {@YouTube@} #watchandlearn {{USERNAME}}"
    tokens = tokenizer(text, return_tensors='pt')
    output = model(**tokens)
    # Multi-label: score every topic independently and keep those with probability > 0.5
    flags = [sigmoid(s) > 0.5 for s in output[0][0].detach().tolist()]
    topic = [class_mapping[n] for n, i in enumerate(flags) if i]
    print(topic)
```
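The decoding step in the snippet above (sigmoid per logit, then a 0.5 threshold) can be isolated as a pure function, which makes it easy to test or to try other thresholds. This is a minimal sketch: `decode_topics` is a hypothetical helper, and the label map and logits below are made up for illustration rather than taken from the model's real `id2label`. It also swaps in a numerically stable sigmoid that avoids `math.exp` overflow for very negative logits.

```python
import math
from typing import Dict, List, Sequence

def sigmoid(x: float) -> float:
    # Numerically stable logistic function: never calls math.exp on a large positive value
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def decode_topics(logits: Sequence[float], id2label: Dict[int, str], threshold: float = 0.5) -> List[str]:
    # Hypothetical helper: each logit is scored independently (multi-label),
    # so zero, one or several topics may be returned
    return [id2label[i] for i, s in enumerate(logits) if sigmoid(s) > threshold]

# Hypothetical label map and logits, for illustration only
id2label = {0: "topic_a", 1: "topic_b", 2: "topic_c"}
print(decode_topics([2.0, -1.5, 0.3], id2label))
```

With a threshold of 0.5, `sigmoid(s) > 0.5` holds exactly when the logit `s` is positive, so the threshold is equivalent to keeping positive logits. Raising it trades recall for precision.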
### Reference
```
@inproceedings{dimosthenis-etal-2022-twitter,
title = "{T}witter {T}opic {C}lassification",
author = "Antypas, Dimosthenis and
Ushio, Asahi and
Camacho-Collados, Jose and
Neves, Leonardo and
Silva, Vitor and
Barbieri, Francesco",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics"
}
```