---
language: en
widget:
- text: "It is great to see athletes promoting awareness for climate change."
---

# tweet-topic-21-multi

This model is based on a [TimeLMs](https://github.com/cardiffnlp/timelms) language model trained on ~124M tweets from January 2018 to December 2021 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m)), and fine-tuned for multi-label topic classification on a corpus of 11,267 [tweets](https://huggingface.co/datasets/cardiffnlp/tweet_topic_multi). The model supports English text only.

 - Reference Paper: [TweetTopic](https://arxiv.org/abs/2209.09824) (COLING 2022). 

<b>Labels</b>: 


| IDs 0-4                     | IDs 5-9             | IDs 10-14                  | IDs 15-18                |
|-----------------------------|---------------------|----------------------------|--------------------------|
| 0: arts_&_culture           | 5: fashion_&_style  | 10: learning_&_educational | 15: science_&_technology |
| 1: business_&_entrepreneurs | 6: film_tv_&_video  | 11: music                  | 16: sports               |
| 2: celebrity_&_pop_culture  | 7: fitness_&_health | 12: news_&_social_concern  | 17: travel_&_adventure   |
| 3: diaries_&_daily_life     | 8: food_&_dining    | 13: other_hobbies          | 18: youth_&_student_life |
| 4: family                   | 9: gaming           | 14: relationships          |                          |
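The table maps each class index to its label name. As a minimal sketch, a multi-hot prediction vector (one 0/1 entry per class, in index order) can be decoded with that mapping as shown here. The `LABELS` list and the `decode` helper are hypothetical illustrations, not part of the model or of the `transformers` API; in practice the mapping should be read from `model.config.id2label`, as the classification example does.

```python
# Hypothetical helper: label names copied from the table above, indexed by class ID.
LABELS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure", "youth_&_student_life",
]


def decode(predictions):
    """Return the label names whose entry in a multi-hot 0/1 vector is set."""
    if len(predictions) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} entries, got {len(predictions)}")
    return [LABELS[i] for i, flag in enumerate(predictions) if flag]


# Multi-hot vector with classes 12 and 16 switched on.
example = [0] * len(LABELS)
example[12] = 1
example[16] = 1
print(decode(example))  # ['news_&_social_concern', 'sports']
```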


## Full classification example

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from scipy.special import expit


MODEL = "cardiffnlp/tweet-topic-21-multi"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label  # class index -> label name

text = "It is great to see athletes promoting awareness for climate change."
tokens = tokenizer(text, return_tensors='pt')
output = model(**tokens)

# Logits for the single input text; a sigmoid scores each label independently
scores = output[0][0].detach().numpy()
scores = expit(scores)
# Keep every label whose probability is at least 0.5 (multi-label)
predictions = (scores >= 0.5) * 1


# TF (requires TensorFlow)
#from transformers import TFAutoModelForSequenceClassification
#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = tf_model.config.id2label
#text = "It is great to see athletes promoting awareness for climate change."
#tokens = tokenizer(text, return_tensors='tf')
#output = tf_model(**tokens)
#scores = output[0][0].numpy()
#scores = expit(scores)
#predictions = (scores >= 0.5) * 1

# Map to classes
for i in range(len(predictions)):
    if predictions[i]:
        print(class_mapping[i])
```
Output: 

```
news_&_social_concern
sports
```
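Because this is a multi-label model, each logit goes through its own sigmoid and is compared with the 0.5 threshold independently, rather than through a softmax across classes, so a tweet can receive zero, one, or several topics. The sketch below restates that post-processing for a batch of logits using only NumPy. The function names `sigmoid` and `logits_to_labels`, the `threshold` parameter, and the toy logits and labels are illustrative assumptions, not model outputs. With the PyTorch model, the input array would presumably come from `output.logits.detach().numpy()` after tokenizing a list of texts with `padding=True`, and `id2label` would be `model.config.id2label`.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, equivalent to scipy.special.expit."""
    return 1.0 / (1.0 + np.exp(-x))


def logits_to_labels(logits, id2label, threshold=0.5):
    """Map a (batch, num_labels) array of logits to per-example label lists.

    Each label is scored independently (multi-label), so an example may get
    zero, one, or several labels.
    """
    probs = sigmoid(np.asarray(logits, dtype=np.float64))
    results = []
    for row in probs:
        # Indices of every label whose probability reaches the threshold
        hits = np.flatnonzero(row >= threshold)
        results.append([id2label[int(i)] for i in hits])
    return results


# Toy input: 3 labels and 2 examples with made-up logits (not model outputs).
toy_id2label = {0: "music", 1: "sports", 2: "travel_&_adventure"}
toy_logits = [[-2.0, 1.5, 0.3],
              [-1.0, -0.5, -3.0]]
print(logits_to_labels(toy_logits, toy_id2label))
# [['sports', 'travel_&_adventure'], []]
```

Raising `threshold` trades recall for precision; 0.5 matches the value used in the example above.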