# tweet-topic-19-single

This is a roBERTa-base model trained on ~90m tweets up until the end of 2019 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m)) and fine-tuned for single-label topic classification on a corpus of 6,997 tweets.
The original roBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.

- Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829).
- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).

<b>Labels</b>:

- 0 -> arts_&_culture
- 1 -> business_&_entrepreneurs
- 2 -> pop_culture
- 3 -> daily_life
- 4 -> sports_&_gaming
- 5 -> science_&_technology
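
The same mapping is stored in the model configuration, so it can also be read programmatically. The snippet below is a minimal sketch using the standard `AutoConfig` API from transformers; it downloads only the configuration, not the model weights.

```python
from transformers import AutoConfig

# Load only the configuration of the hosted checkpoint
config = AutoConfig.from_pretrained("antypasd/tweet-topic-19-single")

# id2label maps integer class ids to topic names (label2id is the inverse)
for idx, label in sorted(config.id2label.items()):
    print(idx, "->", label)
```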

## Full classification example

```python
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

MODEL = "antypasd/tweet-topic-19-single"
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT (PyTorch) model
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label

text = "Tesla stock is on the rise!"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

# Convert logits to probabilities and rank the labels from most to least likely
scores = output[0][0].detach().numpy()
scores = softmax(scores)
ranking = np.argsort(scores)
ranking = ranking[::-1]
for i in range(scores.shape[0]):
    l = class_mapping[ranking[i]]
    s = scores[ranking[i]]
    print(f"{i+1}) {l} {np.round(float(s), 4)}")
```

Output:

```
1) business_&_entrepreneurs 0.8575
2) science_&_technology 0.0604
3) pop_culture 0.0295
4) daily_life 0.0217
5) sports_&_gaming 0.0154
6) arts_&_culture 0.0154
```
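
For a quicker start, the same checkpoint can also be run through the transformers `pipeline` API. This is a minimal sketch rather than part of the original card; `top_k=None` (which returns scores for every class) assumes a reasonably recent transformers release.

```python
from transformers import pipeline

# Text-classification pipeline over the same model; top_k=None returns all class scores
classifier = pipeline("text-classification", model="antypasd/tweet-topic-19-single", top_k=None)

print(classifier("Tesla stock is on the rise!"))
```

Each returned entry is a dictionary with `label` and `score` keys, so the ranking shown above can be reproduced directly from the pipeline output.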