cardiffnlp/tweet-topic-19-single

Text Classification

antypasd committed on Jun 9, 2022

Commit 7d64ac4 • 1 Parent(s): 4d41513

Create README.md

Files changed (1)

README.md +54 -0

README.md ADDED

	@@ -0,0 +1,54 @@

+# tweet-topic-19-single
+This is a RoBERTa-base model trained on ~90M tweets until the end of 2019 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m)) and fine-tuned for single-label topic classification on a corpus of 6,997 tweets.
+The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2019-90m), and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.
+- Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829).
+- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).
+<b>Labels</b>:
+- 0 -> arts_&_culture
+- 1 -> business_&_entrepreneurs
+- 2 -> pop_culture
+- 3 -> daily_life
+- 4 -> sports_&_gaming
+- 5 -> science_&_technology
+## Full classification example
+```python
+from transformers import AutoModelForSequenceClassification
+from transformers import AutoTokenizer
+import numpy as np
+from scipy.special import softmax
+MODEL = "cardiffnlp/tweet-topic-19-single"
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+# Load the PyTorch model
+model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+class_mapping = model.config.id2label
+text = "Tesla stock is on the rise!"
+encoded_input = tokenizer(text, return_tensors='pt')
+output = model(**encoded_input)
+scores = output[0][0].detach().numpy()
+scores = softmax(scores)
+ranking = np.argsort(scores)
+ranking = ranking[::-1]
+for i in range(scores.shape[0]):
+    l = class_mapping[ranking[i]]
+    s = scores[ranking[i]]
+    print(f"{i+1}) {l} {np.round(float(s), 4)}")
+```
+Output:
+```
+1) business_&_entrepreneurs 0.8575
+2) science_&_technology 0.0604
+3) pop_culture 0.0295
+4) daily_life 0.0217
+5) sports_&_gaming 0.0154
+6) arts_&_culture 0.0154
+```
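
The post-processing step in the README example (softmax over the logits, then ranking labels by descending probability) can be tried in isolation without downloading the model. The following is a minimal sketch under stated assumptions: the `softmax` and `rank_labels` helpers are hypothetical names introduced here, the label mapping is copied from the label list in the README, and the logits are made-up values for illustration, not real model outputs.

```python
import numpy as np

# Label mapping copied from the README's label list.
ID2LABEL = {
    0: "arts_&_culture",
    1: "business_&_entrepreneurs",
    2: "pop_culture",
    3: "daily_life",
    4: "sports_&_gaming",
    5: "science_&_technology",
}


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow
    exp = np.exp(shifted)
    return exp / exp.sum()


def rank_labels(logits, id2label=ID2LABEL):
    """Return (label, probability) pairs sorted by descending probability."""
    probs = softmax(np.asarray(logits, dtype=float))
    order = np.argsort(probs)[::-1]  # indices from highest to lowest score
    return [(id2label[int(i)], float(probs[i])) for i in order]


# Made-up logits for illustration only; these are NOT real model outputs.
example_logits = [0.1, 3.2, 0.5, 0.3, -0.2, 1.0]
for rank, (label, prob) in enumerate(rank_labels(example_logits), start=1):
    print(f"{rank}) {label} {prob:.4f}")
```

In real use, the logits would come from `output[0][0].detach().numpy()` as in the README example, and `model.config.id2label` would replace the hard-coded mapping.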