antypasd committed on
Commit
5e8f546
1 Parent(s): ef395b7

Update README.md

Files changed (1)
  1. README.md +40 -31
README.md CHANGED
@@ -1,46 +1,55 @@
- ---
- tags:
- - generated_from_keras_callback
- model-index:
- - name: tf version
- results: []
- ---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->

- # tf version

- This model is a fine-tuned version of [antypasd/tweet-topic-21-multi](https://huggingface.co/antypasd/tweet-topic-21-multi) on an unknown dataset.
- It achieves the following results on the evaluation set:


- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32

- ### Training results

-
-
- ### Framework versions
-
- - Transformers 4.19.2
- - TensorFlow 2.8.2
- - Tokenizers 0.12.1
+ # tweet-topic-21-multi

+ This is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 (see [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m)), and fine-tuned for multi-label topic classification on a corpus of 11,267 tweets.
+ The original RoBERTa-base model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English.

+ - Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829).
+ - Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).

+ <b>Labels</b>:


+ | <span style="font-weight:normal">0: arts_&_culture</span> | <span style="font-weight:normal">5: fashion_&_style</span> | <span style="font-weight:normal">10: learning_&_educational</span> | <span style="font-weight:normal">15: science_&_technology</span> |
+ |-----------------------------|---------------------|----------------------------|--------------------------|
+ | 1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
+ | 2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
+ | 3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
+ | 4: family | 9: gaming | 14: relationships | |


+ ## Full classification example

+ ```python
+ from transformers import AutoModelForSequenceClassification
+ from transformers import AutoTokenizer
+ import numpy as np
+ from scipy.special import expit

+
+ MODEL = "antypasd/tweet-topic-21-multi"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL)

+ # Load the PyTorch model
+ model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+ class_mapping = model.config.id2label

+ text = "It is great to see athletes promoting awareness for climate change."
+ tokens = tokenizer(text, return_tensors='pt')
+ output = model(**tokens)

+ scores = output[0][0].detach().numpy()
+ scores = expit(scores)
+ predictions = (scores >= 0.5) * 1

+ # Map to classes
+ for i in range(len(predictions)):
+     if predictions[i]:
+         print(class_mapping[i])

+ ```
+ Output:

+ ```
+ news_&_social_concern
+ sports
+ ```
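
The decoding step in the example (sigmoid over the logits, then a 0.5 threshold per label) can be tried without downloading the model. The block below is a minimal sketch, not part of the commit: the `LABELS` list is copied from the Labels table, while `sigmoid`, `decode_multilabel`, and the `example_logits` values are hypothetical and invented purely for illustration.

```python
import math

# Label names in id order, copied from the Labels table in the README.
LABELS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure",
    "youth_&_student_life",
]


def sigmoid(x):
    """Logistic function; plays the role of scipy.special.expit for one value."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose sigmoid score reaches the threshold.

    Each label is scored independently, so zero, one, or several labels
    can be returned for a single input.
    """
    return [
        LABELS[i]
        for i, logit in enumerate(logits)
        if sigmoid(logit) >= threshold
    ]


# Hypothetical logits (assumed values, not real model output).
example_logits = [-3.0] * len(LABELS)
example_logits[12] = 1.2  # news_&_social_concern
example_logits[16] = 2.5  # sports

print(decode_multilabel(example_logits))
```

Because each score is thresholded on its own, raising or lowering `threshold` trades precision against recall per label; 0.5 is simply the value used in the README example.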