Commit 1d8f63b (parent: 385b8eb): Create README.md

README.md (added):

# Sentiment Analysis of English Tweets

**BERTsent**: a fine-tuned **BERT**-based **sent**iment classifier for English-language tweets.

BERTsent is trained on the SemEval-2017 corpus (39k+ tweets) and is based on [bertweet-base](https://github.com/VinAIResearch/BERTweet), which was pre-trained on 850M English tweets (cased) and an additional 23M COVID-19 English tweets (cased). The base model follows the [RoBERTa](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.md) pre-training procedure.

Output labels (see the mapping sketch after this list):

- 0 represents "negative" sentiment
- 1 represents "neutral" sentiment
- 2 represents "positive" sentiment
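
For post-processing, the integer labels can be mapped back to strings with a plain dictionary. This is a small convenience sketch of my own (the name ID2LABEL is not part of the model card):

    # Map the model's output indices to human-readable sentiment labels.
    ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

    print(ID2LABEL[2])  # prints "positive"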
## Using the model

Install transformers, if it is not already installed:

- terminal: `pip install transformers`
- notebooks (Colab, Kaggle): `!pip install transformers`

The snippets below also use TensorFlow and NumPy, so make sure both are available in your environment.

Import the required classes from the transformers library and load BERTsent:

    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("rabindralamsal/finetuned-bertweet-sentiment-analysis")
    model = TFAutoModelForSequenceClassification.from_pretrained("rabindralamsal/finetuned-bertweet-sentiment-analysis")
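
As an alternative to loading the tokenizer and model by hand, the same checkpoint should also be usable through the transformers pipeline API. This is a sketch of my own, not taken from the model card: it assumes the checkpoint loads as a TensorFlow text-classification pipeline, and the label names in the results follow the checkpoint's config (typically LABEL_0/LABEL_1/LABEL_2, matching the indices above).

    from transformers import pipeline

    # Assumption: TensorFlow weights are available, so we keep framework="tf".
    classifier = pipeline(
        "text-classification",
        model="rabindralamsal/finetuned-bertweet-sentiment-analysis",
        framework="tf",
    )

    # Each result is a dict with "label" and "score" keys.
    print(classifier("This is such a great day!"))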

Import TensorFlow and NumPy (NumPy is needed below to pick the highest-probability class):

    import tensorflow as tf
    import numpy as np

We have now installed and imported everything needed for sentiment analysis. Let's predict the sentiment of an example tweet:

    example_tweet = "The NEET exams show our Govt in a poor light: unresponsiveness to genuine concerns; admit cards not delivered to aspirants in time; failure to provide centres in towns they reside, thus requiring unnecessary & risky travels. What a disgrace to treat our #Covid warriors like this!"
    # This tweet resides on Twitter with the identifier 1435793872588738560.

    # Tokenize the tweet and return TensorFlow tensors.
    inputs = tokenizer.encode(example_tweet, return_tensors="tf")
    # Forward pass; the first element of the prediction output holds the logits for the three classes.
    output = model.predict(inputs)[0]
    # Convert the logits to probabilities and pick the most likely class.
    prediction = tf.nn.softmax(output, axis=1).numpy()
    sentiment = np.argmax(prediction)

    print(prediction)
    print(sentiment)

Output:

    [[0.9862386 0.01050556 0.00325586]]
    0
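
For convenience, the steps above can be wrapped into a single helper that scores a batch of tweets and maps the winning index back to a label. This is a sketch of my own, not part of the model card: the function name predict_sentiment, the batch tokenization via padding/truncation, and the LABELS list are my additions; the model and tokenizer are loaded exactly as shown earlier.

    import numpy as np
    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    MODEL_ID = "rabindralamsal/finetuned-bertweet-sentiment-analysis"
    LABELS = ["negative", "neutral", "positive"]  # indices 0, 1, 2 as documented above

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_ID)

    def predict_sentiment(tweets):
        """Return a (label, probabilities) pair for each tweet string."""
        # Tokenize the whole batch at once; padding/truncation align sequence lengths.
        encoded = tokenizer(tweets, padding=True, truncation=True, return_tensors="tf")
        logits = model(**encoded).logits
        # Turn logits into class probabilities, then pick the most likely class per tweet.
        probs = tf.nn.softmax(logits, axis=1).numpy()
        return [(LABELS[int(np.argmax(p))], p) for p in probs]

    for label, probs in predict_sentiment(["I love this!", "This is awful."]):
        print(label, probs)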