---
tags:
- generated_from_keras_callback
model-index:
- name: twitter-roberta-base-sentiment-earthquake
  results: []
---

# twitter-roberta-base-sentiment-earthquake

This is an "extension" of the [`twitter-roberta-base-sentiment-latest`](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) model, further fine-tuned on original Twitter data posted in English about the 10th anniversary of the 2010 Haiti Earthquake.

- Reference Paper: [Sentiment analysis (SA) (supervised and unsupervised classification) of original Twitter data posted in English about the 10th anniversary of the 2010 Haiti Earthquake](https://data.ncl.ac.uk/articles/dataset/Sentiment_analysis_SA_supervised_and_unsupervised_classification_of_original_Twitter_data_posted_in_English_about_the_10th_anniversary_of_the_2010_Haiti_Earthquake/19688040/1).
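
For a quick check, the model can also be loaded through the `pipeline` API (a minimal sketch; the input sentence is illustrative, and depending on the checkpoint's config the returned label may appear as `negative`/`neutral`/`positive` or as a generic `LABEL_*` id):

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="antypasd/twitter-roberta-base-sentiment-earthquake")

# Returns a list with one dict per input, e.g. [{'label': ..., 'score': ...}]
print(classifier("Ten years on, much of the promised recovery aid never reached those affected."))
```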

## Full classification example

```python
from transformers import AutoModelForSequenceClassification
from transformers import TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np

# Map class ids to human-readable labels
class_mapping = {0: "Negative", 1: "Neutral", 2: "Positive"}

MODEL = "antypasd/twitter-roberta-base-sentiment-earthquake"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.save_pretrained(MODEL)  # optional: save a local copy of the model

text = "$202 million of $1.14 billion in United States (US) recovery aid went to a new 'industrial park' in Caracol, an area unaffected by the Haiti earthquake. The plan was to invite foreign garment companies to take advantage of extremely low-wage labor"
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # raw logits for the single input
prediction = np.argmax(scores)

# # TF
# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# model.save_pretrained(MODEL)

# encoded_input = tokenizer(text, return_tensors='tf')
# output = model(encoded_input)
# scores = output[0][0].numpy()
# prediction = np.argmax(scores)

# Print label
print(class_mapping[prediction])
```

Output:

```
Negative
```
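
The example takes the argmax over raw logits, which is enough for the predicted label; if per-class probabilities are also needed, the logits can be passed through a softmax first (a minimal sketch reusing `scores` and `class_mapping` from the script above; `scipy` is an extra dependency):

```python
from scipy.special import softmax

# Convert logits to probabilities and print one score per class
probs = softmax(scores)
for class_id, prob in enumerate(probs):
    print(f"{class_mapping[class_id]}: {prob:.4f}")
```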