Model fine-tuned from roberta-large for sentiment classification of financial news (emphasis on Canadian news).
Introduction
This model was train on financial_news_sentiment_mixte_with_phrasebank_75 dataset. This is a customized version of the phrasebank dataset in which I kept only sentence validated by at least 75% annotators. In addition I added ~2000 articles validated manually on Canadian financial news. Therefore the model is more specifically trained for Canadian news. Final result is f1 score of 93.25% overall and 83.6% on Canadian news.
Training data
Training data was classified as follow:
class | Description |
---|---|
0 | negative |
1 | neutral |
2 | positive |
How to use roberta-large-financial-news-sentiment-en with HuggingFace
Load roberta-large-financial-news-sentiment-en and its sub-word tokenizer :
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
model = AutoModelForSequenceClassification.from_pretrained("Jean-Baptiste/roberta-large-financial-news-sentiment-en")
##### Process text sample (from wikipedia)
from transformers import pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
pipe("Melcor REIT (TSX: MR.UN) today announced results for the third quarter ended September 30, 2022. Revenue was stable in the quarter and year-to-date. Net operating income was down 3% in the quarter at $11.61 million due to the timing of operating expenses and inflated costs including utilities like gas/heat and power")
[{'label': 'negative', 'score': 0.9399105906486511}]
Model performances
Overall f1 score (average macro)
precision | recall | f1 |
---|---|---|
0.9355 | 0.9299 | 0.9325 |
By entity
entity | precision | recall | f1 |
---|---|---|---|
negative | 0.9605 | 0.9240 | 0.9419 |
neutral | 0.9538 | 0.9459 | 0.9498 |
positive | 0.8922 | 0.9200 | 0.9059 |
- Downloads last month
- 1,128
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.