---
license: mit
tags:
- sentiment analysis
- financial sentiment analysis
- bert
- text-classification
- finance
- finbert
- financial
---
# Trading Hero Financial Sentiment Analysis
Model Description: This model is a fine-tuned version of [FinBERT](https://huggingface.co/yiyanghkust/finbert-pretrain), a BERT model pre-trained on financial communication texts. It was fine-tuned for financial sentiment analysis, classifying a piece of text as neutral, positive, or negative.
## Model Use
Primary Users: Financial analysts, NLP researchers, and developers working on financial data.
## Training Data
Training Dataset: The model was fine-tuned on a custom dataset of financial communication texts. The dataset was split into training, validation, and test sets as follows:
* Training Set: 10,918,272 tokens
* Validation Set: 1,213,184 tokens
* Test Set: 1,347,968 tokens
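
As a quick arithmetic check, the three splits above add up to 13,479,424 tokens, which works out to roughly 81% training, 9% validation, and 10% test. The minimal Python sketch below only restates the token counts listed above (the `split_tokens` name is illustrative) and computes each split's share; it says nothing about how the split was actually produced.

```python
# Token counts restated from the training-data list above.
split_tokens = {
    "train": 10_918_272,
    "validation": 1_213_184,
    "test": 1_347_968,
}

# Total number of tokens in the fine-tuning dataset.
total = sum(split_tokens.values())

# Share of the fine-tuning tokens held by each split, in percent.
shares = {name: 100 * count / total for name, count in split_tokens.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```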

Pre-training Dataset: FinBERT was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:
* Corporate Reports (10-K & 10-Q): 2.5 billion tokens
* Earnings Call Transcripts: 1.3 billion tokens
* Analyst Reports: 1.1 billion tokens
## Evaluation
* Test Accuracy = 0.908469
* Test Precision = 0.927788
* Test Recall = 0.908469
* Test F1 = 0.913267
* **Labels**: 0 -> Neutral; 1 -> Positive; 2 -> Negative
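
The card does not state how precision, recall, and F1 were averaged over the three classes. Test Recall is identical to Test Accuracy, and support-weighted averaging always makes recall equal accuracy, so weighted averaging is a plausible but unconfirmed reading. The plain-Python sketch below implements support-weighted metrics to illustrate that identity; the helper name `weighted_metrics` and the toy labels are hypothetical and are not the actual test set.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy plus support-weighted precision, recall, F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = support[c]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # each class is weighted by its true-label frequency
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# Toy labels using the card's ids: 0 = neutral, 1 = positive, 2 = negative.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 2, 2, 2, 0]
acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Weighted recall reduces to the total number of correct predictions divided by the number of examples, which is exactly accuracy; weighted precision has no such identity, so it can differ, as it does on the card.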
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("fuchenru/Trading-Hero-LLM")
model = AutoModelForSequenceClassification.from_pretrained("fuchenru/Trading-Hero-LLM")
# Optional pipeline wrapper; predict_sentiment below calls the model directly
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Preprocess the input text to a fixed length (not used by predict_sentiment below)
def preprocess(text, tokenizer, max_length=128):
    inputs = tokenizer(text, truncation=True, padding='max_length', max_length=max_length, return_tensors='pt')
    return inputs

# Function to perform prediction
def predict_sentiment(input_text):
    # Tokenize the input text
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True)
    # Perform inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Get the predicted label id (index of the highest logit)
    predicted_label = torch.argmax(outputs.logits, dim=1).item()
    # Map the predicted label id to its sentiment name
    label_map = {0: 'neutral', 1: 'positive', 2: 'negative'}
    predicted_sentiment = label_map[predicted_label]
    return predicted_sentiment

stock_news = [
    "Market analysts predict a stable outlook for the coming weeks.",
    "The market remained relatively flat today, with minimal movement in stock prices.",
    "Investor sentiment improved following news of a potential trade deal.",
    # .......
]

for headline in stock_news:
    predicted_sentiment = predict_sentiment(headline)
    print("Predicted Sentiment:", predicted_sentiment)
```
```
Predicted Sentiment: neutral
Predicted Sentiment: neutral
Predicted Sentiment: positive
```
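
`predict_sentiment` returns only the arg-max label. If a confidence score is also useful, the logits can be passed through a softmax first. The plain-Python sketch below does this for a single row of logits; the helper name `logits_to_sentiment` and the example logit values are hypothetical. With the real model, the row would come from something like `outputs.logits[0].tolist()`.

```python
import math

# Label ids as documented in the Evaluation section of this card.
LABEL_MAP = {0: "neutral", 1: "positive", 2: "negative"}


def logits_to_sentiment(logits):
    """Hypothetical helper: convert one row of logits to (label, probability)."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Softmax preserves ordering, so this arg-max matches the arg-max of the logits.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_MAP[best], probs[best]


label, prob = logits_to_sentiment([0.2, 2.1, -1.3])
print(label, round(prob, 3))
```

Keeping the probability alongside the label makes it possible to flag low-confidence predictions for manual review instead of trusting every arg-max equally.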
## Citation
```
@misc{yang2020finbert,
title={FinBERT: A Pretrained Language Model for Financial Communications},
author={Yi Yang and Mark Christopher Siy UY and Allen Huang},
year={2020},
eprint={2006.08097},
archivePrefix={arXiv},
}
```