EmTract (DistilBERT-Base-Uncased)
Model Description
emtract-distilbert-base-uncased-emotion is a DistilBERT model fine-tuned on the Unify Emotion Datasets, roughly 250K texts labeled with seven emotion categories: neutral, happy, sad, anger, disgust, surprise, and fear. It was then further fine-tuned on a smaller set of 10K hand-tagged messages from StockTwits. The model is intended for emotion detection in financial social media content such as StockTwits posts.
Training hyperparameters were as follows: maximum sequence length of 64, learning rate of 2e-5, batch size of 128, and 8 epochs. For steps on how to use the model for inference, refer to the accompanying Inference.ipynb notebook.
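As a rough illustration of what inference could look like, the sketch below loads the checkpoint with the Hugging Face transformers library and returns per-emotion probabilities. It is not taken from Inference.ipynb. The function names classify and softmax are hypothetical, the model_id argument is an assumption (the Hub namespace is not stated in this card), and the label names are read from the checkpoint's own config rather than hard-coded.

```python
import math

# Sequence length stated in this card; longer inputs are truncated to it.
MAX_LENGTH = 64


def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(texts, model_id):
    """Return one {label: probability} dict per input text.

    model_id is an assumption: pass the full Hub path or a local directory
    that contains the emtract-distilbert-base-uncased-emotion checkpoint.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=MAX_LENGTH,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits

    # Label names come from the checkpoint config, not from this sketch.
    id2label = model.config.id2label
    results = []
    for row in logits.tolist():
        probs = softmax(row)
        results.append({id2label[i]: p for i, p in enumerate(probs)})
    return results
```

A call would look like classify(["$AAPL to the moon!"], "<namespace>/emtract-distilbert-base-uncased-emotion"), where the namespace placeholder must be replaced with the actual Hub owner or a local path.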
Training Data
The first part of the training data comes from the publicly available Unify Emotion Datasets. The second part was collected from social media and hand-tagged by the author; it is also publicly available.
Evaluation Metrics
The model was evaluated using the following metrics:
- Accuracy
- Precision
- Recall
- F1-score
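The metrics listed above can be computed from true and predicted labels as in the following sketch. The card does not say how precision, recall, and F1 were averaged across the seven classes, so macro averaging is an assumption here, and the function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the card does not state the
    averaging scheme used in the original evaluation.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class with no predictions (or no true examples) scores 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Macro averaging weights every emotion equally, which matters when some classes (for example disgust) are rare; a weighted or micro average would give different numbers on imbalanced data.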
Research
The underlying research on emotion extraction from financial social media is available on arXiv and SSRN.
Citation
Please cite the following if you use this model:
Vamossy, Domonkos F., and Rolf Skog. "EmTract: Extracting Emotions from Social Media." Available at SSRN 3975884 (2023).
BibTeX citation:
@article{vamossy2023emtract,
title={EmTract: Extracting Emotions from Social Media},
author={Vamossy, Domonkos F and Skog, Rolf},
journal={Available at SSRN 3975884},
year={2023}
}
Research using EmTract
Social Media Emotions and IPO Returns
Investor Emotions and Earnings Announcements
License
This project is licensed under the terms of the MIT license.