SUFEHeisenberg committed
Commit e4079f7
1 Parent(s): 9bb9651

Update README.md

Files changed (1):
  README.md +40 -0

README.md CHANGED
@@ -1,3 +1,43 @@
  ---
  license: apache-2.0
+ datasets:
+ - financial_phrasebank
+ - pauri32/fiqa-2018
+ - zeroshot/twitter-financial-news-sentiment
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - finance
  ---
+
+ We collect financial domain terms from Investopedia's Financial terms dictionary, NYSSCPA's accounting terminology guide,
+ and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
+
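+ As a rough sketch, this kind of vocabulary expansion can be done with the standard `transformers` API; the term list and base checkpoint below are illustrative placeholders, not the exact ones we used:
+
+ ```python
+ from transformers import RobertaForMaskedLM, RobertaTokenizer
+
+ tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
+ model = RobertaForMaskedLM.from_pretrained("roberta-base")
+
+ # Hypothetical sample; the real list is scraped from the three glossaries above.
+ financial_terms = ["EBITDA", "amortization", "collateralized"]
+
+ # Register terms the tokenizer does not already treat as single tokens.
+ num_added = tokenizer.add_tokens(financial_terms)
+
+ # Grow the embedding matrix so the new tokens get trainable vectors.
+ model.resize_token_embeddings(len(tokenizer))
+ ```
+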
+ Based on this RoBERTa with the added financial terms, we pretrained our model on multiple financial corpora:
+
+ - Financial Terms
+   - [Investopedia's Financial terms dictionary](https://www.investopedia.com/financial-term-dictionary-4769738)
+   - [NYSSCPA's accounting terminology guide](https://www.nysscpa.org/professional-resources/accounting-terminology-guide)
+   - [Harvey's Hypertextual Finance Glossary](https://people.duke.edu/~charvey/Classes/wpg/glossary.htm)
+ - Financial Datasets
+   - [FPB](https://huggingface.co/datasets/financial_phrasebank)
+   - [FiQA SA](https://huggingface.co/datasets/pauri32/fiqa-2018)
+   - [SemEval2017 Task5](https://aclanthology.org/S17-2089/)
+   - [Twitter Financial News Sentiment](https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)
+ - Earnings Call
+   - Earnings call transcripts of NASDAQ 100 component stocks, 2016-2023.
+
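+ For reference, the three datasets hosted on the Hugging Face Hub can be loaded with the `datasets` library (a minimal sketch; Financial PhraseBank requires an agreement-level config, and `sentences_allagree` below is only an assumed example, since this card does not state which one was used):
+
+ ```python
+ from datasets import load_dataset
+
+ fpb = load_dataset("financial_phrasebank", "sentences_allagree")
+ fiqa = load_dataset("pauri32/fiqa-2018")
+ twitter = load_dataset("zeroshot/twitter-financial-news-sentiment")
+ ```
+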
+ In the continual pretraining step, we apply the following experiment settings to achieve better fine-tuned results on the four financial datasets:
+
+ 1. Masking Probability: 0.4 (instead of the default 0.15)
+ 2. Warmup Steps: 0 (yields better results than training with warmup steps)
+ 3. Epochs: 1 (one epoch is enough and avoids overfitting)
+ 4. Weight Decay: 0.01
+ 5. Train Batch Size: 64
+ 6. FP16 mixed-precision training
+
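+ These settings map directly onto a masked-language-modeling run with `transformers`; the sketch below continues from the vocabulary-expansion example above, with a toy stand-in corpus and a placeholder output path:
+
+ ```python
+ from datasets import Dataset
+ from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
+
+ # Toy stand-in; the real data is the financial corpora listed above.
+ corpus = Dataset.from_dict({"text": ["Quarterly EBITDA beat analyst estimates."]})
+ tokenized = corpus.map(
+     lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
+     batched=True,
+     remove_columns=["text"],
+ )
+
+ # Mask 40% of tokens instead of the default 15%.
+ data_collator = DataCollatorForLanguageModeling(
+     tokenizer=tokenizer, mlm=True, mlm_probability=0.4
+ )
+
+ training_args = TrainingArguments(
+     output_dir="financial-roberta-pretrain",  # placeholder path
+     num_train_epochs=1,              # one epoch, to avoid overfitting
+     warmup_steps=0,                  # no warmup
+     weight_decay=0.01,
+     per_device_train_batch_size=64,
+     fp16=True,                       # mixed-precision training
+ )
+
+ trainer = Trainer(
+     model=model,                     # the vocabulary-expanded RoBERTa from above
+     args=training_args,
+     train_dataset=tokenized,
+     data_collator=data_collator,
+ )
+ trainer.train()
+ ```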