SUFEHeisenberg committed
Commit e4079f7
1 Parent(s): 9bb9651

Update README.md

Files changed (1):
  README.md +40 -0

README.md CHANGED
@@ -1,3 +1,43 @@
  ---
  license: apache-2.0
+ datasets:
+ - financial_phrasebank
+ - pauri32/fiqa-2018
+ - zeroshot/twitter-financial-news-sentiment
+ language:
+ - en
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - finance
  ---
+
+ We collect financial domain terms from Investopedia's Financial terms dictionary, NYSSCPA's accounting terminology guide,
+ and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
+
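+ As a rough sketch, this kind of vocabulary expansion can be done with the standard `transformers` API; the term list and base checkpoint below are illustrative placeholders, not the exact ones we used:
+
+ ```python
+ from transformers import RobertaForMaskedLM, RobertaTokenizer
+
+ tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
+ model = RobertaForMaskedLM.from_pretrained("roberta-base")
+
+ # Hypothetical sample; the real list is scraped from the three glossaries above.
+ financial_terms = ["EBITDA", "amortization", "collateralized"]
+
+ # Register terms the tokenizer does not already treat as single tokens.
+ num_added = tokenizer.add_tokens(financial_terms)
+
+ # Grow the embedding matrix so the new tokens get trainable vectors.
+ model.resize_token_embeddings(len(tokenizer))
+ ```
+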
+ Based on this RoBERTa with the added financial terms, we pretrained our model on multiple financial corpora:
+
+ - Financial Terms
+   - [Investopedia's Financial terms dictionary](https://www.investopedia.com/financial-term-dictionary-4769738)
+   - [NYSSCPA's accounting terminology guide](https://www.nysscpa.org/professional-resources/accounting-terminology-guide)
+   - [Harvey's Hypertextual Finance Glossary](https://people.duke.edu/~charvey/Classes/wpg/glossary.htm)
+ - Financial Datasets
+   - [FPB](https://huggingface.co/datasets/financial_phrasebank)
+   - [FiQA SA](https://huggingface.co/datasets/pauri32/fiqa-2018)
+   - [SemEval2017 Task5](https://aclanthology.org/S17-2089/)
+   - [Twitter Financial News Sentiment](https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment)
+ - Earnings Call
+   - Earnings call transcripts of NASDAQ 100 component stocks, 2016-2023.
+
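+ For reference, the three datasets hosted on the Hugging Face Hub can be loaded with the `datasets` library (a minimal sketch; Financial PhraseBank requires an agreement-level config, and `sentences_allagree` below is only an assumed example, since this card does not state which one was used):
+
+ ```python
+ from datasets import load_dataset
+
+ fpb = load_dataset("financial_phrasebank", "sentences_allagree")
+ fiqa = load_dataset("pauri32/fiqa-2018")
+ twitter = load_dataset("zeroshot/twitter-financial-news-sentiment")
+ ```
+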
+ In the continual pretraining step, we apply the following experiment settings to achieve better fine-tuned results on the four financial datasets:
+
+ 1. Masking Probability: 0.4 (instead of the default 0.15)
+ 2. Warmup Steps: 0 (yields better results than training with warmup steps)
+ 3. Epochs: 1 (one epoch is enough and avoids overfitting)
+ 4. Weight Decay: 0.01
+ 5. Train Batch Size: 64
+ 6. FP16 mixed-precision training
+
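+ These settings map directly onto a masked-language-modeling run with `transformers`; the sketch below continues from the vocabulary-expansion example above, with a toy stand-in corpus and a placeholder output path:
+
+ ```python
+ from datasets import Dataset
+ from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
+
+ # Toy stand-in; the real data is the financial corpora listed above.
+ corpus = Dataset.from_dict({"text": ["Quarterly EBITDA beat analyst estimates."]})
+ tokenized = corpus.map(
+     lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
+     batched=True,
+     remove_columns=["text"],
+ )
+
+ # Mask 40% of tokens instead of the default 15%.
+ data_collator = DataCollatorForLanguageModeling(
+     tokenizer=tokenizer, mlm=True, mlm_probability=0.4
+ )
+
+ training_args = TrainingArguments(
+     output_dir="financial-roberta-pretrain",  # placeholder path
+     num_train_epochs=1,              # one epoch, to avoid overfitting
+     warmup_steps=0,                  # no warmup
+     weight_decay=0.01,
+     per_device_train_batch_size=64,
+     fp16=True,                       # mixed-precision training
+ )
+
+ trainer = Trainer(
+     model=model,                     # the vocabulary-expanded RoBERTa from above
+     args=training_args,
+     train_dataset=tokenized,
+     data_collator=data_collator,
+ )
+ trainer.train()
+ ```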