---
license: apache-2.0
datasets:
  - financial_phrasebank
  - pauri32/fiqa-2018
  - zeroshot/twitter-financial-news-sentiment
language:
  - en
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - finance
---

We collect financial domain terms from Investopedia's Financial Terms Dictionary, NYSSCPA's accounting terminology guide, and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
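
As a rough illustration of this vocabulary-expansion step, the sketch below adds new terms to the tokenizer and resizes the embedding matrix with the `transformers` library. The file name `financial_terms.txt` is a hypothetical placeholder for the collected term list, not the authors' actual resource.

```python
# Minimal vocabulary-expansion sketch; "financial_terms.txt" is an
# assumed placeholder for the terms collected from the glossaries above.
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Load the collected financial terms, one term per line.
with open("financial_terms.txt") as f:
    financial_terms = [line.strip() for line in f if line.strip()]

# add_tokens() skips terms that are already in the vocabulary.
num_added = tokenizer.add_tokens(financial_terms)
print(f"Added {num_added} new tokens to the vocabulary")

# Give the new tokens trainable embedding rows before pretraining.
model.resize_token_embeddings(len(tokenizer))
```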

Based on this financial-terms-augmented RoBERTa, we continually pretrained our model on multiple financial corpora.

In the continual pretraining step, we apply the following experimental settings to achieve better fine-tuned results on four financial datasets (see the configuration sketch after this list):

  1. Masking probability: 0.4 (instead of the default 0.15)
  2. Warmup steps: 0 (yields better results than using warmup steps)
  3. Epochs: 1 (sufficient, and helps avoid overfitting)
  4. Weight decay: 0.01
  5. Train batch size: 64
  6. FP16 mixed-precision training
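
These settings map onto a Hugging Face `Trainer` configuration roughly as follows. This is a minimal sketch rather than the exact training script: it reuses the `tokenizer` and `model` from the vocabulary-expansion sketch above, and `tokenized_corpus` stands in for an assumed pre-tokenized financial corpus.

```python
# Continual-pretraining sketch using the hyperparameters listed above.
# Assumes `tokenizer` and `model` from the previous sketch, plus a
# pre-tokenized dataset `tokenized_corpus` (an assumption, not given here).
from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Masked-language-modeling collator with the raised masking probability.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.4,  # setting 1: 0.4 instead of the default 0.15
)

training_args = TrainingArguments(
    output_dir="fin-roberta-continual-pretrain",
    warmup_steps=0,                   # setting 2: no warmup
    num_train_epochs=1,               # setting 3: a single epoch
    weight_decay=0.01,                # setting 4
    per_device_train_batch_size=64,   # setting 5
    fp16=True,                        # setting 6: mixed-precision training
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_corpus,
)
trainer.train()
```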