metadata
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: distilbert-base-uncased-finetuned-FiNER
    results: []
datasets:
  - nlpaueb/finer-139
language:
  - en
pipeline_tag: token-classification

distilbert-base-uncased-finetuned-FiNER

This model is a fine-tuned version of distilbert/distilbert-base-uncased, trained on a subset of the nlpaueb/finer-139 dataset. The subset was generated by filtering the dataset to keep only samples that contain at least one of the following NER tags:

  • 'O',
  • 'B-DebtInstrumentBasisSpreadOnVariableRate1',
  • 'B-DebtInstrumentFaceAmount',
  • 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
  • 'B-DebtInstrumentInterestRateStatedPercentage'

It was then fine-tuned to detect only the four entity tags listed above, plus "O" for all other tokens.
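The filtering and relabeling steps above can be sketched in plain Python. This is a minimal illustration on toy, string-labeled samples, not the code from the linked repository: the helper `filter_and_remap`, the sample format, and the label order in `LABELS` are hypothetical. Because nearly every sample contains "O", the sketch assumes the filter is meant to keep samples with at least one of the four B- tags. The real dataset stores `ner_tags` as integer class ids, so they would first need converting to tag names.

```python
# Hypothetical label order for the reduced 5-label scheme (an assumption,
# not necessarily the order used by the published model).
LABELS = [
    "O",
    "B-DebtInstrumentBasisSpreadOnVariableRate1",
    "B-DebtInstrumentFaceAmount",
    "B-LineOfCreditFacilityMaximumBorrowingCapacity",
    "B-DebtInstrumentInterestRateStatedPercentage",
]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for i, label in enumerate(LABELS)}
# The four entity tags; 'O' is excluded so the filter is not trivially true.
ENTITY_TAGS = set(LABELS[1:])


def filter_and_remap(samples):
    """Keep samples with at least one target entity tag and map every
    other tag to 'O', returning integer label ids."""
    kept = []
    for sample in samples:
        tags = sample["ner_tags"]
        if not any(tag in ENTITY_TAGS for tag in tags):
            continue  # no target entity in this sample: drop it
        ids = [LABEL2ID[tag] if tag in ENTITY_TAGS else LABEL2ID["O"] for tag in tags]
        kept.append({"tokens": sample["tokens"], "ner_tags": ids})
    return kept


# Toy string-labeled samples; "B-OtherTag" stands in for any non-target tag.
toy = [
    {"tokens": ["Notes", "bear", "5.25", "%", "due", "2030"],
     "ner_tags": ["O", "O", "B-DebtInstrumentInterestRateStatedPercentage",
                  "O", "O", "B-OtherTag"]},
    {"tokens": ["Revenue", "rose", "12", "%"],
     "ner_tags": ["O", "O", "B-OtherTag", "O"]},
]
print(filter_and_remap(toy))
# [{'tokens': ['Notes', 'bear', '5.25', '%', 'due', '2030'], 'ner_tags': [0, 0, 4, 0, 0, 0]}]
```

The first toy sample is kept, with its non-target tag mapped to "O"; the second is dropped. Mappings like `LABEL2ID` and `ID2LABEL` can be passed as `label2id` and `id2label` to `AutoModelForTokenClassification.from_pretrained` so that predictions carry readable tag names.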

It achieves the following results on the evaluation set:

  • Loss: 0.0336
  • Precision: 0.9154
  • Recall: 0.9327
  • F1: 0.9240
  • Accuracy: 0.9917
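F1 is the harmonic mean of precision and recall, so the figures above can be cross-checked against each other. A minimal check using the reported precision and recall (the function name `f1_score` is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.9154, 0.9327)
print(round(f1, 4))  # 0.924
```

The result matches the reported F1 of 0.9240 to four decimal places.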

Model description

The model is based on distilbert/distilbert-base-uncased and uses all default parameters.

Intended uses & limitations

The model published here was trained for demo purposes only.

Training and evaluation data

The original train/validation/test splits from nlpaueb/finer-139 were used, after filtering for samples that contain at least one of the following NER tags:

  • 'O',
  • 'B-DebtInstrumentBasisSpreadOnVariableRate1',
  • 'B-DebtInstrumentFaceAmount',
  • 'B-LineOfCreditFacilityMaximumBorrowingCapacity',
  • 'B-DebtInstrumentInterestRateStatedPercentage'

Training procedure

See https://github.com/bodias/DistilBERT-FiNER for details of the training procedure.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 6
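With lr_scheduler_type set to linear, the learning rate decays linearly from 2e-05 to zero over training. The sketch below reproduces that schedule in plain Python, following the same formula as `get_linear_schedule_with_warmup` in Transformers. It assumes no warmup steps (none are listed above) and the 10,638 total optimizer steps reported in the training results; the function name `linear_lr` is illustrative.

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 10638,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Linear ramp-up from 0 to base_lr (unused when warmup_steps == 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, at the end of epochs 1 and 3, and at the end of training.
for step in (0, 1773, 5319, 10638):
    print(step, linear_lr(step))
```

Under these assumptions the rate has halved to about 1e-05 after three of the six epochs.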

Training results

Training Loss | Epoch | Step  | Validation Loss | Precision | Recall | F1     | Accuracy
0.0354        | 1.0   | 1773  | 0.0375          | 0.8639    | 0.8993 | 0.8812 | 0.9870
0.0242        | 2.0   | 3546  | 0.0296          | 0.8929    | 0.9159 | 0.9042 | 0.9895
0.0166        | 3.0   | 5319  | 0.0297          | 0.9079    | 0.9208 | 0.9143 | 0.9907
0.0117        | 4.0   | 7092  | 0.0303          | 0.9101    | 0.9293 | 0.9196 | 0.9913
0.0086        | 5.0   | 8865  | 0.0328          | 0.9065    | 0.9331 | 0.9196 | 0.9913
0.0062        | 6.0   | 10638 | 0.0336          | 0.9154    | 0.9327 | 0.9240 | 0.9917
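The Step column grows by 1,773 per epoch. Assuming training on a single device with no gradient accumulation and the last partial batch kept (none of which is stated above), each epoch covers ceil(N / 16) = 1773 batches, which bounds the size N of the filtered training split. A small sketch of that arithmetic:

```python
import math

TRAIN_BATCH_SIZE = 16
STEPS_PER_EPOCH = 1773
NUM_EPOCHS = 6

# Smallest and largest N with ceil(N / batch_size) == steps_per_epoch.
n_min = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
n_max = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE
assert math.ceil(n_min / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH
assert math.ceil(n_max / TRAIN_BATCH_SIZE) == STEPS_PER_EPOCH

print(n_min, n_max)                  # 28353 28368
print(STEPS_PER_EPOCH * NUM_EPOCHS)  # 10638
```

Under those assumptions the filtered training split holds between 28,353 and 28,368 samples, and six epochs give the 10,638 total steps shown in the last row of the table.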

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2