Personally Identifiable Information (PII) Model

This model is a fine-tuned version of bert-base-cased on the generator dataset. It achieves the following results:

  • Training Loss: 0.003900
  • Validation Loss: 0.051071
  • Precision: 95.53%
  • Recall: 96.60%
  • F1: 96%
  • Accuracy: 99.11%
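
As a quick sanity check on the figures above, F1 is the harmonic mean of precision and recall. The helper below is a minimal sketch (the name `f1_from_pr` is illustrative and not part of the model's code); it suggests that the reported precision and recall imply an F1 of roughly 96%.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
reported_f1 = f1_from_pr(95.53, 96.60)
print(f"{reported_f1:.2f}")
```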

Model description

This is a token classification model that detects personally identifiable information (PII) entities. It is built on the BERT architecture and fine-tuned on a custom dataset to label each token as a name, address, date of birth, or one of the other entity groups it covers. Its intended use is to flag sensitive information in text so that it can be protected.

The model can detect the following entity groups

  • ACCOUNTNUMBER
  • FIRSTNAME
  • ACCOUNTNAME
  • PHONENUMBER
  • CREDITCARDCVV
  • CREDITCARDISSUER
  • PREFIX
  • LASTNAME
  • AMOUNT
  • DATE
  • DOB
  • COMPANYNAME
  • BUILDINGNUMBER
  • STREET
  • SECONDARYADDRESS
  • STATE
  • EMAIL
  • CITY
  • CREDITCARDNUMBER
  • SSN
  • URL
  • USERNAME
  • PASSWORD
  • COUNTY
  • PIN
  • MIDDLENAME
  • IBAN
  • GENDER
  • AGE
  • ZIPCODE
  • SEX
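
A minimal sketch of how these entity groups might be used to redact text. It assumes the model is loaded through the Transformers `pipeline` API with `aggregation_strategy="simple"`, which returns dicts containing `entity_group`, `start`, and `end`. The model ID is a placeholder, since the repository name is not given here, and `load_pii_pipeline` and `redact` are hypothetical helper names.

```python
from typing import Dict, List

MODEL_ID = "your-namespace/your-pii-model"  # placeholder: substitute the real repository ID


def load_pii_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (needs `transformers` and access to the model repository)."""
    from transformers import pipeline  # imported lazily so `redact` works without it

    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with its entity group in square brackets."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written entities in the pipeline's output shape, for illustration only.
sample_text = "Contact Jane Doe at jane@example.com"
sample_entities = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "LASTNAME", "start": 13, "end": 16},
    {"entity_group": "EMAIL", "start": 20, "end": 36},
]
print(redact(sample_text, sample_entities))
```

With the real model, the same helper could be applied to live predictions, for example `redact(text, load_pii_pipeline()(text))`.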

Training hyperparameters

The following hyperparameters were used during training:

Hyperparameter               Value
Learning Rate                5e-5
Train Batch Size             16
Eval Batch Size              16
Number of Training Epochs    7
Weight Decay                 0.01
Save Strategy                Epoch
Load Best Model at End       True
Metric for Best Model        F1
Push to Hub                  True
Evaluation Strategy          Epoch
Early Stopping Patience      3
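
These settings map fairly directly onto Hugging Face `TrainingArguments` and `EarlyStoppingCallback`. The following is a sketch under assumptions, not the author's training script: `output_dir` is a placeholder, the batch sizes are assumed to be per-device, the metric key is assumed to be `"f1"`, and the keyword names follow Transformers 4.38 (where the evaluation setting is `evaluation_strategy`). The Trainer wiring (model, datasets, `compute_metrics`) is omitted.

```python
# Hyperparameters from the card, keyed by TrainingArguments parameter names.
TRAINING_KWARGS = {
    "output_dir": "pii-model",  # placeholder, not stated in the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: the card's batch size is per device
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 7,
    "weight_decay": 0.01,
    "save_strategy": "epoch",
    "evaluation_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",  # assumption: compute_metrics returns an "f1" key
    "push_to_hub": True,
}
EARLY_STOPPING_PATIENCE = 3


def build_training_setup():
    """Return (TrainingArguments, callbacks); requires `transformers` and PyTorch."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(**TRAINING_KWARGS)
    callbacks = [EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)]
    return args, callbacks
```

The save and evaluation strategies are kept identical because `load_best_model_at_end` requires them to match.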

Training results

Epoch  Training Loss  Validation Loss  Precision (%)  Recall (%)  F1 Score (%)  Accuracy (%)
1      0.0443         0.038108         91.88          95.17       93.50         98.80
2      0.0318         0.035728         94.13          96.15       95.13         98.90
3      0.0209         0.032016         94.81          96.42       95.61         99.01
4      0.0154         0.040221         93.87          95.80       94.82         98.88
5      0.0084         0.048183         94.21          96.06       95.13         98.93
6      0.0037         0.052281         94.49          96.60       95.53         99.07
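
The results stop at epoch 6 even though 7 epochs were configured, which appears consistent with early stopping on F1 with a patience of 3. The sketch below replays a simplified patience rule over the F1 column; it is an illustration, not the Trainer's actual implementation, and `replay_early_stopping` is a hypothetical name.

```python
from typing import List, Tuple

# F1 scores (%) per epoch, copied from the training results.
F1_BY_EPOCH = [93.50, 95.13, 95.61, 94.82, 95.13, 95.53]


def replay_early_stopping(scores: List[float], patience: int) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch), both 1-indexed.

    The counter resets whenever the score beats the best so far, and
    training stops once `patience` consecutive epochs fail to improve.
    """
    best_score = float("-inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, bad_epochs = score, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


best_epoch, stop_epoch = replay_early_stopping(F1_BY_EPOCH, patience=3)
print(best_epoch, stop_epoch)
```

Under this simplified rule the best F1 falls at epoch 3 and the three following epochs fail to beat it, so training would halt after epoch 6.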

Author

abhijeet__@outlook.com

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Model size

  • Parameters: 109M
  • Tensor type: F32
  • Format: Safetensors