
DistilBERT Fine-Tuned for Named Entity Recognition (NER)


This repository contains a DistilBERT model fine-tuned for Named Entity Recognition (NER). The model has been trained to identify and classify named entities such as names of people, places, organizations, and dates in text.

Model Details

  • Model: DistilBERT
  • Task: Named Entity Recognition (NER)
  • Training Dataset: Custom dataset
  • Evaluation Metrics: Precision, Recall, F1-Score, Accuracy

Usage

You can use this model with the Hugging Face transformers library to perform NER on your text data. Below are examples of how to use the model and tokenizer.

Installation

First, make sure you have the transformers library installed:

pip install transformers

Load the Model

from transformers import pipeline

# Load the fine-tuned model and tokenizer into a token-classification pipeline
token_classifier = pipeline(
    "token-classification", 
    model="cxx5208/NER_finetuned", 
    tokenizer="cxx5208/NER_finetuned",
    aggregation_strategy="simple"
)

# Example text
text = "My name is Yeshvanth Raju Kurapati. I study at San Jose State University"

# Perform NER
entities = token_classifier(text)
print(entities)

Example Output

[
  {'entity_group': 'PER',
   'score': 0.99808735,
   'word': 'Yeshvanth Raju Kurapati',
   'start': 11,
   'end': 34},
  {'entity_group': 'ORG',
   'score': 0.9923826,
   'word': 'San Jose State University',
   'start': 47,
   'end': 72}
]
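
If you prefer to work with the raw model outputs rather than the pipeline, you can load the tokenizer and model directly. The snippet below is a minimal sketch using the standard transformers AutoTokenizer and AutoModelForTokenClassification classes with the same repository id; it is an illustration, not part of the original card.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and model directly (same repository as above)
tokenizer = AutoTokenizer.from_pretrained("cxx5208/NER_finetuned")
model = AutoModelForTokenClassification.from_pretrained("cxx5208/NER_finetuned")

text = "My name is Yeshvanth Raju Kurapati. I study at San Jose State University"
inputs = tokenizer(text, return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# Map each token to its highest-scoring label
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])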

Training Details

The model was fine-tuned using the following hyperparameters:

  • Batch Size: 16
  • Learning Rate: 5e-5
  • Epochs: 3
  • Optimizer: AdamW

The training process used a standard NER dataset (e.g., CoNLL-2003) and covered tokenization, data preprocessing, and evaluation.
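
The original training script is not included in this repository. The sketch below shows how a fine-tuning run with the hyperparameters listed above could be set up using the transformers Trainer; the dataset choice (CoNLL-2003 from the datasets library), label handling, and token-label alignment are assumptions made for illustration.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

# Assumption: CoNLL-2003 is used here for illustration only
dataset = load_dataset("conll2003")
label_names = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label_names)
)

def tokenize_and_align(batch):
    # Align word-level NER tags with sub-word tokens; label only the first sub-token
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        prev = None
        ids = []
        for word_id in word_ids:
            if word_id is None or word_id == prev:
                ids.append(-100)  # ignored by the loss
            else:
                ids.append(tags[word_id])
            prev = word_id
        labels.append(ids)
    tokenized["labels"] = labels
    return tokenized

tokenized_ds = dataset.map(tokenize_and_align, batched=True)

# Hyperparameters from the card: batch size 16, lr 5e-5, 3 epochs (AdamW is the Trainer default)
args = TrainingArguments(
    output_dir="ner_finetuned",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()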

Evaluation

The model was evaluated using precision, recall, F1-score, and accuracy. The results are as follows:

  • Precision: 0.952
  • Recall: 0.948
  • F1-Score: 0.950
  • Accuracy: 0.975
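
These numbers come from the original evaluation run and are not reproduced here. For reference, a common way to compute entity-level precision, recall, and F1 for NER is the seqeval metric via the evaluate library, sketched below with placeholder label sequences.

import evaluate

# seqeval reports entity-level precision/recall/F1 plus token-level accuracy
seqeval = evaluate.load("seqeval")

# Placeholder label sequences; in practice these come from model predictions and gold labels
predictions = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]
references  = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"],
      results["overall_f1"], results["overall_accuracy"])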

About DistilBERT

DistilBERT is a smaller, faster, cheaper version of BERT developed by Hugging Face. It retains 97% of BERT’s language understanding capabilities while being 60% faster and 40% smaller.

License

This model is released under the MIT License.

Acknowledgements

  • Hugging Face for the transformers library and DistilBERT model.
  • The authors of the original dataset used for training.