
DistilBERT Fine-Tuned for Named Entity Recognition (NER)


This repository contains a DistilBERT model fine-tuned for Named Entity Recognition (NER). The model has been trained to identify and classify named entities such as names of people, places, organizations, and dates in text.

Model Details

  • Model: DistilBERT
  • Task: Named Entity Recognition (NER)
  • Training Dataset: Custom dataset
  • Evaluation Metrics: Precision, Recall, F1-Score, Accuracy

Usage

You can use this model with the Hugging Face transformers library to perform NER on your text data. Below are examples of how to use the model and tokenizer.

Installation

First, make sure you have the transformers library installed:

pip install transformers

Load the Model

from transformers import pipeline

# Load the fine-tuned model and tokenizer into a token-classification pipeline
token_classifier = pipeline(
    "token-classification", 
    model="cxx5208/NER_finetuned", 
    tokenizer="cxx5208/NER_finetuned",
    aggregation_strategy="simple"
)

# Example text
text = "My name is Yeshvanth Raju Kurapati. I study at San Jose State University"

# Perform NER
entities = token_classifier(text)
print(entities)

Example Output

[
  {'entity_group': 'PER',
   'score': 0.99808735,
   'word': 'Yeshvanth Raju Kurapati',
   'start': 11,
   'end': 34},
  {'entity_group': 'ORG',
   'score': 0.9923826,
   'word': 'San Jose State University',
   'start': 47,
   'end': 72}
]
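
If you prefer to work with the raw model outputs rather than the pipeline, you can load the tokenizer and model directly. The snippet below is a minimal sketch using the standard transformers AutoTokenizer and AutoModelForTokenClassification classes with the same repository id; it is an illustration, not part of the original card.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and model directly (same repository as above)
tokenizer = AutoTokenizer.from_pretrained("cxx5208/NER_finetuned")
model = AutoModelForTokenClassification.from_pretrained("cxx5208/NER_finetuned")

text = "My name is Yeshvanth Raju Kurapati. I study at San Jose State University"
inputs = tokenizer(text, return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    logits = model(**inputs).logits

# Map each token to its highest-scoring label
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    print(token, model.config.id2label[pred.item()])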

Training Details

The model was fine-tuned using the following hyperparameters:

  • Batch Size: 16
  • Learning Rate: 5e-5
  • Epochs: 3
  • Optimizer: AdamW

The training process used a standard NER dataset (e.g., CoNLL-2003) and covered tokenization, data preprocessing, and evaluation.
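
The original training script is not included in this repository. The sketch below shows how a fine-tuning run with the hyperparameters listed above could be set up using the transformers Trainer; the dataset choice (CoNLL-2003 from the datasets library), label handling, and token-label alignment are assumptions made for illustration.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

# Assumption: CoNLL-2003 is used here for illustration only
dataset = load_dataset("conll2003")
label_names = dataset["train"].features["ner_tags"].feature.names

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label_names)
)

def tokenize_and_align(batch):
    # Align word-level NER tags with sub-word tokens; label only the first sub-token
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        prev = None
        ids = []
        for word_id in word_ids:
            if word_id is None or word_id == prev:
                ids.append(-100)  # ignored by the loss
            else:
                ids.append(tags[word_id])
            prev = word_id
        labels.append(ids)
    tokenized["labels"] = labels
    return tokenized

tokenized_ds = dataset.map(tokenize_and_align, batched=True)

# Hyperparameters from the card: batch size 16, lr 5e-5, 3 epochs (AdamW is the Trainer default)
args = TrainingArguments(
    output_dir="ner_finetuned",
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()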

Evaluation

The model was evaluated using precision, recall, F1-score, and accuracy. The results are as follows:

  • Precision: 0.952
  • Recall: 0.948
  • F1-Score: 0.950
  • Accuracy: 0.975
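
These numbers come from the original evaluation run and are not reproduced here. For reference, a common way to compute entity-level precision, recall, and F1 for NER is the seqeval metric via the evaluate library, sketched below with placeholder label sequences.

import evaluate

# seqeval reports entity-level precision/recall/F1 plus token-level accuracy
seqeval = evaluate.load("seqeval")

# Placeholder label sequences; in practice these come from model predictions and gold labels
predictions = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]
references  = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"],
      results["overall_f1"], results["overall_accuracy"])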

About DistilBERT

DistilBERT is a smaller, faster, cheaper version of BERT developed by Hugging Face. It retains 97% of BERT’s language understanding capabilities while being 60% faster and 40% smaller.

License

This model is released under the MIT License.

Acknowledgements

  • Hugging Face for the transformers library and DistilBERT model.
  • The authors of the original dataset used for training.