---
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
---
# Fine-Tuned BERT Model for Named Entity Recognition (NER) with Accelerate Library

This repository contains a fine-tuned BERT model for Named Entity Recognition (NER) tasks, trained on the [CoNLL 2003 dataset](https://huggingface.co/datasets/eriktks/conll2003) using the Hugging Face Accelerate library.

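Each token in the dataset is tagged with one of the following nine labels, where the `B-` and `I-` prefixes mark the beginning and inside of an entity span and `O` marks tokens outside any entity:
- `O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-LOC`, `I-LOC`, `B-MISC`, `I-MISC`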
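
As a minimal sketch, the label list can be turned into the integer mappings that a token-classification model config typically carries. This assumes the order listed above matches the dataset's tag ids; the names `label_names`, `id2label`, and `label2id` are illustrative rather than taken from the training script.

```python
# Label names in the order listed above (assumed to match the dataset's tag ids).
label_names = [
    "O",
    "B-PER", "I-PER",
    "B-ORG", "I-ORG",
    "B-LOC", "I-LOC",
    "B-MISC", "I-MISC",
]

# Integer id -> label string, and the inverse mapping.
id2label = dict(enumerate(label_names))
label2id = {label: idx for idx, label in id2label.items()}

# Entity types are recovered by stripping the B-/I- prefix.
entity_types = sorted({name.split("-", 1)[1] for name in label_names if name != "O"})
print(entity_types)  # ['LOC', 'MISC', 'ORG', 'PER']
```

The four entity types printed here are the same ones reported per label in the evaluation tables.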

## Model Training Details

### Training Arguments
- **Library**: Hugging Face Accelerate
- **Model Architecture**: `bert-base-cased` for token classification
- **Learning Rate**: `2e-5`
- **Number of Epochs**: `20`
- **Weight Decay**: `0.01`
- **Batch Size**: `8`
- **Evaluation Strategy**: `epoch`
- **Save Strategy**: `epoch`

*All other parameters were left at the Accelerate and Transformers library defaults.*
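
The hyperparameters above could be wired into an Accelerate setup roughly as follows. This is a sketch under stated assumptions, not the original training script: the `AdamW` optimizer, the linear schedule without warmup, and the helper names `TrainConfig`, `total_training_steps`, and `build_training_objects` are all assumptions, and the step count assumes a single process and the 14,041 sentences of the CoNLL 2003 training split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed in this model card."""
    learning_rate: float = 2e-5
    num_epochs: int = 20
    weight_decay: float = 0.01
    batch_size: int = 8


def total_training_steps(num_examples: int, cfg: TrainConfig) -> int:
    """Optimizer steps over the whole run (single process, no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.num_epochs


def build_training_objects(model, train_dataloader, cfg: TrainConfig):
    """Assumed setup: AdamW plus a linear schedule, wrapped by Accelerate."""
    # Heavy imports are deferred so the helpers above work without them installed.
    from accelerate import Accelerator
    from torch.optim import AdamW
    from transformers import get_scheduler

    accelerator = Accelerator()
    optimizer = AdamW(
        model.parameters(), lr=cfg.learning_rate, weight_decay=cfg.weight_decay
    )
    model, optimizer, train_dataloader = accelerator.prepare(
        model, optimizer, train_dataloader
    )
    scheduler = get_scheduler(
        "linear",  # assumption: the card does not name a schedule
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=len(train_dataloader) * cfg.num_epochs,
    )
    return accelerator, model, optimizer, train_dataloader, scheduler


cfg = TrainConfig()
print(total_training_steps(14_041, cfg))  # 1756 steps per epoch * 20 epochs = 35120
```

Inside the training loop, `accelerator.backward(loss)` would replace the usual `loss.backward()` call.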

---

## Evaluation Results

### Validation Set Performance
- **Overall Metrics**:
  - Precision: 95.17%
  - Recall: 93.87%
  - F1 Score: 94.52%
  - Accuracy: 98.62%

#### Per-Label Performance
| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 96.46%    | 96.51% | 96.49%   |
| MISC        | 90.78%    | 89.14% | 89.95%   |
| ORG         | 92.61%    | 90.26% | 91.42%   |
| PER         | 97.94%    | 96.32% | 97.12%   |

### Test Set Performance
- **Overall Metrics**:
  - Precision: 91.82%
  - Recall: 89.68%
  - F1 Score: 90.74%
  - Accuracy: 97.23%

#### Per-Label Performance
| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 92.99%    | 92.10% | 92.54%   |
| MISC        | 82.05%    | 75.00% | 78.37%   |
| ORG         | 90.67%    | 88.28% | 89.46%   |
| PER         | 96.04%    | 95.57% | 95.81%   |
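
F1 is the harmonic mean of precision and recall, so the overall figures above can be sanity-checked directly. The helper name `f1_score` is illustrative; the inputs are the overall precision and recall reported for each split.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from the tables above.
reported = {
    "validation": (95.17, 93.87, 94.52),
    "test": (91.82, 89.68, 90.74),
}

for split, (precision, recall, f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 2)
    print(f"{split}: recomputed F1 = {recomputed}, reported F1 = {f1}")
```

Both recomputed values agree with the reported F1 scores to two decimal places.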

---

## How to Use the Model

You can load the model directly from the Hugging Face Model Hub:

```python
from transformers import pipeline

# Replace with your specific model checkpoint
model_checkpoint = "Prikshit7766/bert-finetuned-ner-accelerate"
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",
)

# Example usage
result = token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn.")
print(result)
```

### Example Output
```python
[
   {
      "entity_group": "PER",
      "score": 0.9999658,
      "word": "Sylvain",
      "start": 11,
      "end": 18
   },
   {
      "entity_group": "ORG",
      "score": 0.99996203,
      "word": "Hugging Face",
      "start": 33,
      "end": 45
   },
   {
      "entity_group": "LOC",
      "score": 0.9999542,
      "word": "Brooklyn",
      "start": 49,
      "end": 57
   }
]
```
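
The `start` and `end` fields are character offsets into the input string, so entity text can be sliced straight from the original sentence. The sketch below groups the example output by entity type; `group_entities` and its `min_score` threshold are hypothetical helpers, not part of the pipeline API.

```python
from collections import defaultdict

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# The example output shown above, copied as a Python literal.
result = [
    {"entity_group": "PER", "score": 0.9999658, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.99996203, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.9999542, "word": "Brooklyn", "start": 49, "end": 57},
]


def group_entities(text, entities, min_score=0.0):
    """Group entity strings by type, slicing them from the original text."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            span = text[entity["start"]:entity["end"]]
            grouped[entity["entity_group"]].append(span)
    return dict(grouped)


print(group_entities(text, result))
# {'PER': ['Sylvain'], 'ORG': ['Hugging Face'], 'LOC': ['Brooklyn']}
```

Slicing by offsets rather than reading `word` keeps the original casing and spacing of the input, which can differ from the tokenizer's reconstruction.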

---