---
library_name: transformers
license: apache-2.0
datasets:
- conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: token-classification
---

# bert-base-cased-finetuned-conll2003-ner-v2

BERT ("bert-base-cased") fine-tuned on CoNLL-2003 (CoNLL stands for Conference on Computational Natural Language Learning).

The model performs named entity recognition (NER). It accompanies section 2 of chapter 7 of the Hugging Face "NLP Course" (https://huggingface.co/learn/nlp-course/chapter7/2).

It was trained using a custom PyTorch loop with Hugging Face Accelerate.

Code: https://github.com/sambitmukherjee/huggingface-notebooks/blob/main/course/en/chapter7/section2_pt.ipynb

Experiment tracking: https://wandb.ai/sadhaklal/bert-base-cased-finetuned-conll2003-ner-v2
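
## Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model_checkpoint = "sadhaklal/bert-base-cased-finetuned-conll2003-ner-v2"
# aggregation_strategy="simple" merges word-piece tokens into whole entity spans.
token_classifier = pipeline("token-classification", model=model_checkpoint, aggregation_strategy="simple")

# Each result is a dict with the entity group, confidence score, text and character offsets.
print(token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn."))
```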
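
With an aggregation strategy set, the pipeline returns a list of dicts with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below shows one way to post-process such a list by grouping entity text by type. The helper `group_entities_by_type` and the `min_score` threshold are hypothetical, and `sample_predictions` is hand-written illustrative data in the pipeline's output shape, not actual output from this model.

```python
from collections import defaultdict


def group_entities_by_type(entities, min_score=0.0):
    """Group entity text by entity type, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hypothetical predictions (scores are made up for illustration).
sample_predictions = [
    {"entity_group": "PER", "score": 0.99, "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "score": 0.98, "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "score": 0.99, "word": "Brooklyn", "start": 49, "end": 57},
]

print(group_entities_by_type(sample_predictions))
```

In practice, the list returned by `token_classifier(...)` can be passed in place of `sample_predictions`.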

## Dataset

From the dataset page:

> The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

Examples: https://huggingface.co/datasets/conll2003/viewer
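
The four entity types are annotated per word with begin/inside/outside tags. As a rough illustration, the sketch below turns a sequence of integer tags into entity spans. It assumes the label order `O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC, B-MISC, I-MISC` for the `ner_tags` feature; check the dataset's feature definition before relying on it. The helper `decode_entities` is hypothetical and the sentence is only an example.

```python
# Assumed label order for the ner_tags feature of conll2003.
LABEL_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def decode_entities(words, tag_ids):
    """Convert per-word integer tags into a list of (entity_type, text) pairs."""
    entities = []
    current_words, current_type = [], None
    for word, tag_id in zip(words, tag_ids):
        label = LABEL_NAMES[tag_id]
        if label == "O":
            prefix, entity_type = "O", None
        else:
            prefix, entity_type = label.split("-", 1)

        # An I- tag of the same type extends the entity that is currently open.
        if prefix == "I" and entity_type == current_type:
            current_words.append(word)
            continue

        # Any other tag closes the open entity, if there is one.
        if current_words:
            entities.append((current_type, " ".join(current_words)))
        if prefix == "O":
            current_words, current_type = [], None
        else:
            # B- starts a new entity; a stray I- is treated as a start as well.
            current_words, current_type = [word], entity_type

    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


words = ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."]
tag_ids = [3, 0, 7, 0, 0, 0, 7, 0, 0]
print(decode_entities(words, tag_ids))
# -> [('ORG', 'EU'), ('MISC', 'German'), ('MISC', 'British')]
```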

## Metrics

Results on the 'validation' split of CoNLL-2003:

- Accuracy: 0.9858
- Precision: 0.9243
- Recall: 0.947
- F1: 0.9355
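
As a quick consistency check, the reported F1 can be reproduced from the reported precision and recall, assuming it is the harmonic mean of those two overall scores. The helper name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded figures given above.

```python
def f1_from_precision_recall(precision, recall):
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for the 'validation' split.
precision, recall = 0.9243, 0.947
print(round(f1_from_precision_recall(precision, recall), 4))
# -> 0.9355
```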