✨ NeoBERT for NER

This repository hosts a NeoBERT model that was fine-tuned on the CoNLL-2003 NER dataset.

Please note the following caveats:

  • ⚠️ Work in progress: hyper-parameter changes or bug fixes for the implemented NeoBERTForTokenClassification class may still occur.
  • ⚠️ At the moment, don't expect BERT-like performance; more experiments are needed. (Is RoPE causing this?)

📝 Implementation

A custom NeoBERTForTokenClassification class was implemented to conduct experiments with Transformers.

For all experiments, Transformers version 4.50.0.dev0 is currently used, together with a recent build of xFormers, as NeoBERT depends on it for the SwiGLU implementation.
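
For illustration, the core of such a token-classification model is a dropout plus linear classifier applied to the encoder's per-token hidden states, trained with a cross-entropy loss that ignores positions labelled with -100. The sketch below shows this idea with generic names; it is not the exact NeoBERTForTokenClassification implementation from this repository, and details like the hidden size and how the encoder exposes its hidden states are assumptions.

# Minimal sketch of a token-classification head as used in BERT-style models.
# Attribute names and shapes are assumptions; the actual
# NeoBERTForTokenClassification class may differ in detail.
from typing import Optional

import torch
import torch.nn as nn
from transformers.modeling_outputs import TokenClassifierOutput


class TokenClassificationHead(nn.Module):
    """Dropout + linear classifier over per-token hidden states."""

    def __init__(self, hidden_size: int, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.num_labels = num_labels
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(
        self,
        hidden_states: torch.Tensor,             # (batch, seq_len, hidden_size)
        labels: Optional[torch.Tensor] = None,    # (batch, seq_len), -100 = ignore
    ) -> TokenClassifierOutput:
        logits = self.classifier(self.dropout(hidden_states))
        loss = None
        if labels is not None:
            # CrossEntropyLoss ignores positions labelled with -100 by default.
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        return TokenClassifierOutput(loss=loss, logits=logits)


# Example with random hidden states (hidden size 768 is only a placeholder;
# CoNLL-2003 uses 9 labels: O plus B-/I- for PER, LOC, ORG, MISC).
head = TokenClassificationHead(hidden_size=768, num_labels=9)
logits = head(torch.randn(1, 8, 768)).logits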

The following command (based on the PyTorch Token Classification example) can be used for fine-tuning:

python3 run_ner.py \
  --model_name_or_path /home/stefan/Repositories/NeoBERT \
  --dataset_name conll2003 \
  --output_dir ./neobert-conll2003-lr1e-05-e10-bs16-1 \
  --seed 1 \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 16 \
  --num_train_epochs 10 \
  --learning_rate 1e-05 \
  --eval_strategy epoch \
  --save_strategy epoch \
  --overwrite_output_dir \
  --trust_remote_code True \
  --load_best_model_at_end \
  --metric_for_best_model "eval_f1" \
  --greater_is_better True
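
Internally, run_ner.py aligns the word-level CoNLL labels with the sub-word tokens produced by the tokenizer: only the first sub-word of each word keeps the label, all other positions get -100 so they are ignored by the loss. A hedged sketch of this alignment step is shown below; it assumes the NeoBERT tokenizer is a fast tokenizer that exposes word_ids(), and the label ids are purely illustrative.

# Sketch of word-to-sub-word label alignment, mirroring the approach in the
# Transformers token-classification example (requires a fast tokenizer).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stefan-it/neobert-ner-conll03", trust_remote_code=True)

words = ["George", "Washington", "went", "to", "Washington"]
word_labels = [1, 2, 0, 0, 5]  # illustrative label ids, e.g. B-PER, I-PER, O, O, B-LOC

encoding = tokenizer(words, is_split_into_words=True)

aligned_labels = []
previous_word_id = None
for word_id in encoding.word_ids():
    if word_id is None:
        aligned_labels.append(-100)                   # special tokens
    elif word_id != previous_word_id:
        aligned_labels.append(word_labels[word_id])   # first sub-word keeps the label
    else:
        aligned_labels.append(-100)                   # remaining sub-words are ignored
    previous_word_id = word_id

print(list(zip(encoding.tokens(), aligned_labels)))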

📊 Performance

A very basic hyper-parameter search was performed with five different seeds; the table reports the micro F1-score on the CoNLL-2003 development set per run, together with the average over seeds:

| Configuration           | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg.      |
|-------------------------|-------|-------|-------|-------|-------|-----------|
| **bs=16,e=10,lr=1e-05** | 95.71 | 95.42 | 95.53 | 95.56 | 95.43 | **95.53** |
| bs=16,e=10,lr=2e-05     | 95.25 | 95.33 | 95.28 | 95.35 | 95.26 | 95.29     |
| bs=16,e=10,lr=3e-05     | 94.98 | 95.22 | 94.86 | 94.72 | 94.93 | 94.94     |
| bs=16,e=10,lr=4e-05     | 94.61 | 94.39 | 94.57 | 94.65 | 94.87 | 94.61     |
| bs=16,e=10,lr=5e-05     | 93.82 | 93.94 | 94.36 | 91.14 | 94.38 | 94.15     |

The performance of the currently uploaded model is marked in bold.
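
The averages are plain arithmetic means over the five seeds; for example, for the best configuration:

# Average development F1 over the five seeds for bs=16, e=10, lr=1e-05.
runs = [95.71, 95.42, 95.53, 95.56, 95.43]
print(round(sum(runs) / len(runs), 2))  # 95.53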

📣 Usage

The following code can be used to test the model and recognize named entities for a given sentence:

from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

model_name = "stefan-it/neobert-ner-conll03"

model = AutoModelForTokenClassification.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

ner = pipeline(task="ner",
               model=model,
               tokenizer=tokenizer,
               trust_remote_code=True)

print(ner("George Washington went to Washington in the US."))

This outputs:

[
 {'entity': 'B-PER', 'score': 0.99981505, 'index': 1, 'word': 'george', 'start': 0, 'end': 6},
 {'entity': 'I-PER', 'score': 0.9997435, 'index': 2, 'word': 'washington', 'start': 7, 'end': 17},
 {'entity': 'B-LOC', 'score': 0.99955124, 'index': 5, 'word': 'washington', 'start': 26, 'end': 36},
 {'entity': 'B-LOC', 'score': 0.99958867, 'index': 8, 'word': 'us', 'start': 44, 'end': 46}
]
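
By default, the pipeline returns one prediction per sub-word token. Building on the snippet above, the pipeline's aggregation_strategy parameter can be used to merge sub-words and consecutive B-/I- tags into whole entity spans; this assumes the tokenizer is a fast tokenizer that provides offset mappings.

# Group sub-word predictions into whole entities (PER, LOC, ...).
ner_grouped = pipeline(task="ner",
                       model=model,
                       tokenizer=tokenizer,
                       aggregation_strategy="simple",
                       trust_remote_code=True)

# Expected to group "George Washington" as one PER entity and
# "Washington" / "US" as LOC entities, with scores aggregated per span.
print(ner_grouped("George Washington went to Washington in the US."))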