metadata
language:
  - de
license: mit
library_name: span-marker
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
datasets:
  - wikiann
metrics:
  - precision
  - recall
  - f1
widget:
  - text: >-
      Weitere Zulassungen folgten für Victoria und New South Wales 1975 und 1982
      am High Court of Australia.
  - text: >-
      Ihr Name geht auf die Bethlehemskapelle in Prag zurück, die für die
      Böhmischen Brüder eine wichtige Rolle spielt.
  - text: Sein Bundesliga-Debüt gab der Angreifer am 23.
  - text: >-
      Er qualifizierte sich für die Teilnahme an den Olympischen Spielen 2008 in
      Peking und erreichte dort über 200 m die Viertelfinalrunde.
  - text: Damit trat sie die Nachfolge des Sozialdemokraten Jens Stoltenberg an.
pipeline_tag: token-classification
base_model: numind/generic-entity_recognition_NER-multilingual-v1
model-index:
  - name: >-
      SpanMarker with numind/generic-entity_recognition_NER-multilingual-v1 on
      wikiann
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: Unknown
          type: wikiann
          split: eval
        metrics:
          - type: f1
            value: 0.9069700043471961
            name: F1
          - type: precision
            value: 0.9069700043471961
            name: Precision
          - type: recall
            value: 0.9069700043471961
            name: Recall

SpanMarker with numind/generic-entity_recognition_NER-multilingual-v1 on wikiann

This is a SpanMarker model for German Named Entity Recognition, trained on the wikiann dataset. It uses numind/generic-entity_recognition_NER-multilingual-v1 as the underlying encoder.

Model Details

Model Description

Model Sources

Model Labels

Label  Examples
LOC    "Savoyer Voralpen", "Bagan", "Zechin"
ORG    "NHL Entry Draft", "SKA Sankt Petersburg", "Minnesota Wild"
PER    "Antonina Wladimirowna Kriwoschapka", "Lou Salomé", "Jaan Kirsipuu"

Evaluation

Metrics

Label  Precision  Recall  F1
all    0.9070     0.9070  0.9070
LOC    0.9036     0.9298  0.9165
ORG    0.8638     0.8446  0.8541
PER    0.9507     0.9405  0.9455
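As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall values; `f1_score` and `reported` are names chosen here for illustration, and because the inputs are already rounded to four decimals, the recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the metrics table
reported = {
    "all": (0.9070, 0.9070, 0.9070),
    "LOC": (0.9036, 0.9298, 0.9165),
    "ORG": (0.8638, 0.8446, 0.8541),
    "PER": (0.9507, 0.9405, 0.9455),
}

for label, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{label}: reported F1={f1:.4f}, recomputed F1={recomputed:.4f}")
```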

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")
# Run inference
entities = model.predict("Sein Bundesliga-Debüt gab der Angreifer am 23.")
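For a single input string, `model.predict` is expected to return a list of dictionaries, one per entity, with the keys `span`, `label`, `score`, `char_start_index` and `char_end_index`. The sketch below shows one way to filter and print such a list. The helper names and the `sample` predictions are hypothetical and written by hand to match that shape; they are not actual output of this model.

```python
from typing import Dict, List, Union

Entity = Dict[str, Union[str, float, int]]


def filter_entities(entities: List[Entity], min_score: float = 0.5) -> List[Entity]:
    """Keep predictions scoring at least min_score, ordered by position in the text."""
    kept = [entity for entity in entities if entity["score"] >= min_score]
    return sorted(kept, key=lambda entity: entity["char_start_index"])


def format_entity(entity: Entity) -> str:
    """Render one prediction as 'LABEL: span [start:end] (score)'."""
    return (
        f'{entity["label"]}: {entity["span"]} '
        f'[{entity["char_start_index"]}:{entity["char_end_index"]}] '
        f'({entity["score"]:.2f})'
    )


text = "Damit trat sie die Nachfolge des Sozialdemokraten Jens Stoltenberg an."
name = "Jens Stoltenberg"
start = text.index(name)

# Hypothetical predictions, shaped like the result of model.predict(text)
sample: List[Entity] = [
    {"span": name, "label": "PER", "score": 0.98,
     "char_start_index": start, "char_end_index": start + len(name)},
    {"span": "Nachfolge", "label": "ORG", "score": 0.12,
     "char_start_index": text.index("Nachfolge"),
     "char_end_index": text.index("Nachfolge") + len("Nachfolge")},
]

for entity in filter_entities(sample):
    print(format_entity(entity))
```

The character indices make it possible to slice the original text directly, for example `text[entity["char_start_index"]:entity["char_end_index"]]`.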

Downstream Use

You can finetune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("span_marker_model_id")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

Training Details

Training Set Metrics

Training set           Min  Median  Max
Sentence length        1    9.7693  85
Entities per sentence  1    1.3821  20

Training Hyperparameters

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 128
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
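The hyperparameters above interact in two ways worth spelling out: the effective batch size is the per-step batch size times the gradient accumulation steps, and the warmup ratio translates into a number of warmup steps once the total step count is known. The sketch below works through both. `STEPS_PER_EPOCH = 158` is an assumption inferred from the training log (step 200 corresponds to epoch 1.2658), and `linear_lr` is an illustrative reimplementation of the usual linear warmup followed by linear decay, not the trainer's own code.

```python
import math

LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 64
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 10
STEPS_PER_EPOCH = 158  # assumption: inferred from the training log, not stated in the card

# Number of samples contributing to each optimizer update
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)


print(f"effective batch size: {effective_batch_size}")
print(f"total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at the end of the first epoch and then decays for the remaining nine.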

Training Results

Epoch   Step  Validation Loss  Validation Precision  Validation Recall  Validation F1  Validation Accuracy
1.2658  200   0.0172           0.8842                0.8534             0.8686         0.9586
2.5316  400   0.0145           0.8977                0.8889             0.8933         0.9670
3.7975  600   0.0161           0.8962                0.9006             0.8984         0.9688
5.0633  800   0.0180           0.8982                0.8996             0.8989         0.9689
6.3291  1000  0.0201           0.9014                0.9008             0.9011         0.9694
7.5949  1200  0.0201           0.9010                0.9057             0.9033         0.9702
8.8608  1400  0.0217           0.9062                0.9036             0.9049         0.9702
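The log above can be read programmatically, for instance to recover the number of optimizer steps per epoch or to compare checkpoints by different criteria. The following sketch copies the rows into a list of tuples (the variable names are chosen here for illustration) and picks out the evaluation step with the highest validation F1 and the one with the lowest validation loss. In this log the two criteria point to different steps, which is worth keeping in mind when choosing a checkpoint.

```python
# (epoch, step, val_loss, val_precision, val_recall, val_f1, val_accuracy)
training_log = [
    (1.2658, 200, 0.0172, 0.8842, 0.8534, 0.8686, 0.9586),
    (2.5316, 400, 0.0145, 0.8977, 0.8889, 0.8933, 0.9670),
    (3.7975, 600, 0.0161, 0.8962, 0.9006, 0.8984, 0.9688),
    (5.0633, 800, 0.0180, 0.8982, 0.8996, 0.8989, 0.9689),
    (6.3291, 1000, 0.0201, 0.9014, 0.9008, 0.9011, 0.9694),
    (7.5949, 1200, 0.0201, 0.9010, 0.9057, 0.9033, 0.9702),
    (8.8608, 1400, 0.0217, 0.9062, 0.9036, 0.9049, 0.9702),
]

# Steps per epoch implied by each row (rounded, since epochs are logged to 4 decimals)
steps_per_epoch = {round(step / epoch) for epoch, step, *_ in training_log}

best_by_f1 = max(training_log, key=lambda row: row[5])
best_by_loss = min(training_log, key=lambda row: row[2])

print(f"steps per epoch: {sorted(steps_per_epoch)}")
print(f"highest validation F1: step {best_by_f1[1]} (F1={best_by_f1[5]:.4f})")
print(f"lowest validation loss: step {best_by_loss[1]} (loss={best_by_loss[2]:.4f})")
```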

Framework Versions

  • Python: 3.10.12
  • SpanMarker: 1.5.0
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu118
  • Datasets: 2.15.0
  • Tokenizers: 0.15.0

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}