metadata
license: apache-2.0
base_model: bert-base-cased
tags:
  - generated_from_trainer
datasets:
  - conll2002
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: bert-finetuned-ner
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: conll2002
          type: conll2002
          config: es
          split: validation
          args: es
        metrics:
          - name: Precision
            type: precision
            value: 0.7640546993705232
          - name: Recall
            type: recall
            value: 0.8088235294117647
          - name: F1
            type: f1
            value: 0.7858019868288871
          - name: Accuracy
            type: accuracy
            value: 0.9676902769959431

bert-finetuned-ner

This model is a fine-tuned version of bert-base-cased on the Spanish (es) configuration of the conll2002 dataset. It achieves the following results on the validation split:

  • Loss: 0.1912
  • Precision: 0.7641
  • Recall: 0.8088
  • F1: 0.7858
  • Accuracy: 0.9677
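As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the unrounded precision and recall values listed in the model-index metadata. The helper name f1_from is illustrative and not part of any library.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Unrounded values copied from the model-index metadata of this card.
precision = 0.7640546993705232
recall = 0.8088235294117647

f1 = f1_from(precision, recall)
print(f"F1 = {f1:.4f}")
```

The result agrees with the reported F1 of 0.7858 after rounding to four decimals.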

Model description

The base model, bert-base-cased, is a pre-trained version of Google's popular BERT language model. It was initially trained on large amounts of text to learn dense representations of words and sequences. This model takes the architecture and pre-trained weights of bert-base-cased and fine-tunes them further on the specific task of Named Entity Recognition (NER) using the conll2002 dataset.

How to Use

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("JoshuaAAX/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("JoshuaAAX/bert-finetuned-ner")

# Spanish example sentence
text = "La Federación nacional de cafeteros de Colombia es una entidad del estado. El primer presidente el Dr Augusto Guerra contó con el aval de la Asociación Colombiana de Aviación."

# aggregation_strategy="max" groups sub-word tokens into words and entities;
# each word takes the label of its highest-scoring token
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")
print(ner_pipeline(text))
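
With an aggregation strategy set, the pipeline returns a list of dictionaries, one per detected entity, with the keys entity_group, score, word, start, and end. A minimal sketch of post-processing that output follows. The helper group_entities, the 0.5 score threshold, and the sample predictions (including their labels and scores) are hypothetical; they only illustrate the structure and are not actual model output.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group aggregated NER predictions by entity type, dropping low-confidence ones."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions in the shape returned by ner_pipeline(text).
# Labels and scores are made up for illustration only.
sample_predictions = [
    {"entity_group": "ORG", "score": 0.98,
     "word": "Federación nacional de cafeteros de Colombia", "start": 3, "end": 47},
    {"entity_group": "PER", "score": 0.99,
     "word": "Augusto Guerra", "start": 102, "end": 116},
    {"entity_group": "MISC", "score": 0.31,
     "word": "Dr", "start": 99, "end": 101},
]

print(group_entities(sample_predictions))
```

In real use, pass the list returned by ner_pipeline(text) in place of sample_predictions and tune min_score to the precision/recall trade-off you need.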

Training data

| Abbreviation | Description |
|---|---|
| O | Outside of NE |
| PER | Person's name |
| ORG | Organization |
| LOC | Location |
| MISC | Miscellaneous |
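
In token-level annotation, entity types like these are typically combined with position prefixes. The sketch below assumes the IOB2 scheme (B- marks the first token of an entity, I- a continuation), which is an assumption about the tag format and is not stated in this card. The function iob2_to_spans and the example sentence are illustrative.

```python
def iob2_to_spans(tokens, tags):
    """Collect (entity_type, text) spans from parallel token and IOB2 tag lists."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Store the entity being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag (or a stray I- tag of a different type) opens a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the entity currently being built.
            current_tokens.append(token)
        else:
            # "O": outside of any named entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Illustrative sentence with hand-written tags (not taken from the dataset).
tokens = ["Augusto", "Guerra", "viajó", "a", "Colombia", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(iob2_to_spans(tokens, tags))
```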

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
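
To make the schedule concrete: with lr_scheduler_type set to linear, the learning rate decays linearly from its initial value to zero over the run. The sketch below assumes zero warmup steps (no warmup is listed above, so this is an assumption) and takes the total of 5210 optimizer steps from the training results table (521 steps per epoch over 10 epochs). The function linear_lr is an illustrative re-implementation, not the library's scheduler.

```python
def linear_lr(step, base_lr=2e-05, total_steps=5210, warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp-up from 0 to base_lr (unused when warmup_steps is 0).
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 521 optimizer steps per epoch, as shown in the training results table.
for epoch in (0, 5, 10):
    step = epoch * 521
    print(f"epoch {epoch:>2}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate is halved by the end of epoch 5 and reaches zero at the final step.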

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| 0.1713 | 1.0 | 521 | 0.1404 | 0.6859 | 0.7387 | 0.7114 | 0.9599 |
| 0.0761 | 2.0 | 1042 | 0.1404 | 0.6822 | 0.7693 | 0.7231 | 0.9623 |
| 0.05 | 3.0 | 1563 | 0.1304 | 0.7488 | 0.7937 | 0.7706 | 0.9672 |
| 0.0355 | 4.0 | 2084 | 0.1454 | 0.7585 | 0.7960 | 0.7768 | 0.9664 |
| 0.0253 | 5.0 | 2605 | 0.1501 | 0.7549 | 0.8095 | 0.7812 | 0.9677 |
| 0.0184 | 6.0 | 3126 | 0.1726 | 0.7581 | 0.7992 | 0.7781 | 0.9662 |
| 0.0138 | 7.0 | 3647 | 0.1743 | 0.7524 | 0.8042 | 0.7774 | 0.9676 |
| 0.0112 | 8.0 | 4168 | 0.1853 | 0.7576 | 0.8022 | 0.7792 | 0.9674 |
| 0.0082 | 9.0 | 4689 | 0.1914 | 0.7595 | 0.8061 | 0.7821 | 0.9667 |
| 0.0073 | 10.0 | 5210 | 0.1912 | 0.7641 | 0.8088 | 0.7858 | 0.9677 |
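
The per-epoch figures can be inspected programmatically to see which checkpoint is best under each criterion. The sketch below copies the epoch, validation loss, and F1 columns from the table above; the variable names are illustrative.

```python
# (epoch, validation_loss, f1) copied from the training results table.
history = [
    (1, 0.1404, 0.7114),
    (2, 0.1404, 0.7231),
    (3, 0.1304, 0.7706),
    (4, 0.1454, 0.7768),
    (5, 0.1501, 0.7812),
    (6, 0.1726, 0.7781),
    (7, 0.1743, 0.7774),
    (8, 0.1853, 0.7792),
    (9, 0.1914, 0.7821),
    (10, 0.1912, 0.7858),
]

# Epoch with the lowest validation loss and epoch with the highest F1.
best_loss_epoch = min(history, key=lambda row: row[1])[0]
best_f1_epoch = max(history, key=lambda row: row[2])[0]

print(f"lowest validation loss at epoch {best_loss_epoch}")
print(f"highest F1 at epoch {best_f1_epoch}")
```

Validation loss is lowest at epoch 3 and rises afterward while training loss keeps falling, yet F1 peaks at the final epoch, so the choice of checkpoint depends on which metric matters for the application.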

Framework versions

  • Transformers 4.41.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1