metadata

language: es
license: gpl-3.0
tags:
  - PyTorch
  - Transformers
  - Token Classification
  - roberta
  - roberta-base-bne
widget:
  - text: Fue antes de llegar a Sigüeiro, en el Camino de Santiago.
  - text: Si te metes en el Franco desde la Alameda, vas hacia la Catedral.
  - text: Y allí precisamente es Santiago el patrón del pueblo.
model-index:
  - name: es_trf_ner_cds_bne-base
    results: []

Introduction

This model is a fine-tuned version of roberta-base-bne for named-entity recognition (NER) in the tourism domain, specifically texts about the Way of St. James (Camino de Santiago). It recognizes four entity types: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

Usage

You can use this model for NER through the Transformers pipeline API:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and the fine-tuned token-classification model.
tokenizer = AutoTokenizer.from_pretrained("roberta-bne-ner-cds")
model = AutoModelForTokenClassification.from_pretrained("roberta-bne-ner-cds")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. El proyecto lo financia el Ministerio de Industria y Competitividad."

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner_pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

# Each result is a dict describing one detected entity.
for ent in ner_pipe(example):
    print(ent)
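
With an aggregation strategy set, each item the pipeline returns is a dictionary with the keys entity_group, score, word, start and end. The sketch below shows one way such results could be grouped by entity type. The helper name group_entities, the min_score threshold, and the sample labels and scores are illustrative assumptions, not output produced by this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group aggregated NER results by entity type.

    `entities` is expected to have the shape returned by a Transformers
    token-classification pipeline with an aggregation strategy set:
    a list of dicts with "entity_group", "score", "word", "start", "end".
    """
    grouped = defaultdict(list)
    for ent in entities:
        # Keep only predictions at or above the confidence threshold.
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


# Hand-written sample in the pipeline's output shape.
# Labels and scores are hypothetical, not real model predictions.
sample = [
    {"entity_group": "LOC", "score": 0.99, "word": " Sigüeiro", "start": 22, "end": 30},
    {"entity_group": "MISC", "score": 0.97, "word": " Camino de Santiago", "start": 38, "end": 56},
    {"entity_group": "ORG", "score": 0.45, "word": " Ministerio de Industria y Competitividad", "start": 85, "end": 125},
]

print(group_entities(sample, min_score=0.5))
```

In real use, pass the return value of ner_pipe(example) in place of sample. Filtering on score is optional and the threshold would need tuning on your own data.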

Dataset

ToDo

Model performance

entity	precision	recall	f1
LOC	0.986	0.982	0.984
MISC	0.800	0.911	0.852
ORG	0.896	0.779	0.833
PER	0.953	0.937	0.945
micro avg	0.967	0.971	0.969
macro avg	0.909	0.902	0.903
weighted avg	0.968	0.971	0.969
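
As a quick consistency check, the f1 column and the macro avg row can be recomputed from the per-entity precision and recall in the table above. This is a minimal sketch: the inputs are the rounded table values, so the last digit may differ slightly from what the evaluation tool reported, and the card does not say which tool produced the table. The micro avg and weighted avg rows need per-entity support counts, which are not listed, so they are not reproduced here.

```python
# Per-entity (precision, recall), copied from the table above.
per_entity = {
    "LOC": (0.986, 0.982),
    "MISC": (0.800, 0.911),
    "ORG": (0.896, 0.779),
    "PER": (0.953, 0.937),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_entity = {name: f1_score(p, r) for name, (p, r) in per_entity.items()}

# Macro average: unweighted mean over the four entity types.
n = len(per_entity)
macro_precision = sum(p for p, _ in per_entity.values()) / n
macro_recall = sum(r for _, r in per_entity.values()) / n
macro_f1 = sum(f1_by_entity.values()) / n

for name, f1 in f1_by_entity.items():
    print(f"{name}\t{f1:.3f}")
print(f"macro avg\t{macro_precision:.3f}\t{macro_recall:.3f}\t{macro_f1:.3f}")
```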

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 3.0
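
To illustrate what lr_scheduler_type: linear means for the learning rate, the sketch below computes a linearly decaying rate starting from 5e-05. It assumes no warmup steps, since none are listed above, and uses a hypothetical total of 300 optimizer steps because the card does not state the dataset size. The function name linear_lr is illustrative and is not part of any library.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (unused when warmup_steps=0).
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical schedule length: 3 epochs x 100 optimizer steps per epoch.
TOTAL_STEPS = 300

for step in (0, 100, 200, 300):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the rate starts at the configured 5e-05, is halved midway through training, and reaches 0 at the final step.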

Framework versions

Transformers 4.28.1
PyTorch 2.0.1+cu117
Datasets 2.12.0
Tokenizers 0.13.3