---
license: apache-2.0
base_model: bert-base-cased
tags:
- generated_from_trainer
datasets:
- conll2002
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: bert-finetuned-ner
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: conll2002
      type: conll2002
      config: es
      split: validation
      args: es
    metrics:
    - name: Precision
      type: precision
      value: 0.7640546993705232
    - name: Recall
      type: recall
      value: 0.8088235294117647
    - name: F1
      type: f1
      value: 0.7858019868288871
    - name: Accuracy
      type: accuracy
      value: 0.9676902769959431
---


# bert-finetuned-ner

This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the Spanish (`es`) configuration of the conll2002 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1912
- Precision: 0.7641
- Recall: 0.8088
- F1: 0.7858
- Accuracy: 0.9677

## Model description

The base model, bert-base-cased, is a pre-trained version of Google's popular BERT language model. It was initially trained on large amounts of text to learn dense representations of words and sequences.
This model takes the architecture and pre-trained weights of bert-base-cased and fine-tunes them further on the specific task of Named Entity Recognition (NER) using the conll2002 dataset.
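
For illustration, here is a minimal sketch of how a token-classification head is typically attached to `bert-base-cased` before fine-tuning of this kind. The `label_list` matches the conll2002 BIO tag set; the exact training script is an assumption, not part of this card.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# BIO labels used by conll2002 (9 classes)
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

# A fresh classification layer is stacked on the pre-trained encoder;
# its weights are randomly initialized and learned during fine-tuning.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(label_list),
    id2label=dict(enumerate(label_list)),
    label2id={label: i for i, label in enumerate(label_list)},
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
```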

## How to use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("JoshuaAAX/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("JoshuaAAX/bert-finetuned-ner")

text = "La Federación nacional de cafeteros de Colombia es una entidad del estado. El primer presidente el Dr Augusto Guerra contó con el aval de la Asociación Colombiana de Aviación."

# aggregation_strategy="max" merges sub-word tokens into whole-entity spans
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="max")
ner_pipeline(text)
```
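
The pipeline returns one dictionary per detected entity, with keys such as `entity_group`, `score`, `word`, `start`, and `end`; with `aggregation_strategy="max"`, sub-word tokens are grouped into whole-word spans and each span takes the label with the highest score.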

## Training data

The model was trained on the Spanish (`es`) split of the conll2002 dataset, which tags each token with one of the following entity types:

| Abbreviation  | Description   |
|:-------------:|:-------------:|
| O             | Outside of NE |
| PER           | Person's name |
| ORG           | Organization  |
| LOC           | Location      |
| MISC          | Miscellaneous |
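
As a quick check, the label set can be inspected with the 🤗 Datasets library (a minimal sketch using the dataset identifier from this card):

```python
from datasets import load_dataset

# Load the Spanish configuration of conll2002
raw = load_dataset("conll2002", "es")

# ner_tags stores integer class ids; .names maps them back to BIO labels
labels = raw["train"].features["ner_tags"].feature.names
print(labels)  # ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
```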


### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
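
For reference, these settings roughly correspond to the `TrainingArguments` below. This is a hedged reconstruction from the values listed above, as the original training script is not part of the card; `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-finetuned-ner",   # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    # Trainer's default optimizer is AdamW with betas=(0.9, 0.999), eps=1e-8,
    # matching the "Adam with betas=(0.9,0.999) and epsilon=1e-08" line above.
)
```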

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1713        | 1.0   | 521  | 0.1404          | 0.6859    | 0.7387 | 0.7114 | 0.9599   |
| 0.0761        | 2.0   | 1042 | 0.1404          | 0.6822    | 0.7693 | 0.7231 | 0.9623   |
| 0.05          | 3.0   | 1563 | 0.1304          | 0.7488    | 0.7937 | 0.7706 | 0.9672   |
| 0.0355        | 4.0   | 2084 | 0.1454          | 0.7585    | 0.7960 | 0.7768 | 0.9664   |
| 0.0253        | 5.0   | 2605 | 0.1501          | 0.7549    | 0.8095 | 0.7812 | 0.9677   |
| 0.0184        | 6.0   | 3126 | 0.1726          | 0.7581    | 0.7992 | 0.7781 | 0.9662   |
| 0.0138        | 7.0   | 3647 | 0.1743          | 0.7524    | 0.8042 | 0.7774 | 0.9676   |
| 0.0112        | 8.0   | 4168 | 0.1853          | 0.7576    | 0.8022 | 0.7792 | 0.9674   |
| 0.0082        | 9.0   | 4689 | 0.1914          | 0.7595    | 0.8061 | 0.7821 | 0.9667   |
| 0.0073        | 10.0  | 5210 | 0.1912          | 0.7641    | 0.8088 | 0.7858 | 0.9677   |
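
The precision, recall, and F1 values above are entity-level metrics. The card does not state which implementation computed them, but NER cards generated from the Trainer typically use the seqeval library; a minimal sketch with toy data:

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Toy gold/predicted BIO sequences (hypothetical, for illustration only)
y_true = [["B-ORG", "I-ORG", "O", "B-PER", "I-PER", "O"]]
y_pred = [["B-ORG", "I-ORG", "O", "B-PER", "O",     "O"]]

# seqeval scores whole entity spans, not individual tokens: the predicted
# PER span is truncated, so only one of two entities counts as correct.
print(precision_score(y_true, y_pred))  # 0.5
print(recall_score(y_true, y_pred))     # 0.5
print(f1_score(y_true, y_pred))         # 0.5
```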


### Framework versions

- Transformers 4.41.0
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1