
license: mit
base_model: FacebookAI/xlm-roberta-large
tags:
  - generated_from_trainer
datasets:
  - cnec
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: CNEC_xlm-roberta-large
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: cnec
          type: cnec
          config: default
          split: validation
          args: default
        metrics:
          - name: Precision
            type: precision
            value: 0.8566729323308271
          - name: Recall
            type: recall
            value: 0.9047146401985111
          - name: F1
            type: f1
            value: 0.8800386193579531
          - name: Accuracy
            type: accuracy
            value: 0.9771662763466042
language:
  - cs

CNEC_xlm-roberta-large

This model is a fine-tuned version of FacebookAI/xlm-roberta-large on the cnec dataset. It achieves the following results on the evaluation set:

Loss: 0.1471
Precision: 0.8567
Recall: 0.9047
F1: 0.8800
Accuracy: 0.9772
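As a quick sanity check on these numbers, the reported F1 should be the harmonic mean of precision and recall. The sketch below recomputes it from the full-precision values in the model-index metadata. `f1_score` is a hypothetical helper written for this illustration, not part of any library used to train the model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Full-precision values copied from the model-index metadata of this card
precision = 0.8566729323308271
recall = 0.9047146401985111

print(f"{f1_score(precision, recall):.4f}")  # 0.8800
```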

Model description

The model uses the following BIO labels:

'O' = Outside of a named entity
'B-A' = Beginning of a complex address number (postal code, street number, or even a phone number)
'I-A' = Inside of a complex address number
'B-G' = Beginning of a geographical name
'I-G' = Inside of a geographical name
'B-I' = Beginning of an institution name
'I-I' = Inside of an institution name
'B-M' = Beginning of a media name (email, server, website, TV series, etc.)
'I-M' = Inside of a media name
'B-O' = Beginning of an artifact name (book, old movie, etc.)
'I-O' = Inside of an artifact name
'B-P' = Beginning of a person's name
'I-P' = Inside of a person's name
'B-T' = Beginning of a time expression
'I-T' = Inside of a time expression
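To turn per-word predictions into entities, consecutive B-/I- labels of the same type are grouped into one span. The sketch below shows one way to do this for word-level tags. `bio_to_spans` is a hypothetical helper, and treating an `I-` tag that does not continue the open span as the start of a new span is a lenient convention chosen here, not something this card specifies. The example tags are illustrative only and are not taken from CNEC annotations.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group word-level BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open span, opens a new span
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # An outside tag closes any open span
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise the tag is an I- tag continuing the current span
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


words = ["Václav", "Havel", "navštívil", "Prahu", "v", "roce", "1990"]
tags = ["B-P", "I-P", "O", "B-G", "O", "O", "B-T"]  # illustrative tags
print(bio_to_spans(tags))  # [('P', 0, 2), ('G', 3, 4), ('T', 6, 7)]
```

Because XLM-RoBERTa works on subword tokens, predictions would first need to be mapped back to words (for example by keeping the label of each word's first subword) before applying a function like this.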

Intended uses & limitations

CNEC (Czech Named Entity Corpus) is a named entity dataset for the Czech language. This model was trained on an edited version of the dataset that keeps only the 7 entity supertypes listed above plus 1 non-entity type (O).
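Seven supertypes, each with a B- and an I- variant, plus the single outside tag give 7 × 2 + 1 = 15 labels. The sketch below builds `id2label` / `label2id` maps in the order the labels are listed in this card; that ordering is an assumption, so check the `id2label` field in the released checkpoint's config before relying on specific ids. Note that the outside tag `O` and the artifact tags `B-O` / `I-O` share a letter but are distinct labels.

```python
from typing import Dict, List, Tuple

# The 7 CNEC supertypes kept in this edited version, in the order listed in this card
SUPERTYPES = ["A", "G", "I", "M", "O", "P", "T"]


def build_label_maps(supertypes: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Return (id2label, label2id) with 'O' first, then B-/I- pairs per supertype."""
    labels = ["O"] + [f"{prefix}-{t}" for t in supertypes for prefix in ("B", "I")]
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(SUPERTYPES)
print(len(id2label))  # 15
```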

Training and evaluation data

The model was trained with increased dropout: hidden_dropout_prob was raised to 0.2 and attention_probs_dropout_prob to 0.15 (the base model's configuration uses 0.1 for both).
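One way to apply these dropout values with Hugging Face Transformers is to pass them as keyword overrides when loading the base model's config. This is a minimal sketch, not the author's training script: `load_model_with_dropout` is a hypothetical helper, and `num_labels=15` assumes the label set described above. Transformers is imported inside the function so the constants can be inspected without it installed.

```python
# Dropout values reported in this card
DROPOUT_OVERRIDES = {
    "hidden_dropout_prob": 0.2,
    "attention_probs_dropout_prob": 0.15,
}


def load_model_with_dropout(num_labels: int = 15,
                            base_model: str = "FacebookAI/xlm-roberta-large"):
    """Load the base model for token classification with the dropout overrides applied."""
    from transformers import AutoConfig, AutoModelForTokenClassification

    # Keyword arguments passed to from_pretrained override the matching config attributes
    config = AutoConfig.from_pretrained(base_model, num_labels=num_labels, **DROPOUT_OVERRIDES)
    return AutoModelForTokenClassification.from_pretrained(base_model, config=config)
```

Calling `load_model_with_dropout()` downloads the roughly 560M-parameter base model, so it needs network access and a few GB of memory.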

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
weight_decay: 0.01
num_epochs: 10
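With lr_scheduler_type: linear, the learning rate decays linearly from 2e-05 to 0 over the whole run. The sketch below mirrors the formula used by Transformers' linear schedule, assuming zero warmup steps since the card lists none. `linear_lr` is a hypothetical helper, and the total step count passed to it is up to the caller.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5, warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr at the end of warmup to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative run of 1000 steps: the rate is halved at the midpoint and reaches 0 at the end
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```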

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.2836	1.12	500	0.1341	0.7486	0.8467	0.7946	0.9649
0.116	2.24	1000	0.1048	0.7866	0.8655	0.8242	0.9734
0.0832	3.36	1500	0.1066	0.7967	0.8734	0.8333	0.9746
0.0577	4.47	2000	0.1112	0.8408	0.8834	0.8616	0.9753
0.0445	5.59	2500	0.1378	0.8384	0.8883	0.8627	0.9751
0.0337	6.71	3000	0.1272	0.8505	0.8978	0.8735	0.9770
0.025	7.83	3500	0.1447	0.8462	0.9007	0.8726	0.9760
0.0191	8.95	4000	0.1471	0.8567	0.9047	0.8800	0.9772
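The Epoch and Step columns together pin down how many optimizer steps make up one epoch. The sketch below searches for every integer steps-per-epoch value that reproduces all logged epochs when step / steps_per_epoch is rounded to two decimals. It assumes a single device, no gradient accumulation, and that a partial final batch still counts as a step; under those assumptions the result also bounds the training-set size. These are estimates derived from the table, not figures stated by the author.

```python
from typing import Iterable, List, Tuple

# (step, epoch) pairs copied from the training-results table
LOGGED = [(500, 1.12), (1000, 2.24), (1500, 3.36), (2000, 4.47),
          (2500, 5.59), (3000, 6.71), (3500, 7.83), (4000, 8.95)]
BATCH_SIZE = 16   # train_batch_size
NUM_EPOCHS = 10   # num_epochs


def consistent_steps_per_epoch(logged: List[Tuple[int, float]],
                               search: Iterable[int] = range(1, 10_000)) -> List[int]:
    """Integer steps-per-epoch values that reproduce every logged (rounded) epoch."""
    return [spe for spe in search
            if all(round(step / spe, 2) == epoch for step, epoch in logged)]


candidates = consistent_steps_per_epoch(LOGGED)
spe = candidates[0]
min_examples = (spe - 1) * BATCH_SIZE + 1  # smallest dataset needing spe batches
max_examples = spe * BATCH_SIZE            # largest dataset fitting in spe batches
total_steps = spe * NUM_EPOCHS
print(candidates, min_examples, max_examples, total_steps)
```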

Framework versions

Transformers 4.36.2
Pytorch 2.1.2+cu121
Datasets 2.16.1
Tokenizers 0.15.0
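To compare a local environment against these versions, the standard library's `importlib.metadata` can report installed package versions. This is a minimal sketch with hypothetical helper names; it uses the PyPI distribution names (`torch` for PyTorch) and compares exact strings, so a CPU-only or differently built PyTorch wheel will show up as a mismatch because of its local version label.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions listed under "Framework versions", keyed by PyPI distribution name
EXPECTED = {
    "transformers": "4.36.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected: Dict[str, str],
                       lookup: Callable[[str], Optional[str]] = installed_version
                       ) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the expected one to what was found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {found}")
```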