metadata
license: mit
base_model: FacebookAI/xlm-roberta-large
tags:
  - generated_from_trainer
datasets:
  - cnec
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: CNEC_xlm-roberta-large
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: cnec
          type: cnec
          config: default
          split: validation
          args: default
        metrics:
          - name: Precision
            type: precision
            value: 0.8566729323308271
          - name: Recall
            type: recall
            value: 0.9047146401985111
          - name: F1
            type: f1
            value: 0.8800386193579531
          - name: Accuracy
            type: accuracy
            value: 0.9771662763466042
language:
  - cs

CNEC_xlm-roberta-large

This model is a fine-tuned version of FacebookAI/xlm-roberta-large on the cnec dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1471
  • Precision: 0.8567
  • Recall: 0.9047
  • F1: 0.8800
  • Accuracy: 0.9772
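As a quick consistency check, the reported F1 can be recomputed from the precision and recall values in the card's metadata. This is a minimal sketch; the helper name `f1_score` is illustrative and not part of any library used for training.

```python
# Precision and recall as reported in the card's metadata (validation split).
precision = 0.8566729323308271
recall = 0.9047146401985111


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```

The result agrees with the F1 listed above to four decimal places.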

Model description

The model predicts the following BIO tags:

  • 'O' = Outside of a named entity
  • 'B-A' = Beginning of a complex address number (postal code, street number, or phone number)
  • 'I-A' = Inside of a complex address number
  • 'B-G' = Beginning of a geographical name
  • 'I-G' = Inside of a geographical name
  • 'B-I' = Beginning of an institution name
  • 'I-I' = Inside of an institution name
  • 'B-M' = Beginning of a media name (email, server, website, tv series, etc.)
  • 'I-M' = Inside of a media name
  • 'B-O' = Beginning of an artifact name (book, old movies, etc.)
  • 'I-O' = Inside of an artifact name
  • 'B-P' = Beginning of a person's name
  • 'I-P' = Inside of a person's name
  • 'B-T' = Beginning of a time expression
  • 'I-T' = Inside of a time expression
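The tags above follow the BIO scheme, so per-token predictions usually need to be grouped into entity spans. The following is a minimal sketch of that grouping. The function `bio_to_spans`, the `SUPERTYPES` mapping, and the example sentence with its hand-written tags are illustrative assumptions, not output from this model.

```python
from typing import List, Tuple

# Supertype codes taken from the label list above.
SUPERTYPES = {
    "A": "address number",
    "G": "geographical name",
    "I": "institution",
    "M": "media",
    "O": "artifact",
    "P": "person",
    "T": "time expression",
}


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (supertype, start, end) spans; end is exclusive."""
    spans = []
    current = None  # (supertype, start index) of the span being built
    for i, tag in enumerate(tags):
        prefix, _, code = tag.partition("-")
        # Close the open span unless this token continues it.
        if current and not (prefix == "I" and code == current[0]):
            spans.append((current[0], current[1], i))
            current = None
        # Open a span on B-, or leniently on a stray I- with no open span.
        if prefix == "B" or (prefix == "I" and current is None):
            current = (code, i)
    if current:
        spans.append((current[0], current[1], len(tags)))
    return spans


# Hand-labelled illustration, not a model prediction.
tokens = ["Václav", "Havel", "navštívil", "Brno", "."]
tags = ["B-P", "I-P", "O", "B-G", "O"]
for code, start, end in bio_to_spans(tags):
    print(SUPERTYPES[code], "->", " ".join(tokens[start:end]))
```

Note that the plain tag 'O' (outside) and the supertype code in 'B-O' / 'I-O' (artifact) are distinct; splitting on the hyphen keeps them apart.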

Intended uses & limitations

CNEC (Czech Named Entity Corpus) is a named entity dataset for the Czech language. This model was trained on an edited version of the dataset that keeps only the 7 entity supertypes plus 1 label for non-entity tokens.

Training and evaluation data

The model was trained with increased dropout: hidden_dropout_prob was set to 0.2 and attention_probs_dropout_prob to 0.15.
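One way to apply these dropout values is to override them in the model config before loading the base checkpoint. The sketch below assumes the Transformers `AutoConfig` / `AutoModelForTokenClassification` route; the card does not say how the author actually set them, and the helper name `load_model_with_dropout` is hypothetical.

```python
# Dropout values stated in this card.
DROPOUT_OVERRIDES = {
    "hidden_dropout_prob": 0.2,
    "attention_probs_dropout_prob": 0.15,
}


def load_model_with_dropout(num_labels: int,
                            base_model: str = "FacebookAI/xlm-roberta-large"):
    """Load the base checkpoint with the dropout overrides applied to its config."""
    # Imported lazily so the constants above can be inspected without
    # transformers installed. Calling this function downloads the checkpoint.
    from transformers import AutoConfig, AutoModelForTokenClassification

    config = AutoConfig.from_pretrained(
        base_model, num_labels=num_labels, **DROPOUT_OVERRIDES
    )
    return AutoModelForTokenClassification.from_pretrained(base_model, config=config)
```

With the 15 tags listed in the model description, the call would be `load_model_with_dropout(num_labels=15)`.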

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • weight_decay: 0.01
  • num_epochs: 10
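The hyperparameters above map onto `TrainingArguments` fields roughly as follows. This is a sketch under assumptions: it treats train_batch_size and eval_batch_size as per-device values on a single device, and the helper `build_training_args` and its `output_dir` argument are hypothetical rather than taken from the original training script.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: single device
    "per_device_eval_batch_size": 16,   # assumption: single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "weight_decay": 0.01,
    "num_train_epochs": 10,
}


def build_training_args(output_dir: str):
    """Build TrainingArguments from the card's hyperparameters."""
    # Imported lazily so the dictionary can be reused without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Settings the card does not mention, such as warmup or evaluation strategy, are left at the library defaults here.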

Training results

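| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|---------------|-------|------|-----------------|-----------|--------|--------|----------|
| 0.2836        | 1.12  | 500  | 0.1341          | 0.7486    | 0.8467 | 0.7946 | 0.9649   |
| 0.116         | 2.24  | 1000 | 0.1048          | 0.7866    | 0.8655 | 0.8242 | 0.9734   |
| 0.0832        | 3.36  | 1500 | 0.1066          | 0.7967    | 0.8734 | 0.8333 | 0.9746   |
| 0.0577        | 4.47  | 2000 | 0.1112          | 0.8408    | 0.8834 | 0.8616 | 0.9753   |
| 0.0445        | 5.59  | 2500 | 0.1378          | 0.8384    | 0.8883 | 0.8627 | 0.9751   |
| 0.0337        | 6.71  | 3000 | 0.1272          | 0.8505    | 0.8978 | 0.8735 | 0.9770   |
| 0.025         | 7.83  | 3500 | 0.1447          | 0.8462    | 0.9007 | 0.8726 | 0.9760   |
| 0.0191        | 8.95  | 4000 | 0.1471          | 0.8567    | 0.9047 | 0.8800 | 0.9772   |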
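The Epoch and Step columns give a rough idea of the training set size. The sketch below estimates steps per epoch from the first and last logged rows and multiplies by the batch size. It assumes a single device and no gradient accumulation, neither of which the card states, so the result is an approximation rather than the actual dataset size.

```python
# (epoch, step) pairs from the first and last rows of the results table.
LOGGED = [(1.12, 500), (8.95, 4000)]
TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list


def steps_per_epoch(epoch: float, step: int) -> float:
    """Optimizer steps per epoch implied by one logged (epoch, step) pair."""
    return step / epoch


estimates = [steps_per_epoch(e, s) for e, s in LOGGED]
mean_steps = sum(estimates) / len(estimates)

# Upper-bound style estimate: the last batch of an epoch may be partial.
approx_examples = round(mean_steps) * TRAIN_BATCH_SIZE
print(f"~{round(mean_steps)} steps/epoch, ~{approx_examples} training examples")
```

At that rate, 10 epochs would take somewhat more than 4000 steps, which fits the table ending at the last logged multiple of 500 steps.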

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0