
Info

This model is fine-tuned for named entity recognition (NER) in Finnish. The base model is TurkuNLP's bert-base-finnish-uncased-v1, and the fine-tuning dataset is TurkuNLP's turku_ner_corpus.

The model is released under Apache 2.0.

Please cite the training dataset if you use this model:

@inproceedings{luoma-etal-2020-broad,
    title = "A Broad-coverage Corpus for {F}innish Named Entity Recognition",
    author = {Luoma, Jouni and Oinonen, Miika and Pyyk{\"o}nen, Maria and Laippala, Veronika and Pyysalo, Sampo},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.567",
    pages = "4615--4624",
}

Validation Metrics

  • Loss: 0.075
  • Accuracy: 0.982
  • Precision: 0.879
  • Recall: 0.868
  • F1: 0.873

Test Metrics

Overall Metrics

  • Accuracy: 0.986
  • Precision: 0.857
  • Recall: 0.872
  • F1: 0.864

Per-entity metrics

{
    "DATE": {
        "precision": 0.925,
        "recall": 0.9736842105263158,
        "f1": 0.9487179487179489,
        "number": "114"
    },
    "EVENT": {
        "precision": 0.3,
        "recall": 0.42857142857142855,
        "f1": 0.3529411764705882,
        "number": "7"
    },
    "LOC": {
        "precision": 0.9057239057239057,
        "recall": 0.9372822299651568,
        "f1": 0.9212328767123287,
        "number": "287"
    },
    "ORG": {
        "precision": 0.8274111675126904,
        "recall": 0.7836538461538461,
        "f1": 0.8049382716049382,
        "number": "208"
    },
    "PER": {
        "precision": 0.88,
        "recall": 0.9225806451612903,
        "f1": 0.9007874015748031,
        "number": "310"
    },
    "PRO": {
        "precision": 0.6081081081081081,
        "recall": 0.569620253164557,
        "f1": 0.5882352941176471,
        "number": "79"
    }
}
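
The per-entity figures can be sanity-checked with a few lines of Python. The sketch below recomputes each F1 as the harmonic mean of precision and recall, and takes a support-weighted mean of the per-entity recalls. It assumes the overall scores are micro-averaged, which this card does not state; the names `f1_score` and `per_entity` are illustrative only, and the `number` values are written as integers rather than strings.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision, recall and support copied from the per-entity test metrics above.
per_entity = {
    "DATE": {"precision": 0.925, "recall": 0.9736842105263158, "number": 114},
    "EVENT": {"precision": 0.3, "recall": 0.42857142857142855, "number": 7},
    "LOC": {"precision": 0.9057239057239057, "recall": 0.9372822299651568, "number": 287},
    "ORG": {"precision": 0.8274111675126904, "recall": 0.7836538461538461, "number": 208},
    "PER": {"precision": 0.88, "recall": 0.9225806451612903, "number": 310},
    "PRO": {"precision": 0.6081081081081081, "recall": 0.569620253164557, "number": 79},
}

# Recompute F1 for every entity type from its precision and recall.
recomputed_f1 = {
    name: f1_score(m["precision"], m["recall"]) for name, m in per_entity.items()
}

# Support-weighted mean of recall: equals micro-averaged recall
# if "number" is the count of gold entities of each type (an assumption).
total_support = sum(m["number"] for m in per_entity.values())
support_weighted_recall = (
    sum(m["recall"] * m["number"] for m in per_entity.values()) / total_support
)

for name, value in recomputed_f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Support-weighted recall over {total_support} entities: {support_weighted_recall:.3f}")
```

Under that assumption, the weighted recall lands close to the overall test recall reported above. The low EVENT scores rest on only 7 test entities, so they carry little statistical weight.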

Usage

You can use cURL to access this model:

$ curl -X POST -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"inputs": "Asun Brysselissä, Euroopan pääkaupungissa."}' https://api-inference.huggingface.co/models/iguanodon-ai/bert-base-finnish-uncased-ner

Or use the Python API:

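from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForTokenClassification.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")
tokenizer = AutoTokenizer.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")

# Tokenize the input and run a forward pass
inputs = tokenizer("Asun Brysselissä, Euroopan pääkaupungissa.", return_tensors="pt")
outputs = model(**inputs)

# Pick the highest-scoring label for each subword token
predicted_ids = outputs.logits.argmax(dim=-1)[0].tolist()
labels = [model.config.id2label[i] for i in predicted_ids]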
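
A token classification model produces one label per token, so the per-token labels still need to be merged into entity spans. The following is a minimal sketch of that step, assuming the labels follow the BIO scheme (`B-LOC`, `I-LOC`, `O`, and so on). The function name `group_bio_tags` is hypothetical, and the example tags are written by hand for illustration; they are not actual output from this model.

```python
def group_bio_tags(tokens, tags):
    """Merge per-token BIO tags into a list of (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span currently being built, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # A B- tag, or an I- tag of a different type, opens a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # An I- tag of the same type continues the current span.
            current_tokens.append(token)
        else:
            # An O tag ends any open span.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hand-written example tags (an assumption, not model output).
tokens = ["Asun", "Brysselissä", ",", "Euroopan", "pääkaupungissa", "."]
tags = ["O", "B-LOC", "O", "B-LOC", "O", "O"]
example_spans = group_bio_tags(tokens, tags)
print(example_spans)
```

With real model output, subword pieces and special tokens such as `[CLS]` and `[SEP]` would also need to be handled before grouping; the `token-classification` pipeline in Transformers can do this aggregation for you.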