bert-base-finnish-cased-v1 for QA

This is the bert-base-finnish-cased-v1 model, fine-tuned for extractive question answering on an automatically translated Finnish version of the SQuAD2.0 dataset combined with the Finnish partition of the TyDi-QA dataset. Only answerable question-answer pairs were used for training; unanswerable questions were excluded.

A second QA model, fine-tuned on unanswerable questions as well, is also available: bert-base-finnish-cased-squad2-fi.

Overview

Language model: bert-base-finnish-cased-v1
Language: Finnish
Downstream-task: Extractive QA
Training data: Answerable questions from Finnish SQuAD 2.0 + Finnish partition of TyDi-QA
Eval data: Answerable questions from Finnish SQuAD 2.0 + Finnish partition of TyDi-QA

Usage

In Transformers

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_name = "ilmariky/bert-base-finnish-cased-squad1-fi"

# a) Get predictions
nlp = pipeline('question-answering', model=model_name, tokenizer=model_name)
QA_input = {
    'question': 'Mikä tämä on?',
    'context': 'Tämä on testi.'
}
res = nlp(QA_input)
# res is a dict with 'score', 'start', 'end', and 'answer' keys

# b) Load model & tokenizer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
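
When the model and tokenizer are loaded directly as in (b), the answer span has to be decoded from the start and end logits by hand. The following is a minimal sketch of that step, not the pipeline's actual implementation: `best_span` and `answer_with_model` are hypothetical helper names, the `max_answer_len` default of 30 is an assumption, and the sketch does not mask question or special tokens or split long contexts the way the pipeline does.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximizing start + end logit.

    Only spans with start <= end and at most max_answer_len tokens are
    considered. Hypothetical helper for illustration.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


def answer_with_model(question, context,
                      model_name="ilmariky/bert-base-finnish-cased-squad1-fi"):
    """Run the model once and decode the highest-scoring span (sketch)."""
    # Imported lazily so best_span can be used without these packages.
    import torch
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Note: this simplified search may select question tokens as well.
    start, end = best_span(outputs.start_logits[0].tolist(),
                           outputs.end_logits[0].tolist())
    return tokenizer.decode(inputs["input_ids"][0][start:end + 1])
```

For most uses the pipeline in (a) is the simpler choice; a manual decode like this is mainly useful when custom span constraints or batching are needed.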

Performance

Evaluated with a slightly modified version of the official SQuAD 2.0 eval script.

{
    "exact": 58.00497718788884,
    "f1": 69.90891092523077,
    "total": 4822,
    "HasAns_exact": 58.00497718788884,
    "HasAns_f1": 69.90891092523077,
    "HasAns_total": 4822
}
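
To make the "exact" and "f1" figures above easier to interpret, here is a rough sketch of how SQuAD-style exact match and token-level F1 are typically computed for a single prediction. It is an approximation under stated assumptions, not the modified script used for this evaluation: the function names are hypothetical, and the normalization here (lowercasing, stripping ASCII punctuation, collapsing whitespace) omits the English article removal of the original script, since the exact modifications are not described in this card.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The reported numbers are these per-question scores averaged over all 4822 questions and expressed as percentages. Because the eval set contains only answerable questions, the HasAns_* values are identical to the overall values.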