The dataset used, raalst/squad_v2_dutch, was kindly provided by Henryk Borzymowski. It is a translated version of SQuAD v2, which I converted from JSON to JSONL format. It contains train and validation splits but no test split, so I held out 20% of the train split as a test set in my fine-tuning run. The evaluation is based on that test set.
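The JSON to JSONL conversion can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes the source file follows the standard nested SQuAD v2 layout (data, paragraphs, qas) and produces the flat columns listed further down (id, title, context, question, answers). The function names are made up for this example.

```python
import json


def squad_to_jsonl_records(squad: dict):
    """Yield one flat record per question from a nested SQuAD-style dict."""
    for article in squad["data"]:
        title = article.get("title", "")
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                yield {
                    "id": qa["id"],
                    "title": title,
                    "context": context,
                    "question": qa["question"],
                    # Unanswerable questions end up with two empty lists.
                    "answers": {
                        "text": [a["text"] for a in answers],
                        "answer_start": [a["answer_start"] for a in answers],
                    },
                }


def write_jsonl(squad: dict, path: str) -> int:
    """Write the flattened records to `path`, one JSON object per line."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for record in squad_to_jsonl_records(squad):
            # ensure_ascii=False keeps Dutch diacritics readable in the file.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Keeping one question per line makes the file easy to stream and to split by row.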

When using raalst/squad_v2_dutch, be sure to clean up the single and double quotes in the contexts.
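What the cleanup should look like depends on what is wrong with the quotes, which is not spelled out here. One possible interpretation, sketched below, is to map typographic quote characters to their ASCII equivalents. The mapping and the function names are assumptions for illustration. Every replacement is one character for one character, so the answer_start offsets stay valid.

```python
# Hypothetical mapping: typographic quote characters -> ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201a": "'",  # single low-9 quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
})


def normalize_quotes(text: str) -> str:
    """Replace typographic quotes with ASCII ones, preserving string length."""
    return text.translate(QUOTE_MAP)


def clean_example(example: dict) -> dict:
    """Return a copy of one record with quotes normalized in all text fields."""
    cleaned = dict(example)
    cleaned["context"] = normalize_quotes(example["context"])
    cleaned["question"] = normalize_quotes(example["question"])
    cleaned["answers"] = {
        "text": [normalize_quotes(t) for t in example["answers"]["text"]],
        # Offsets are unchanged because no characters were added or removed.
        "answer_start": list(example["answers"]["answer_start"]),
    }
    return cleaned
```

A cleanup that deletes or inserts characters would instead need to recompute answer_start for every affected answer.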

The pretrained model was pdelobelle/robbert-v2-dutch-base, a Dutch RoBERTa model.

Results obtained in training:

metric = load("evaluate-metric/squad_v2" if squad_v2 else "evaluate-metric/squad")

{'exact': 61.75389109958193,
 'f1': 66.89717170237417,
 'total': 19853,
 'HasAns_exact': 48.967182330322814,
 'HasAns_f1': 58.09796564493008,
 'HasAns_total': 11183,
 'NoAns_exact': 78.24682814302192,
 'NoAns_f1': 78.24682814302192,
 'NoAns_total': 8670,
 'best_exact': 61.75389109958193,
 'best_exact_thresh': 0.0,
 'best_f1': 66.89717170237276,
 'best_f1_thresh': 0.0}
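As a sanity check on how these numbers fit together, the overall exact and f1 scores are the count-weighted averages of the HasAns and NoAns scores. The sketch below recomputes them from the reported values. It also computes the score of a trivial baseline that always predicts "no answer", which is my own addition for context and not something that was run.

```python
results = {
    "exact": 61.75389109958193,
    "f1": 66.89717170237417,
    "total": 19853,
    "HasAns_exact": 48.967182330322814,
    "HasAns_f1": 58.09796564493008,
    "HasAns_total": 11183,
    "NoAns_exact": 78.24682814302192,
    "NoAns_f1": 78.24682814302192,
    "NoAns_total": 8670,
}


def weighted(metric: str) -> float:
    """Combine the HasAns and NoAns scores, weighted by question counts."""
    has = results[f"HasAns_{metric}"] * results["HasAns_total"]
    no = results[f"NoAns_{metric}"] * results["NoAns_total"]
    return (has + no) / results["total"]


# Both values should agree with the reported overall 'exact' and 'f1'.
print(weighted("exact"), weighted("f1"))

# Baseline: always answering "no answer" gets every NoAns question right
# and every HasAns question wrong.
always_no_answer = 100.0 * results["NoAns_total"] / results["total"]
print(always_no_answer)
```

The always-"no answer" baseline comes out at about 43.7 exact, so the model is roughly 18 points above it. NoAns_exact and NoAns_f1 are identical because an unanswerable question scores either 0 or 1 on both.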

This seems mediocre to me.

Settings (until I figure out how to report them properly):

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 79412
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 19853
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 9669
    })
})
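The row counts above are consistent with a 20% hold-out: 79412 + 19853 = 99265 original train rows, and 19853 is exactly one fifth of that. The sketch below reproduces the sizes with a plain index shuffle. The exact call used in the run is not shown in this card; with the datasets library, Dataset.train_test_split(test_size=0.2) does the equivalent. The seed here is an arbitrary choice for the example.

```python
import random


def hold_out(n_rows: int, fraction: float = 0.2, seed: int = 42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * fraction)
    return indices[n_test:], indices[:n_test]


original_train_rows = 79412 + 19853  # train + test rows reported above
train_idx, test_idx = hold_out(original_train_rows)
print(len(train_idx), len(test_idx))  # prints: 79412 19853
```

Fixing the seed matters if the test set is meant to be reproducible across runs.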

from transformers import AutoModelForQuestionAnswering, AutoTokenizer, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")

model = AutoModelForQuestionAnswering.from_pretrained("pdelobelle/robbert-v2-dutch-base")
training_args = TrainingArguments(
  output_dir="./qa_model",
  evaluation_strategy="epoch",
  learning_rate=2e-5,
  per_device_train_batch_size=16,
  per_device_eval_batch_size=16,
  num_train_epochs=3,
  weight_decay=0.01,
  push_to_hub=False,
)

# tokenized_squad and data_collator come from the preprocessing step (not shown here).
trainer = Trainer(
  model=model,
  args=training_args,
  train_dataset=tokenized_squad["train"],
  eval_dataset=tokenized_squad["validation"],
  tokenizer=tokenizer,
  data_collator=data_collator,
)

trainer.train()

[15198/15198 2:57:03, Epoch 3/3]
Epoch 	Training Loss 	Validation Loss
1 	1.380700 	1.177431
2 	1.093000 	1.052601
3 	0.849700 	1.143632
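Reading the table: training loss keeps falling, but validation loss is lowest after epoch 2 and rises again in epoch 3, which may point to some overfitting in the last epoch. The small sketch below picks the best epoch from the logged values. It only restates the table and does not say which checkpoint was actually evaluated.

```python
# (epoch, training loss, validation loss) copied from the log above.
history = [
    (1, 1.380700, 1.177431),
    (2, 1.093000, 1.052601),
    (3, 0.849700, 1.143632),
]

# The epoch with the lowest validation loss is the usual checkpoint to keep.
best_epoch, _, best_val = min(history, key=lambda row: row[2])
last_epoch, _, last_val = history[-1]

print(f"best epoch: {best_epoch} (validation loss {best_val:.6f})")
if last_epoch != best_epoch:
    print(f"validation loss rose by {last_val - best_val:.6f} by epoch {last_epoch}")
```

If the final epoch-3 weights are the ones that were evaluated, rerunning with load_best_model_at_end=True in TrainingArguments (together with a matching save strategy) would keep the epoch-2 checkpoint instead.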

TrainOutput(global_step=15198, training_loss=1.1917077029499668, metrics={'train_runtime': 10623.9565, 
'train_samples_per_second': 22.886, 'train_steps_per_second': 1.431, 'total_flos': 4.764955396486349e+16, 
'train_loss': 1.1917077029499668, 'epoch': 3.0})
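The numbers in this output can be cross-checked against the settings. The sketch below assumes a single GPU and no gradient accumulation (neither is set explicitly in the arguments shown). Under that assumption the step count implies roughly 81,000 tokenized training features, more than the 79412 train rows. That would be consistent with long contexts being split into several windows during preprocessing, but that step is not shown, so treat this as an inference.

```python
global_step = 15198
epochs = 3
batch_size = 16               # per_device_train_batch_size
train_runtime = 10623.9565    # seconds
samples_per_second = 22.886
train_rows = 79412

steps_per_epoch = global_step // epochs

# The last batch of an epoch may be partial, so the feature count lies in a range.
max_features = steps_per_epoch * batch_size
min_features = (steps_per_epoch - 1) * batch_size + 1

# Independent estimate from the reported throughput.
est_features = samples_per_second * train_runtime / epochs

hours, rest = divmod(int(train_runtime), 3600)
minutes, seconds = divmod(rest, 60)

print(steps_per_epoch, min_features, max_features, round(est_features))
print(f"{hours}:{minutes:02d}:{seconds:02d}")  # prints: 2:57:03
```

The throughput-based estimate falls inside the range implied by the step count, and the runtime matches the 2:57:03 shown in the progress bar.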

Trained on Ubuntu with a 1080 Ti GPU.
