mT5-base finetuned on the GermanQuAD dataset for answer-agnostic question generation
This model is a finetuned mT5-base model for the task of answer-agnostic (or end-to-end) question generation. It uses the All Questions Per Line (AQPL) approach from Lopez et al.: a paragraph is provided as input and multiple questions are generated from it, without marking answer spans beforehand. Other models have already applied this approach with T5 for English and German.
For finetuning, only the GermanQuAD dataset from deepset was used. The dataset was modified and filtered with scripts that can be found in another repository.
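For reference, a minimal inference sketch with the Transformers library is shown below, using the Hub ID this card is published under. The exact input formatting the model expects (for example a task prefix or a separator token between generated questions) is defined by the preprocessing scripts in the repository and is not reproduced here, so this only illustrates the general loading and generation flow.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub ID of this model card; the generation settings below are illustrative
# defaults, not the configuration used for the reported results.
model_name = "tilomichel/mT5-base-GermanQuAD-e2e-qg"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

paragraph = (
    "Der Amazonas ist der wasserreichste Fluss der Erde und "
    "fließt durch den größten Regenwald der Welt."
)

# AQPL: the whole paragraph is the input; the model generates questions for it.
inputs = tokenizer(paragraph, return_tensors="pt", truncation=True, max_length=1024)
output_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```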
Training, test and evaluation data
For training and testing, the original split from GermanQuAD was used. The German split of the XQuAD dataset was used as the evaluation set.
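Both datasets are available on the Hugging Face Hub, so the unmodified splits can be loaded directly with the datasets library. The identifiers below are the public Hub IDs, not the filtered version produced by the preprocessing scripts mentioned above.

```python
from datasets import load_dataset

# Original GermanQuAD train/test split used for finetuning (before filtering).
germanquad = load_dataset("deepset/germanquad")

# German split of XQuAD, used as the evaluation set.
xquad_de = load_dataset("xquad", "xquad.de")

print(germanquad)
print(xquad_de)
```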
Training hyperparameters
The training parameters are provided as JSON and can be used with the training script from the repository:
```json
{
"model_name_or_path": "google/mt5-base",
"output_dir": "mt5-base-germanquad-e2e-qg",
"overwrite_output_dir": true,
"cache_dir": "model-cache",
"dataset_dir": "e2e-qg-germanquad",
"preprocessing_num_workers": 20,
"max_source_length": 1024,
"max_target_length": 128,
"val_max_target_length": 128,
"pad_to_max_length": true,
"seed": 42,
"do_train": true,
"gradient_accumulation_steps": 64,
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"learning_rate": 1e-4,
"num_train_epochs": 10,
"evaluation_strategy": "epoch",
"logging_strategy": "epoch",
"save_strategy": "epoch",
"save_total_limit": 3,
"dataloader_num_workers": 8,
"ddp_find_unused_parameters": false
}
```
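As an illustration of how such a JSON file is typically consumed, the sketch below parses the standard training arguments with HfArgumentParser, the way the Hugging Face example scripts do. The file name and the handling of script-specific keys (e.g. dataset_dir, max_source_length) are assumptions, since those fields are defined by the dataclasses of the actual training script in the repository.

```python
import json
from transformers import HfArgumentParser, Seq2SeqTrainingArguments

# Hypothetical file name holding the JSON shown above.
with open("train_config.json") as f:
    config = json.load(f)

# Parse only the standard Seq2SeqTrainingArguments; script-specific keys such as
# "model_name_or_path" or "dataset_dir" belong to the training script's own
# dataclasses and are ignored here via allow_extra_keys.
parser = HfArgumentParser(Seq2SeqTrainingArguments)
(training_args,) = parser.parse_dict(config, allow_extra_keys=True)

print(training_args.learning_rate, training_args.num_train_epochs)
```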
Training results
The evaluation is reported on the German split of XQuAD. The implementations and configurations can be found in another repository.
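The exact evaluation code lives in that repository; as a rough sketch, metrics of this kind are commonly computed with the evaluate library as shown below. The metric names and arguments are standard, but the author's configuration may differ.

```python
import evaluate

# Toy example: one generated question and one reference question.
predictions = ["Wie viele Einwohner hat Berlin?"]
references = ["Wie viele Menschen leben in Berlin?"]

bleu = evaluate.load("bleu")          # corpus BLEU; BLEU-1..4 via max_order
rouge = evaluate.load("rouge")        # includes ROUGE-L f-measure
meteor = evaluate.load("meteor")
bertscore = evaluate.load("bertscore")

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
print(meteor.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="de"))
```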
Evaluation results
The following results are self-reported and were obtained on XQuAD (de):
- BLEU score: 1.728
- BLEU-1: 49.211
- BLEU-2: 16.960
- BLEU-3: 7.145
- BLEU-4: 3.230
- ROUGE-L (f-measure): 0.171
- METEOR: 0.084
- BERTScore (F1): 0.332