german-jeopardy-longt5-base-256

This model is a fine-tuned version of google/long-t5-tglobal-base on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7833
  • Brevity Penalty: 0.8244
  • System Length: 17427
  • Reference Length: 20793
  • ROUGE-1: 34.80
  • ROUGE-2: 16.54
  • ROUGE-L: 33.69
  • ROUGE-Lsum: 33.70
  • Exact Match: 1.50
  • BLEU: 10.52
  • F1: 33.92
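
The Brevity Penalty, System Length, and Reference Length values listed above appear to be linked by the standard BLEU brevity penalty. The sketch below assumes that standard definition (the card does not say which BLEU implementation produced the numbers); the function name `brevity_penalty` is illustrative only.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty (assumed definition, not confirmed by the card)."""
    # No penalty when the generated text is at least as long as the references.
    if system_length >= reference_length:
        return 1.0
    # Otherwise the penalty shrinks exponentially with the length ratio.
    return math.exp(1.0 - reference_length / system_length)


# Lengths taken from the evaluation summary in this card.
bp = brevity_penalty(system_length=17427, reference_length=20793)
print(f"{bp:.4f}")
```

Under this assumption the result agrees with the reported 0.8244 to rounding, which suggests the generated questions are on average shorter than the references.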

Model description

See google/long-t5-tglobal-base for more information about the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
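
A minimal usage sketch with the Transformers library follows. It assumes the checkpoint is published on the Hugging Face Hub as `GiantTreeG/german-jeopardy-longt5-base-256` and that a plain German passage is an acceptable input. The card does not document the expected input format (for example, whether the answer span must be marked), so treat the input handling, the helper name `generate_question`, and the `max_new_tokens` default as assumptions.

```python
MODEL_ID = "GiantTreeG/german-jeopardy-longt5-base-256"


def generate_question(context: str, max_new_tokens: int = 64) -> str:
    """Generate one question for a German passage (input format is an assumption)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize the passage and return PyTorch tensors.
    inputs = tokenizer(context, return_tensors="pt", truncation=True)
    # Default decoding settings; the card does not state which ones were used.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(generate_question("Die Zugspitze ist mit 2962 Metern der höchste Berg Deutschlands."))
```

Loading the tokenizer and model inside the function keeps the sketch short; for repeated calls it would be cheaper to load them once and reuse them.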

Training and evaluation data

See lmqg/qg_dequad.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 7
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
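
The `total_train_batch_size` above follows from the per-device batch size and the gradient accumulation steps. The sketch below reproduces that arithmetic, assuming a single device as stated in the model description; the `BatchConfig` class is a hypothetical helper, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    """Hypothetical container for the batch-related hyperparameters."""

    per_device_batch_size: int
    gradient_accumulation_steps: int
    num_devices: int = 1  # assumption: single GPU, per the model description

    @property
    def effective_batch_size(self) -> int:
        # Examples that contribute to one optimizer update.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )


config = BatchConfig(per_device_batch_size=8, gradient_accumulation_steps=32)
effective = config.effective_batch_size
print(effective)  # 256
```

Accumulating gradients over 32 micro-batches of 8 gives the same update size as a batch of 256 while only holding 8 examples in GPU memory at a time.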

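Training results

In the table below, ROUGE, Exact Match, and F1 are given as fractions between 0 and 1, while BLEU and the n-gram precisions are on a 0-100 scale. The summary at the top of this card reports ROUGE, Exact Match, and F1 on a 0-100 scale instead.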

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.6024 | 0.99 | 36 | 2.4682 | 5645 | 1343 | 424 | 109 | 15388 | 13184 | 10980 | 8776 | 36.6844 | 10.1866 | 3.8616 | 1.242 | 0.6832 | 15388 | 21250 | 0.2285 | 0.0824 | 0.2192 | 0.2188 | 0.0005 | 4.4454 | 11.6338 | 0.2236 |
| 2.9671 | 1.98 | 72 | 2.2445 | 5988 | 1562 | 569 | 179 | 16094 | 13890 | 11686 | 9482 | 37.2064 | 11.2455 | 4.8691 | 1.8878 | 0.7259 | 16094 | 21250 | 0.2465 | 0.0971 | 0.2371 | 0.2371 | 0.0018 | 5.7163 | 12.314 | 0.2401 |
| 2.6324 | 2.99 | 109 | 2.1227 | 6539 | 1846 | 702 | 240 | 17173 | 14969 | 12765 | 10561 | 38.0772 | 12.3322 | 5.4994 | 2.2725 | 0.7887 | 17173 | 21250 | 0.2729 | 0.1154 | 0.2601 | 0.2604 | 0.0027 | 6.9028 | 13.2319 | 0.2663 |
| 2.5557 | 3.98 | 145 | 2.0357 | 6491 | 1923 | 752 | 275 | 15961 | 13757 | 11553 | 9349 | 40.6679 | 13.9783 | 6.5091 | 2.9415 | 0.7179 | 15961 | 21250 | 0.2783 | 0.1214 | 0.2676 | 0.2678 | 0.0059 | 7.3331 | 12.0962 | 0.2729 |
| 2.3785 | 5.0 | 182 | 1.9824 | 6808 | 2113 | 855 | 328 | 16439 | 14235 | 12031 | 9827 | 41.4137 | 14.8437 | 7.1066 | 3.3377 | 0.7463 | 16439 | 21250 | 0.2948 | 0.1326 | 0.2825 | 0.2825 | 0.0064 | 8.2007 | 12.6819 | 0.2892 |
| 2.3396 | 5.99 | 218 | 1.9449 | 7033 | 2194 | 886 | 364 | 16851 | 14647 | 12443 | 10239 | 41.7364 | 14.9792 | 7.1205 | 3.555 | 0.7702 | 16851 | 21250 | 0.3044 | 0.1373 | 0.292 | 0.2922 | 0.0086 | 8.639 | 13.0254 | 0.3 |
| 2.2557 | 6.98 | 254 | 1.8938 | 7167 | 2285 | 939 | 389 | 16529 | 14325 | 12121 | 9917 | 43.3602 | 15.9511 | 7.7469 | 3.9226 | 0.7515 | 16529 | 21250 | 0.3166 | 0.1428 | 0.3043 | 0.3046 | 0.0095 | 9.049 | 12.7119 | 0.3119 |
| 2.1168 | 7.99 | 291 | 1.8575 | 7347 | 2425 | 1021 | 425 | 16860 | 14656 | 12452 | 10248 | 43.5765 | 16.5461 | 8.1995 | 4.1472 | 0.7708 | 16860 | 21250 | 0.3258 | 0.1505 | 0.3137 | 0.3142 | 0.0104 | 9.6447 | 12.9374 | 0.3211 |
| 2.1105 | 8.98 | 327 | 1.8284 | 7460 | 2461 | 1061 | 449 | 17034 | 14830 | 12626 | 10422 | 43.7948 | 16.5947 | 8.4033 | 4.3082 | 0.7807 | 17034 | 21250 | 0.3317 | 0.1521 | 0.3187 | 0.3191 | 0.0095 | 9.9436 | 13.1828 | 0.3267 |
| 1.9913 | 10.0 | 364 | 1.8057 | 7547 | 2537 | 1105 | 487 | 17005 | 14801 | 12597 | 10393 | 44.3811 | 17.1407 | 8.7719 | 4.6858 | 0.7791 | 17005 | 21250 | 0.335 | 0.1566 | 0.323 | 0.3233 | 0.0113 | 10.3601 | 13.0358 | 0.3316 |
| 1.9943 | 10.99 | 400 | 1.7973 | 7629 | 2574 | 1131 | 496 | 16842 | 14638 | 12434 | 10230 | 45.2975 | 17.5844 | 9.096 | 4.8485 | 0.7697 | 16842 | 21250 | 0.343 | 0.1594 | 0.3296 | 0.33 | 0.0113 | 10.5378 | 13.0154 | 0.3385 |
| 1.941 | 11.98 | 436 | 1.7773 | 7681 | 2606 | 1164 | 528 | 17105 | 14901 | 12697 | 10493 | 44.905 | 17.4888 | 9.1675 | 5.0319 | 0.7848 | 17105 | 21250 | 0.3421 | 0.1607 | 0.3295 | 0.3294 | 0.0132 | 10.8273 | 13.1361 | 0.3385 |
| 1.8453 | 12.99 | 473 | 1.7595 | 7817 | 2700 | 1224 | 560 | 17324 | 15120 | 12916 | 10712 | 45.1224 | 17.8571 | 9.4766 | 5.2278 | 0.7972 | 17324 | 21250 | 0.3492 | 0.1662 | 0.3367 | 0.3367 | 0.0127 | 11.2687 | 13.5018 | 0.3447 |
| 1.85 | 13.98 | 509 | 1.7414 | 7792 | 2642 | 1182 | 537 | 17417 | 15213 | 13009 | 10805 | 44.7379 | 17.3667 | 9.086 | 4.9699 | 0.8025 | 17417 | 21250 | 0.3458 | 0.1632 | 0.3322 | 0.3322 | 0.0127 | 10.9825 | 13.5395 | 0.3416 |
| 1.7588 | 15.0 | 546 | 1.7346 | 7827 | 2702 | 1223 | 569 | 17265 | 15061 | 12857 | 10653 | 45.3345 | 17.9404 | 9.5123 | 5.3412 | 0.7939 | 17265 | 21250 | 0.3487 | 0.1661 | 0.3355 | 0.3354 | 0.015 | 11.3189 | 13.3026 | 0.3446 |
| 1.7663 | 15.99 | 582 | 1.7191 | 7946 | 2757 | 1245 | 581 | 17431 | 15227 | 13023 | 10819 | 45.5855 | 18.106 | 9.56 | 5.3702 | 0.8032 | 17431 | 21250 | 0.3544 | 0.1695 | 0.3418 | 0.3416 | 0.0154 | 11.5245 | 13.4515 | 0.3501 |
| 1.7317 | 16.98 | 618 | 1.7133 | 8068 | 2844 | 1325 | 633 | 17752 | 15548 | 13344 | 11140 | 45.4484 | 18.2917 | 9.9296 | 5.6822 | 0.8212 | 17752 | 21250 | 0.3575 | 0.1746 | 0.3445 | 0.3447 | 0.0163 | 12.0845 | 13.77 | 0.3527 |
| 1.6421 | 17.99 | 655 | 1.7198 | 8003 | 2823 | 1301 | 609 | 17535 | 15331 | 13127 | 10923 | 45.6401 | 18.4137 | 9.9109 | 5.5754 | 0.8091 | 17535 | 21250 | 0.3576 | 0.1737 | 0.3447 | 0.3448 | 0.015 | 11.877 | 13.4669 | 0.353 |
| 1.6543 | 18.98 | 691 | 1.7151 | 8031 | 2817 | 1294 | 612 | 17803 | 15599 | 13395 | 11191 | 45.1104 | 18.0588 | 9.6603 | 5.4687 | 0.824 | 17803 | 21250 | 0.3567 | 0.1734 | 0.3435 | 0.3431 | 0.015 | 11.8679 | 13.8648 | 0.351 |
| 1.5702 | 19.78 | 720 | 1.7079 | 7996 | 2850 | 1330 | 639 | 17275 | 15071 | 12867 | 10663 | 46.2865 | 18.9105 | 10.3365 | 5.9927 | 0.7945 | 17275 | 21250 | 0.3618 | 0.1769 | 0.3485 | 0.348 | 0.0168 | 12.1229 | 13.3367 | 0.3569 |
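
The Counts, Totals, Precisions, Brevity Penalty, and BLEU columns above look like the components of a standard corpus-level BLEU score. The sketch below recomputes BLEU for the final row (step 720) under that assumption: uniform weights over 1- to 4-gram precisions, no smoothing, and the usual brevity penalty. The function name `corpus_bleu` is illustrative, and the card does not confirm which implementation was used.

```python
import math


def corpus_bleu(counts, totals, system_length, reference_length):
    """BLEU on a 0-100 scale from n-gram match counts (assumes all counts > 0)."""
    # Modified n-gram precisions: matched n-grams over generated n-grams.
    precisions = [c / t for c, t in zip(counts, totals)]
    # Brevity penalty: penalize output that is shorter than the references.
    if system_length >= reference_length:
        bp = 1.0
    else:
        bp = math.exp(1.0 - reference_length / system_length)
    # Geometric mean of the precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return 100.0 * bp * math.exp(log_mean)


# Values copied from the last row of the training results.
final_row_bleu = corpus_bleu(
    counts=[7996, 2850, 1330, 639],
    totals=[17275, 15071, 12867, 10663],
    system_length=17275,
    reference_length=21250,
)
print(round(final_row_bleu, 2))
```

The recomputed value lands close to the 12.1229 reported in the row, which supports reading the columns this way.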

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Model size

  • Parameters: 248M
  • Tensor type: F32
  • Format: Safetensors
