german-jeopardy-longt5-large-1k-64-constant

This model is a fine-tuned version of google/long-t5-tglobal-large on the lmqg/qg_dequad dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5907
  • Brevity Penalty: 0.9367
  • System Length: 19517
  • Reference Length: 20793
  • ROUGE-1: 32.79
  • ROUGE-2: 14.95
  • ROUGE-L: 31.56
  • ROUGE-Lsum: 31.57
  • Exact Match: 1.36
  • BLEU: 9.50
  • F1: 32.03
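
The Brevity Penalty above can be reproduced from the two reported lengths. The following is a minimal sketch, assuming the standard BLEU definition (1 when the system output is at least as long as the reference, otherwise exp(1 - reference / system)); the function name `brevity_penalty` is illustrative and not taken from any library.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty (assumed definition, see lead-in)."""
    if system_length <= 0:
        return 0.0
    if system_length >= reference_length:
        return 1.0
    # Outputs shorter than the reference are penalised exponentially.
    return math.exp(1.0 - reference_length / system_length)


# Lengths taken from the results list above.
bp = brevity_penalty(system_length=19517, reference_length=20793)
print(round(bp, 4))  # 0.9367
```

A value below 1 indicates that the generated questions are, in total, shorter than the reference questions.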

Model description

See google/long-t5-tglobal-large for more information about the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

Intended uses & limitations

This model can be used for question generation on German text.
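
A minimal sketch of how the model might be loaded with the Transformers library. Several things here are assumptions rather than documented facts: the repository id in `MODEL_ID` may need to be replaced by the exact Hub name, the limits of 1024 input tokens and 64 new tokens are illustrative defaults, and the expected input layout (for example, how answer and context are combined) is not described in this card, so `texts` must already be formatted the way the training inputs were.

```python
# Assumed repository id; replace it with the exact Hub name if it differs.
MODEL_ID = "GiantTreeG/german-jeopardy-longt5-large"


def generate_questions(texts, model_id=MODEL_ID, max_input_tokens=1024, max_new_tokens=64):
    """Generate one question per input string (illustrative helper, not an official API)."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Pad to the longest input in the batch and truncate overly long inputs.
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_input_tokens,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `generate_questions([...])` with a list of German input strings downloads the checkpoint on first use and returns one decoded string per input.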

Training and evaluation data

See lmqg/qg_dequad.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 7
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • num_epochs: 20
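
The relationship between the per-device batch size, gradient accumulation, and the total batch size can be checked with a few lines. This is a sketch only: the dictionary keys mirror the names in the list above, the helper functions are hypothetical, and `examples_seen` assumes that every optimizer step used a full batch on the single GPU mentioned in the model description.

```python
# Values copied from the hyperparameter list above.
hyperparameters = {
    "learning_rate": 1e-4,
    "train_batch_size": 2,
    "eval_batch_size": 2,
    "seed": 7,
    "gradient_accumulation_steps": 32,
    "optimizer": "Adafactor",
    "lr_scheduler_type": "constant",
    "num_epochs": 20,
}


def total_train_batch_size(per_device_batch_size, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch_size * accumulation_steps * num_devices


def examples_seen(optimizer_steps, total_batch_size):
    """Examples processed after a number of optimizer steps (assumes full batches)."""
    return optimizer_steps * total_batch_size


total = total_train_batch_size(
    hyperparameters["train_batch_size"],
    hyperparameters["gradient_accumulation_steps"],
)
print(total)                       # 64
print(examples_seen(2900, total))  # 185600
```

With a constant scheduler, the learning rate stays at 0.0001 for every one of these steps.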

Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6.5987 | 1.0 | 145 | 5.0696 | 3804 | 134 | 2 | 0 | 22913 | 20709 | 18505 | 16301 | 16.6019 | 0.6471 | 0.0108 | 0.0031 | 1.0 | 22913 | 21250 | 0.0783 | 0.007 | 0.0769 | 0.0768 | 0.0 | 0.1374 | 16.2899 | 0.0814 |
| 4.7443 | 2.0 | 291 | 4.2270 | 4022 | 188 | 20 | 0 | 17366 | 15162 | 12958 | 10754 | 23.1602 | 1.2399 | 0.1543 | 0.0046 | 0.7996 | 17366 | 21250 | 0.1028 | 0.012 | 0.0991 | 0.099 | 0.0 | 0.303 | 12.9038 | 0.1073 |
| 4.1412 | 3.0 | 436 | 3.7838 | 3723 | 187 | 26 | 2 | 16515 | 14311 | 12107 | 9903 | 22.5431 | 1.3067 | 0.2148 | 0.0202 | 0.7507 | 16515 | 21250 | 0.0899 | 0.0124 | 0.0886 | 0.0884 | 0.0 | 0.4488 | 12.4769 | 0.0938 |
| 3.6791 | 4.0 | 582 | 3.4246 | 4576 | 549 | 134 | 26 | 21871 | 19667 | 17463 | 15259 | 20.9227 | 2.7915 | 0.7673 | 0.1704 | 1.0 | 21871 | 21250 | 0.1259 | 0.0296 | 0.1204 | 0.1201 | 0.0 | 1.6623 | 14.5676 | 0.1323 |
| 3.3523 | 5.0 | 727 | 3.1723 | 4900 | 796 | 210 | 41 | 19389 | 17185 | 14981 | 12777 | 25.2721 | 4.6319 | 1.4018 | 0.3209 | 0.9085 | 19389 | 21250 | 0.1542 | 0.0449 | 0.1486 | 0.1484 | 0.0005 | 2.4472 | 14.3943 | 0.1585 |
| 3.0161 | 6.0 | 873 | 2.9268 | 5633 | 1182 | 390 | 111 | 19045 | 16841 | 14637 | 12433 | 29.5773 | 7.0186 | 2.6645 | 0.8928 | 0.8907 | 19045 | 21250 | 0.204 | 0.069 | 0.196 | 0.1961 | 0.0045 | 4.1987 | 14.5789 | 0.2074 |
| 2.7639 | 7.0 | 1018 | 2.7601 | 6100 | 1461 | 499 | 165 | 17924 | 15720 | 13516 | 11312 | 34.0326 | 9.2939 | 3.6919 | 1.4586 | 0.8306 | 17924 | 21250 | 0.2409 | 0.0885 | 0.2332 | 0.2331 | 0.0073 | 5.3362 | 13.8553 | 0.2431 |
| 2.5036 | 8.0 | 1164 | 2.5729 | 6765 | 1845 | 701 | 273 | 20179 | 17975 | 15771 | 13567 | 33.525 | 10.2643 | 4.4449 | 2.0122 | 0.9483 | 20179 | 21250 | 0.2682 | 0.1079 | 0.2589 | 0.259 | 0.0059 | 7.0633 | 15.7232 | 0.2689 |
| 2.307 | 8.99 | 1309 | 2.4637 | 7018 | 2047 | 826 | 348 | 19054 | 16850 | 14646 | 12442 | 36.8322 | 12.1484 | 5.6398 | 2.797 | 0.8911 | 19054 | 21250 | 0.2907 | 0.1218 | 0.2799 | 0.2798 | 0.0095 | 8.1681 | 14.8076 | 0.2907 |
| 2.1012 | 10.0 | 1455 | 2.3614 | 7147 | 2127 | 883 | 389 | 18473 | 16269 | 14065 | 11861 | 38.6889 | 13.0739 | 6.278 | 3.2797 | 0.8604 | 18473 | 21250 | 0.3003 | 0.1275 | 0.289 | 0.2888 | 0.0118 | 8.6921 | 14.2736 | 0.3008 |
| 1.9538 | 10.99 | 1600 | 2.2980 | 7481 | 2339 | 997 | 459 | 18524 | 16320 | 14116 | 11912 | 40.3854 | 14.3321 | 7.0629 | 3.8533 | 0.8632 | 18524 | 21250 | 0.3192 | 0.1423 | 0.3064 | 0.3068 | 0.0127 | 9.67 | 14.3757 | 0.3167 |
| 1.7909 | 12.0 | 1746 | 2.2389 | 7675 | 2546 | 1144 | 546 | 18849 | 16645 | 14441 | 12237 | 40.7183 | 15.2959 | 7.9219 | 4.4619 | 0.8804 | 18849 | 21250 | 0.3299 | 0.1528 | 0.3174 | 0.3175 | 0.015 | 10.724 | 14.583 | 0.3279 |
| 1.6691 | 12.99 | 1891 | 2.1813 | 7858 | 2635 | 1179 | 576 | 18643 | 16439 | 14235 | 12031 | 42.1499 | 16.029 | 8.2824 | 4.7876 | 0.8695 | 18643 | 21250 | 0.344 | 0.1626 | 0.33 | 0.33 | 0.0163 | 11.1241 | 14.3848 | 0.3395 |
| 1.5361 | 14.0 | 2037 | 2.1546 | 8016 | 2729 | 1249 | 606 | 18754 | 16550 | 14346 | 12142 | 42.7429 | 16.4894 | 8.7063 | 4.9909 | 0.8754 | 18754 | 21250 | 0.3494 | 0.1664 | 0.3349 | 0.3351 | 0.0163 | 11.5803 | 14.564 | 0.3462 |
| 1.4365 | 14.99 | 2182 | 2.1358 | 8112 | 2839 | 1316 | 647 | 18390 | 16186 | 13982 | 11778 | 44.1109 | 17.5398 | 9.4121 | 5.4933 | 0.856 | 18390 | 21250 | 0.3581 | 0.1761 | 0.3448 | 0.3448 | 0.02 | 12.1055 | 14.1656 | 0.3538 |
| 1.3263 | 16.0 | 2328 | 2.1190 | 8381 | 2990 | 1430 | 731 | 18892 | 16688 | 14484 | 12280 | 44.3627 | 17.9171 | 9.873 | 5.9528 | 0.8827 | 18892 | 21250 | 0.3681 | 0.1831 | 0.3532 | 0.3534 | 0.0209 | 12.9765 | 14.5445 | 0.363 |
| 1.2329 | 17.0 | 2474 | 2.1202 | 8449 | 3101 | 1520 | 786 | 18612 | 16408 | 14204 | 12000 | 45.3954 | 18.8993 | 10.7012 | 6.55 | 0.8678 | 18612 | 21250 | 0.3743 | 0.1901 | 0.3603 | 0.3603 | 0.0227 | 13.5903 | 14.1779 | 0.3692 |
| 1.1557 | 18.0 | 2619 | 2.1282 | 8406 | 3154 | 1558 | 804 | 17958 | 15754 | 13550 | 11346 | 46.8092 | 20.0203 | 11.4982 | 7.0862 | 0.8325 | 17958 | 21250 | 0.3761 | 0.194 | 0.3633 | 0.3636 | 0.0277 | 13.8388 | 13.677 | 0.371 |
| 1.0658 | 19.0 | 2765 | 2.1232 | 8614 | 3241 | 1610 | 839 | 18955 | 16751 | 14547 | 12343 | 45.4445 | 19.3481 | 11.0676 | 6.7974 | 0.886 | 18955 | 21250 | 0.3803 | 0.196 | 0.3654 | 0.3656 | 0.0272 | 14.2084 | 14.3816 | 0.3749 |
| 0.9944 | 19.93 | 2900 | 2.1203 | 8658 | 3273 | 1625 | 859 | 18853 | 16649 | 14445 | 12241 | 45.9237 | 19.6588 | 11.2496 | 7.0174 | 0.8806 | 18853 | 21250 | 0.3833 | 0.1977 | 0.369 | 0.3691 | 0.0268 | 14.3883 | 14.2881 | 0.3775 |
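
The BLEU column can be recomputed from the n-gram counts, totals, and lengths in the same row. The sketch below assumes the standard corpus-level definition (brevity penalty times the geometric mean of the four n-gram precisions, scaled to 0-100) and applies no smoothing, so it will not reproduce rows that contain a zero count, where the reported precisions appear to be smoothed. The function name `bleu_from_counts` is illustrative.

```python
import math


def bleu_from_counts(counts, totals, system_length, reference_length):
    """Unsmoothed corpus BLEU on a 0-100 scale (assumed standard definition)."""
    precisions = [c / t for c, t in zip(counts, totals)]
    if min(precisions) == 0:
        # Without smoothing, a single zero precision makes the score zero.
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    if system_length >= reference_length:
        bp = 1.0
    else:
        bp = math.exp(1.0 - reference_length / system_length)
    return 100.0 * bp * math.exp(log_mean)


# Counts, totals, and lengths from the final row of the table (step 2900).
final = bleu_from_counts(
    counts=[8658, 3273, 1625, 859],
    totals=[18853, 16649, 14445, 12241],
    system_length=18853,
    reference_length=21250,
)
print(round(final, 2))  # 14.39
```

The result agrees with the reported value of 14.3883 up to rounding.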

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.1.0
  • Datasets 2.12.0
  • Tokenizers 0.13.3

Model size

  • 783M params
  • Tensor type: F32
  • Format: Safetensors

Dataset used to train GiantTreeG/german-jeopardy-longt5-large: lmqg/qg_dequad