	---
	language:
	- de
	tags:
	- question-generation
	- german
	- text2text-generation
	- generated_from_trainer
	datasets:
	- lmqg/qg_dequad
	metrics:
	- bleu4
	- f1
	- rouge
	- exact_match
	model-index:
	- name: german-jeopardy-longt5-large-256
	  results:
	  - task:
	      name: Sequence-to-sequence Language Modeling
	      type: text2text-generation
	    dataset:
	      name: lmqg/qg_dequad
	      type: default
	      args: default
	    metrics:
	    - name: BLEU-4
	      type: bleu4
	      value: 4.87
	    - name: F1
	      type: f1
	      value: 23.82
	    - name: ROUGE-1
	      type: rouge1
	      value: 23.88
	    - name: ROUGE-2
	      type: rouge2
	      value: 8.54
	    - name: ROUGE-L
	      type: rougel
	      value: 23.14
	    - name: ROUGE-Lsum
	      type: rougelsum
	      value: 23.13
	    - name: Exact Match
	      type: exact_match
	      value: 0.32
	---


	# german-jeopardy-longt5-large-256

	This model is a fine-tuned version of [google/long-t5-tglobal-large](https://huggingface.co/google/long-t5-tglobal-large) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.8541
	- Brevity Penalty: 0.8795
	- System Length: 18427
	- Reference Length: 20793
	- ROUGE-1: 23.88
	- ROUGE-2: 8.54
	- ROUGE-L: 23.14
	- ROUGE-Lsum: 23.13
	- Exact Match: 0.32
	- BLEU: 4.87
	- F1: 23.82
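The Brevity Penalty listed above can be checked from the two lengths reported next to it. The following is a minimal sketch, assuming the standard BLEU definition of the penalty (the card does not say which implementation computed it); the function name `brevity_penalty` is illustrative and not part of any library.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty (assumed definition, not confirmed by the card)."""
    if system_length == 0:
        return 0.0
    # No penalty when the generated text is at least as long as the references.
    if system_length >= reference_length:
        return 1.0
    # Shorter outputs are penalized exponentially in the length ratio.
    return math.exp(1.0 - reference_length / system_length)


# Lengths taken from the results list above.
bp = brevity_penalty(system_length=18427, reference_length=20793)
print(f"{bp:.4f}")
```

Under that assumption the result agrees with the reported 0.8795 to four decimal places, which suggests the generated questions are on average somewhat shorter than the references.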

	## Model description

	See [google/long-t5-tglobal-large](https://huggingface.co/google/long-t5-tglobal-large) for more information about the
	model architecture.
	The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

	## Intended uses & limitations

	This model can be used for question generation on German text.

	## Training and evaluation data

	See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 7
	- gradient_accumulation_steps: 128
	- total_train_batch_size: 256
	- optimizer: Adafactor
	- lr_scheduler_type: constant
	- num_epochs: 20
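The relationship between the batch-size entries above can be written out as simple arithmetic. This is a small sketch; the variable names are illustrative, and the device count of 1 is taken from the single-GPU note in the model description.

```python
# Values from the hyperparameter list above.
per_device_batch_size = 2
gradient_accumulation_steps = 128
num_devices = 1  # single RTX 3090, per the model description

# Gradients from 128 micro-batches of 2 examples are accumulated
# before each optimizer update, giving the effective batch size.
total_train_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)
```

Accumulating gradients this way trades wall-clock time for memory: only two examples need to fit on the GPU at once, while each optimizer update still sees 256 examples.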

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:------------:|:------------:|:------------:|:------------:|:---------------:|:-------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-----------:|:------:|:---------------------:|:------:|
	| 8.8727 | 0.99 | 36 | 6.3810 | 2198 | 0 | 0 | 0 | 2204 | 0 | 0 | 0 | 99.7278 | 0.0 | 0.0 | 0.0 | 0.0002 | 2204 | 21250 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 |
	| 6.0165 | 1.98 | 72 | 5.3864 | 3587 | 137 | 0 | 0 | 21960 | 19756 | 17552 | 15348 | 16.3342 | 0.6935 | 0.0028 | 0.0016 | 1.0 | 21960 | 21250 | 0.0702 | 0.0079 | 0.07 | 0.07 | 0.0 | 0.0851 | 15.0091 | 0.073 |
	| 5.1537 | 3.0 | 109 | 4.9617 | 3601 | 145 | 1 | 0 | 14449 | 12245 | 10041 | 7837 | 24.9221 | 1.1842 | 0.01 | 0.0064 | 0.6246 | 14449 | 21250 | 0.0882 | 0.0107 | 0.0877 | 0.0876 | 0.0 | 0.13 | 9.5309 | 0.0926 |
	| 4.863 | 3.99 | 145 | 4.5531 | 4590 | 229 | 19 | 0 | 41674 | 39470 | 37266 | 35062 | 11.0141 | 0.5802 | 0.051 | 0.0014 | 1.0 | 41674 | 21250 | 0.0811 | 0.0081 | 0.0768 | 0.0767 | 0.0 | 0.1468 | 29.4528 | 0.0836 |
	| 4.5201 | 4.97 | 181 | 4.2020 | 3643 | 169 | 19 | 0 | 16104 | 13900 | 11696 | 9492 | 22.6217 | 1.2158 | 0.1624 | 0.0053 | 0.7265 | 16104 | 21250 | 0.0865 | 0.0115 | 0.0856 | 0.0855 | 0.0 | 0.2845 | 12.5077 | 0.0907 |
	| 4.1347 | 5.99 | 218 | 3.9353 | 3670 | 167 | 20 | 0 | 16796 | 14592 | 12388 | 10184 | 21.8504 | 1.1445 | 0.1614 | 0.0049 | 0.7671 | 16796 | 21250 | 0.087 | 0.0114 | 0.0859 | 0.0858 | 0.0 | 0.2878 | 13.1656 | 0.0917 |
	| 4.012 | 6.98 | 254 | 3.7593 | 3780 | 198 | 35 | 1 | 16582 | 14378 | 12174 | 9970 | 22.7958 | 1.3771 | 0.2875 | 0.01 | 0.7546 | 16582 | 21250 | 0.0916 | 0.0128 | 0.0903 | 0.0902 | 0.0 | 0.4139 | 12.2931 | 0.0968 |
	| 3.7048 | 8.0 | 291 | 3.6034 | 3668 | 205 | 36 | 3 | 16158 | 13954 | 11750 | 9546 | 22.7008 | 1.4691 | 0.3064 | 0.0314 | 0.7297 | 16158 | 21250 | 0.0882 | 0.0134 | 0.0873 | 0.0872 | 0.0 | 0.5493 | 11.7568 | 0.0923 |
	| 3.6284 | 8.99 | 327 | 3.4567 | 4070 | 527 | 160 | 28 | 17459 | 15255 | 13051 | 10847 | 23.3118 | 3.4546 | 1.226 | 0.2581 | 0.8048 | 17459 | 21250 | 0.1109 | 0.0281 | 0.1083 | 0.1082 | 0.0 | 1.8083 | 9.7777 | 0.1152 |
	| 3.4605 | 9.98 | 363 | 3.3390 | 4325 | 512 | 128 | 27 | 18829 | 16625 | 14421 | 12217 | 22.9699 | 3.0797 | 0.8876 | 0.221 | 0.8793 | 18829 | 21250 | 0.1206 | 0.0288 | 0.1168 | 0.1167 | 0.0 | 1.6972 | 12.6729 | 0.1254 |
	| 3.2267 | 10.99 | 400 | 3.1995 | 4498 | 774 | 237 | 49 | 18802 | 16598 | 14394 | 12190 | 23.923 | 4.6632 | 1.6465 | 0.402 | 0.8779 | 18802 | 21250 | 0.1348 | 0.0405 | 0.132 | 0.1319 | 0.0005 | 2.5735 | 11.5009 | 0.1381 |
	| 3.1761 | 11.98 | 436 | 3.1165 | 4578 | 866 | 260 | 50 | 16963 | 14759 | 12555 | 10351 | 26.9882 | 5.8676 | 2.0709 | 0.483 | 0.7767 | 16963 | 21250 | 0.1454 | 0.0464 | 0.1426 | 0.1427 | 0.0005 | 2.7554 | 10.5172 | 0.1492 |
	| 3.0323 | 12.97 | 472 | 3.0074 | 5019 | 1048 | 319 | 59 | 18077 | 15873 | 13669 | 11465 | 27.7646 | 6.6024 | 2.3337 | 0.5146 | 0.839 | 18077 | 21250 | 0.1691 | 0.0557 | 0.1648 | 0.1647 | 0.0009 | 3.2318 | 12.8294 | 0.1729 |
	| 2.8223 | 13.99 | 509 | 2.8911 | 5257 | 1120 | 341 | 85 | 17074 | 14870 | 12666 | 10462 | 30.7895 | 7.5319 | 2.6922 | 0.8125 | 0.783 | 17074 | 21250 | 0.189 | 0.0635 | 0.1841 | 0.184 | 0.0018 | 3.7161 | 12.6824 | 0.1929 |
	| 2.7732 | 14.98 | 545 | 2.8103 | 5616 | 1271 | 407 | 113 | 17784 | 15580 | 13376 | 11172 | 31.5789 | 8.1579 | 3.0428 | 1.0115 | 0.8229 | 17784 | 21250 | 0.2122 | 0.0731 | 0.2063 | 0.2061 | 0.0045 | 4.3667 | 13.0944 | 0.217 |
	| 2.58 | 16.0 | 582 | 2.7183 | 5959 | 1461 | 510 | 171 | 18808 | 16604 | 14400 | 12196 | 31.6833 | 8.7991 | 3.5417 | 1.4021 | 0.8782 | 18808 | 21250 | 0.2286 | 0.0822 | 0.2214 | 0.2212 | 0.0064 | 5.357 | 13.9174 | 0.2316 |
	| 2.5368 | 16.99 | 618 | 2.6630 | 5935 | 1543 | 576 | 201 | 16923 | 14719 | 12515 | 10311 | 35.0706 | 10.483 | 4.6025 | 1.9494 | 0.7744 | 16923 | 21250 | 0.2365 | 0.089 | 0.2309 | 0.2307 | 0.0059 | 5.8686 | 12.3185 | 0.2377 |
	| 2.4325 | 17.98 | 654 | 2.5798 | 6305 | 1756 | 685 | 265 | 17870 | 15666 | 13462 | 11258 | 35.2826 | 11.209 | 5.0884 | 2.3539 | 0.8277 | 17870 | 21250 | 0.2518 | 0.0982 | 0.2452 | 0.2452 | 0.0059 | 6.8664 | 13.1688 | 0.2537 |
	| 2.2632 | 18.99 | 691 | 2.5155 | 6577 | 1888 | 762 | 304 | 17785 | 15581 | 13377 | 11173 | 36.9806 | 12.1173 | 5.6963 | 2.7208 | 0.823 | 17785 | 21250 | 0.2689 | 0.1102 | 0.261 | 0.2611 | 0.0086 | 7.5129 | 13.2373 | 0.2702 |
	| 2.2026 | 19.79 | 720 | 2.4997 | 6644 | 1853 | 720 | 273 | 17658 | 15454 | 13250 | 11046 | 37.626 | 11.9904 | 5.434 | 2.4715 | 0.8159 | 17658 | 21250 | 0.2717 | 0.1097 | 0.2628 | 0.2625 | 0.0073 | 7.1987 | 13.6343 | 0.2742 |
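The Counts, Totals, System Length, and Reference Length columns are enough to recompute the BLEU column. The sketch below assumes the standard corpus-level BLEU formula (geometric mean of the four n-gram precisions times the brevity penalty) with no smoothing; the helper `bleu_from_stats` is hypothetical and not taken from any library. The early rows with zero counts still report small non-zero precisions, which points to smoothing in the original metric, so this unsmoothed version applies only to rows where all four counts are non-zero.

```python
import math


def bleu_from_stats(counts, totals, system_length, reference_length):
    """Unsmoothed corpus BLEU on a 0-100 scale (assumed formula, see note above)."""
    # Without smoothing, a single zero n-gram count drives the score to zero.
    if system_length == 0 or any(c == 0 for c in counts) or any(t == 0 for t in totals):
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    mean_log_precision = sum(
        math.log(c / t) for c, t in zip(counts, totals)
    ) / len(counts)
    # Brevity penalty: applies only when the output is shorter than the reference.
    if system_length >= reference_length:
        bp = 1.0
    else:
        bp = math.exp(1.0 - reference_length / system_length)
    return 100.0 * bp * math.exp(mean_log_precision)


# Statistics from the final row of the table (step 720).
final_bleu = bleu_from_stats(
    counts=[6644, 1853, 720, 273],
    totals=[17658, 15454, 13250, 11046],
    system_length=17658,
    reference_length=21250,
)
print(f"{final_bleu:.2f}")
```

For the final row this should land very close to the reported BLEU of 7.1987. Each Totals column is 2204 smaller than the one before it, which is consistent with one fewer n-gram per generated question at each order across 2204 evaluation examples.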


	### Framework versions

	- Transformers 4.32.1
	- PyTorch 2.1.0
	- Datasets 2.12.0
	- Tokenizers 0.13.3