---
tags:
- generated_from_trainer
metrics:
- rouge
license: apache-2.0
datasets:
- pszemraj/qmsum-cleaned
language:
- en
pipeline_tag: summarization
inference: false
---
# long-t5-tglobal-xl-qmsum-wip
> ⚠️ Warning: this is a work in progress ⚠️
<a href="https://colab.research.google.com/gist/pszemraj/ea0ac20dae4ad84bea4ea64543f84a85/long-t5-tglobal-xl-qmsum-wip.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
This model is a fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `pszemraj/qmsum-cleaned` dataset.

- Refer to the [dataset card](https://huggingface.co/datasets/pszemraj/qmsum-cleaned) for details: the model was trained **with the task/prompt prefixes at the start of `input`**, so **inference should be run the same way** (see the sketch below).
- An example of how to run inference is in the Colab notebook linked above.
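As a minimal sketch (not the card's official snippet), inference with the `transformers` pipeline might look like this; the prompt wording below is illustrative, so take the real prefixes from the dataset card:

```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-qmsum-wip",
    device=0 if torch.cuda.is_available() else -1,
)

# the dataset puts a task/prompt prefix at the start of each input,
# so prepend one here too; this exact wording is an illustrative guess
prompt = "Summarize the whole meeting."
transcript = "..."  # your long meeting transcript goes here
result = summarizer(
    f"{prompt}\n{transcript}",
    max_length=256,
    no_repeat_ngram_size=3,
)
print(result[0]["summary_text"])
```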
It achieves the following results on the evaluation set:
- Loss: 2.0505
- Rouge1: 35.3881
- Rouge2: 11.509
- RougeL: 23.1543
- RougeLsum: 31.3295
- Gen Len: 80.8
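A hedged sketch of how scores like these can be computed for a single prediction/reference pair with the `evaluate` library's `rouge` metric (the strings are placeholders):

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["a generated summary"],   # placeholder
    references=["the reference summary"],  # placeholder
)
print(scores)  # keys include rouge1, rouge2, rougeL, rougeLsum
```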
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 2526
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3.0
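For anyone replicating the run, a hedged sketch of how these settings map onto `transformers` `Seq2SeqTrainingArguments`; the output directory is hypothetical, and the Adam betas/epsilon listed above match the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

# approximate mapping of the listed hyperparameters; not the exact training script
training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-xl-qmsum-wip",  # hypothetical path
    learning_rate=7e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=2526,
    gradient_accumulation_steps=8,  # total train batch size: 1 x 8 = 8
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3.0,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the optimizer defaults
)
```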
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.5376 | 1.0 | 99 | 2.0104 | 35.8802 | 11.4595 | 23.6656 | 31.49 | 77.77 |
| 1.499 | 2.0 | 198 | 2.0358 | 35.1265 | 11.549 | 23.1062 | 30.8815 | 88.88 |
| 1.5034 | 3.0 | 297 | 2.0505 | 35.3881 | 11.509 | 23.1543 | 31.3295 | 80.8 |