metadata

tags:
  - generated_from_trainer
metrics:
  - rouge
license: apache-2.0
datasets:
  - pszemraj/qmsum-cleaned
language:
  - en
pipeline_tag: summarization
inference: false

long-t5-tglobal-xl-qmsum-wip

⚠️ Warning: this is a work in progress ⚠️

This model is a fine-tuned version of google/long-t5-tglobal-xl on the pszemraj/qmsum-cleaned dataset.

Refer to the dataset card for details. Note that this model was trained with task/prompt prefixes at the start of each input, so inference should be run with inputs formatted the same way.
An example of how to run inference is in the Colab notebook linked above.
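As a rough illustration of "prefix first, then transcript", here is a minimal sketch using the `transformers` Auto classes. Several things in it are assumptions rather than facts from this card: the Hub id `pszemraj/long-t5-tglobal-xl-qmsum-wip`, the example prefix `"Summarize the whole meeting."`, the `"\n\n"` separator, the 16384-token input limit, and the generation settings. The helper names `build_model_input` and `summarize` are hypothetical. The exact prefix format is defined in the dataset card and should be used instead of these placeholders.

```python
def build_model_input(prefix: str, transcript: str, sep: str = "\n\n") -> str:
    """Prepend a task/prompt prefix to a meeting transcript.

    HYPOTHETICAL format: the real prefix wording and separator are defined by
    the pszemraj/qmsum-cleaned dataset card; `sep` is only a placeholder.
    """
    return f"{prefix.strip()}{sep}{transcript.strip()}"


def summarize(
    text: str,
    model_id: str = "pszemraj/long-t5-tglobal-xl-qmsum-wip",  # assumed Hub id
    max_input_tokens: int = 16384,  # assumed input budget
    max_new_tokens: int = 256,  # assumed generation length
    num_beams: int = 4,  # assumed decoding setting
) -> str:
    """Generate a summary with transformers (all settings are assumptions)."""
    # Imported lazily so build_model_input works without transformers installed
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = build_model_input(
        "Summarize the whole meeting.",  # hypothetical prefix
        "Speaker A: Let's review the budget. Speaker B: Agreed.",
    )
    print(prompt)
    # print(summarize(prompt))  # downloads a large checkpoint; uncomment to run
```

The import of `transformers` is deferred into `summarize` so the prompt-building step can be checked on its own before committing to a large download.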

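It achieves the following results on the evaluation set (these match the epoch 3 row of the training results below):

Loss: 2.0505
Rouge1: 35.3881
Rouge2: 11.509
RougeL: 23.1543
RougeLsum: 31.3295
Gen Len: 80.8 (average length of the generated summaries, in tokens)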
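The ROUGE values above are on a 0 to 100 scale; in the Hugging Face summarization scripts they are F-measures multiplied by 100, and that is assumed to be the case here. To make concrete what Rouge1 measures, the sketch below computes a simplified ROUGE-1 F1 (clipped unigram overlap between a reference and a generated summary). It is a hypothetical illustration, not the scorer used for this card: it lowercases and splits on whitespace with no stemming or punctuation handling, so it will not reproduce scores from a standard implementation such as `rouge_score`.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: whitespace tokens, lowercase, no stemming."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches)
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge1_f1("the team discussed the budget", "the team approved the budget")
    print(round(100 * score, 2))  # → 80.0 (4 of 5 unigrams match on each side)
```

Rouge2 follows the same pattern over bigrams, while RougeL and RougeLsum are based on the longest common subsequence rather than n-gram counts.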

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 7e-05
train_batch_size: 1
eval_batch_size: 1
seed: 2526
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 3.0
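The batch-size and schedule settings above interact in a way that is easy to check by hand. The sketch below assumes a single device (consistent with total_train_batch_size = 1 × 8 = 8) and takes the total of 297 optimizer steps from the training-results table. It mirrors my reading of how `transformers` turns `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio` into a schedule (warmup steps rounded up, then linear warmup followed by a half-cosine decay to zero, as in `get_cosine_schedule_with_warmup`); the function name `lr_at` is hypothetical.

```python
import math

# Values from the hyperparameter list above
PEAK_LR = 7e-05
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.03
NUM_DEVICES = 1  # assumption: single device, matching total_train_batch_size = 8
TOTAL_STEPS = 297  # from the training-results table (3 epochs x 99 steps)

effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # ceil(8.91) = 9


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch)  # → 8
    print(warmup_steps)  # → 9
    for step in (0, 9, 153, 297):
        print(step, lr_at(step))
```

With these numbers the learning rate peaks at 7e-05 after 9 steps, is roughly halved by the midpoint of the decay (step 153), and reaches zero at the last step.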

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum	Gen Len
1.5376	1.0	99	2.0104	35.8802	11.4595	23.6656	31.49	77.77
1.499	2.0	198	2.0358	35.1265	11.549	23.1062	30.8815	88.88
1.5034	3.0	297	2.0505	35.3881	11.509	23.1543	31.3295	80.8
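The evaluation results reported at the top of this card correspond to the final (epoch 3) row. A quick way to compare checkpoints is to load the table rows and pick the best epoch per metric; the sketch below simply copies the numbers from the table (the `Row` field names are my own labels). By this reading, epoch 1 has the lowest validation loss and the highest Rouge1, RougeL, and RougeLsum, while Rouge2 peaks at epoch 2.

```python
from typing import NamedTuple


class Row(NamedTuple):
    epoch: float
    step: int
    val_loss: float
    rouge1: float
    rouge2: float
    rougeL: float
    rougeLsum: float
    gen_len: float


# Copied from the training-results table (training loss column omitted)
RESULTS = [
    Row(1.0, 99, 2.0104, 35.8802, 11.4595, 23.6656, 31.49, 77.77),
    Row(2.0, 198, 2.0358, 35.1265, 11.549, 23.1062, 30.8815, 88.88),
    Row(3.0, 297, 2.0505, 35.3881, 11.509, 23.1543, 31.3295, 80.8),
]

best_by_loss = min(RESULTS, key=lambda r: r.val_loss)  # lower is better
best_by_rouge1 = max(RESULTS, key=lambda r: r.rouge1)  # higher is better
best_by_rouge2 = max(RESULTS, key=lambda r: r.rouge2)

if __name__ == "__main__":
    print("lowest val loss:", best_by_loss.epoch)
    print("highest Rouge1:", best_by_rouge1.epoch)
    print("highest Rouge2:", best_by_rouge2.epoch)
```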