---
tags:
  - generated_from_trainer
metrics:
  - rouge
license: apache-2.0
datasets:
  - pszemraj/qmsum-cleaned
language:
  - en
pipeline_tag: summarization
inference: false
---

# long-t5-tglobal-xl-qmsum-wip

⚠️ warning - this is a work in progress ⚠️

Open In Colab

This model is a fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the [pszemraj/qmsum-cleaned](https://huggingface.co/datasets/pszemraj/qmsum-cleaned) dataset.

- Refer to the dataset card for details, but in short: this model was trained with task/prompt prefixes prepended to each input, so inference should be run with the same prefixes (a sketch follows this list).
- An example of how to run inference is in the Colab notebook linked above.
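A minimal inference sketch, assuming the hub id `pszemraj/long-t5-tglobal-xl-qmsum-wip` and an illustrative query-style prefix (check the dataset card for the exact prefix strings):

```python
# a minimal sketch, not the exact notebook code; the hub id and the prefix
# string below are assumptions - see the dataset card for the real prefixes
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "pszemraj/long-t5-tglobal-xl-qmsum-wip"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# the model expects a task/prompt prefix before the document, as in training
prefix = "Summarize the whole meeting."  # illustrative query-style prefix
transcript = "..."  # replace with the meeting transcript to summarize
inputs = tokenizer(f"{prefix}\n{transcript}", return_tensors="pt")

with torch.no_grad():
    summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Note that this is a 3B-parameter model, so generation on long transcripts requires substantial memory.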

It achieves the following results on the evaluation set:

- Loss: 2.0505
- Rouge1: 35.3881
- Rouge2: 11.509
- Rougel: 23.1543
- Rougelsum: 31.3295
- Gen Len: 80.8
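The ROUGE values above are on a 0-100 scale. A minimal sketch of computing the same metrics with the Hugging Face `evaluate` library (the texts below are placeholders, not the actual evaluation set):

```python
# scoring generated summaries against references with evaluate's rouge metric
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the team agreed to ship the feature next sprint"]   # placeholder
references = ["the team decided to ship the feature in the next sprint"]
scores = rouge.compute(predictions=predictions, references=references)
# scale from 0-1 to 0-100 to match the values reported in this card
print({k: round(v * 100, 4) for k, v in scores.items()})
```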

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 7e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 2526
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3.0
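A sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`; the output path is hypothetical and this is not the exact training script:

```python
# mapping the hyperparameters above onto Seq2SeqTrainingArguments
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-xl-qmsum-wip",  # hypothetical output path
    learning_rate=7e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size: 1 x 8 = 8
    seed=2526,
    # the Trainer's default AdamW matches the reported betas=(0.9, 0.999)
    # and epsilon=1e-8
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3.0,
)
```

The total train batch size of 8 comes from the per-device batch size of 1 multiplied by 8 gradient-accumulation steps.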

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|-------|------|-----------------|---------|---------|---------|-----------|---------|
| 1.5376        | 1.0   | 99   | 2.0104          | 35.8802 | 11.4595 | 23.6656 | 31.49     | 77.77   |
| 1.499         | 2.0   | 198  | 2.0358          | 35.1265 | 11.549  | 23.1062 | 30.8815   | 88.88   |
| 1.5034        | 3.0   | 297  | 2.0505          | 35.3881 | 11.509  | 23.1543 | 31.3295   | 80.8    |