long-t5-tglobal-base-16384-booksci-summary: v1

An experiment investigating transfer-learning capabilities by fine-tuning models on different datasets, starting from the booksum checkpoint.

Model Details

This model is a fine-tuned version of pszemraj/long-t5-tglobal-base-16384-book-summary on the pszemraj/scientific_lay_summarisation-elife-norm dataset for two epochs.

Usage

It's recommended to use this model with beam search decoding. If interested, you can also use the textsum util repo to have most of this abstracted out for you:

```bash
pip install -U textsum
```

```python
from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"
summarizer = Summarizer(model_name)  # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
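The beam-search recommendation above can be made concrete as a set of generation keyword arguments. The values below are illustrative assumptions, not this checkpoint's published defaults; settings like these would typically be passed through to the underlying `model.generate()` call:

```python
# Illustrative beam-search decoding settings (assumed values, not the
# authors' published defaults) for the underlying generate() call.
beam_search_params = {
    "num_beams": 4,             # candidate beams explored per step
    "early_stopping": True,     # stop once all beams have finished
    "no_repeat_ngram_size": 3,  # discourage repeated phrases
    "max_length": 512,          # cap on generated summary tokens
}
print(beam_search_params["num_beams"])
```

Larger `num_beams` values trade decoding speed for (usually) better summaries; 2-8 is a common range.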

Intended uses & limitations

  • This is an initial experiment
  • Domain-generalization abilities are unknown at the time of writing

Training procedure

Note: this model was trained at a lower learning rate and not to full convergence, with the intention of retaining some of the properties learned during the initial fine-tuning on booksum.

Results

It achieves the following results on the evaluation set:

  • Loss: 2.3994
  • Rouge1: 34.2428
  • Rouge2: 4.3644
  • Rougel: 12.5332
  • Rougelsum: 30.6965
  • Gen Len: 294.0249

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 2.0
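The reported total_train_batch_size follows from the other values: per-device batch size times gradient-accumulation steps times the number of devices. With the numbers above, a single device already yields the reported 64 (the device count is an inference here, not stated in the card):

```python
# Effective (total) train batch size from the hyperparameters above.
# num_devices = 1 is an assumption inferred from 4 * 16 = 64.
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_devices = 1
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 64
```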

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len  |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:------:|:-------:|:---------:|:--------:|
| 2.7492        | 0.99  | 67   | 2.4272          | 34.6436 | 4.4536 | 12.4985 | 30.916    | 300.7635 |
| 2.6689        | 1.97  | 134  | 2.3994          | 34.2428 | 4.3644 | 12.5332 | 30.6965   | 294.0249 |