metadata

license: apache-2.0
tags:
  - generated_from_trainer
  - summarization
  - stacked summaries
  - prompt engineering
metrics:
  - rouge
datasets:
  - stacked-summaries/stacked-samsum-1024
model-index:
  - name: flan-t5-large-stacked-samsum1024-WIP3
    results:
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: test
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 47.6682
            verified: true
          - name: ROUGE-2
            type: rouge
            value: 23.3053
            verified: true
          - name: ROUGE-L
            type: rouge
            value: 39.7678
            verified: true
          - name: ROUGE-LSUM
            type: rouge
            value: 43.259
            verified: true
          - name: loss
            type: loss
            value: 2.372586965560913
            verified: true
          - name: gen_len
            type: gen_len
            value: 17.4237
            verified: true
language:
  - en
library_name: transformers
pipeline_tag: summarization

flan-t5-large-stacked-samsum-1024

This model is a fine-tuned version of google/flan-t5-large on the stacked-summaries/stacked-samsum-1024 dataset.

It achieves the following results on the evaluation set:

Loss: 2.1846
Rouge1: 57.9637
Rouge2: 28.7446
Rougel: 44.3826
Rougelsum: 54.0399
Gen Len: 122.77

Model description

This model was trained on a stacked dataset to test whether "task-oriented pretraining" improves summarization. Stacking several summaries into one example and separating them into independent concepts is intended to teach the model to condense and distill information from text, so that it learns to identify the essential information rather than simply mimic the style of the dataset summaries.

The token [NEXT_CONCEPT] marks the start of a new concept in the summary. Splitting an output summary on this token, for example with summary_text.split("[NEXT_CONCEPT]"), shows how the model divided the information in the input text.
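
A minimal sketch of that split, for illustration only. The helper name split_concepts and the example summary string are made up here and are not part of the model or its dataset; the only thing taken from this card is the [NEXT_CONCEPT] token.

```python
from typing import List

# Separator token described in this model card.
CONCEPT_TOKEN = "[NEXT_CONCEPT]"


def split_concepts(summary_text: str) -> List[str]:
    """Split a stacked summary into its individual concept summaries.

    Hypothetical helper: strips surrounding whitespace and drops empty
    pieces, e.g. when the output starts or ends with the token.
    """
    parts = summary_text.split(CONCEPT_TOKEN)
    return [part.strip() for part in parts if part.strip()]


# Made-up example output, not a real model generation.
example = "Tom is running late. [NEXT_CONCEPT] Ann will bring the slides."
for index, concept in enumerate(split_concepts(example), start=1):
    print(f"{index}. {concept}")
```

Stripping whitespace is a convenience choice; a plain summary_text.split("[NEXT_CONCEPT]") keeps the spaces around each piece.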

Intended uses & limitations

Maximum input/output length is 1024 tokens.
This model is mostly a test: samsum is not the best dataset for general-purpose summarization.
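
One possible way to run the model with the transformers summarization pipeline is sketched below. The repository id in MODEL_ID is an assumption inferred from the card title and the dataset namespace, so check it against the actual model page; the helper names build_summarizer and summarize are hypothetical. The 1024-token limit comes from this card.

```python
# ASSUMPTION: the repo id is guessed from the card title and the dataset
# namespace; replace it with the id shown on the model page.
MODEL_ID = "stacked-summaries/flan-t5-large-stacked-samsum-1024"

# Limit stated in this card for input and output length.
MAX_TOKENS = 1024


def build_summarizer(model_id: str = MODEL_ID):
    """Load a summarization pipeline (downloads the model weights)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    return pipeline("summarization", model=model_id)


def summarize(text: str, summarizer, max_length: int = MAX_TOKENS) -> str:
    """Summarize `text`, truncating over-long inputs to the model limit."""
    result = summarizer(text, max_length=max_length, truncation=True)
    return result[0]["summary_text"]


if __name__ == "__main__":
    dialogue = "Ann: Are you coming tonight?\nTom: Yes, but I'll be late."
    print(summarize(dialogue, build_summarizer()))
```

The summarizer is passed in as an argument so that loading the model happens once and the same object can be reused across calls.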

Training and evaluation data

See the dataset card linked on this page (stacked-summaries/stacked-samsum-1024) for details.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 4
seed: 24915
distributed_type: multi-GPU
gradient_accumulation_steps: 32
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.02
num_epochs: 1.0
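
To make the learning-rate settings above concrete, the sketch below computes a cosine schedule with linear warmup in plain Python. It follows the commonly used form (linear ramp from 0 to the peak rate, then half-cosine decay to 0) and is an approximation of what the trainer does, not a copy of the training code. The function name cosine_lr and the value of TOTAL_STEPS are assumptions; this card does not state the total number of optimizer steps.

```python
import math

BASE_LR = 1e-4        # learning_rate from the list above
WARMUP_RATIO = 0.02   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 100     # ASSUMPTION: hypothetical value, not given in this card


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 1, 2, 50, 100):
    print(step, cosine_lr(step))
```

With a warmup ratio of 0.02, only the first 2% of steps ramp up; the rest of the epoch is spent on the cosine decay.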

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
0.1195	0.17	20	2.0635	57.8829	28.7887	44.4256	54.1299	121.8
0.1084	0.35	40	2.1178	58.0416	28.6487	44.3905	54.1557	122.893
0.1019	0.52	60	2.1576	57.816	28.7069	44.4242	53.9598	120.524
0.0975	0.7	80	2.1821	57.9597	28.8178	44.4854	54.068	121.793
0.0947	0.87	100	2.1846	57.9637	28.7446	44.3826	54.0399	122.77

Framework versions

Transformers 4.26.0.dev0
PyTorch 1.13.0+cu117
Datasets 2.6.1
Tokenizers 0.13.1