language:
  - en
license:
  - apache-2.0
  - bsd-3-clause
tags:
  - summarization
  - extractive
  - summary
  - abstractive
  - multi-task
  - document summary
datasets:
  - jordiclive/scored_summarization_datasets
metrics:
  - rouge
model-index:
  - name: jordiclive/flan-t5-3b-summarizer
    results:
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: test
        metrics:
          - type: rouge
            value: 42.2164
            name: ROUGE-1
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjBkY2JlNzk4YWE5ZTNhOGZkYjU5NzI2OWQzYmRiMmU5NDk0NTY1NjQ5ZWZhMTk0NWNkYzNlNzZjMjBiYjdmZiIsInZlcnNpb24iOjF9.FA5rlzf0hlKuKGlsPFJTkbodLKb2K8VufZS0WLsLLhFoa0HmFpwIo8AaaxTBzXuoszMXDh8kyImgoCubRQ1fAw
          - type: rouge
            value: 19.2018
            name: ROUGE-2
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOGU2MGFlMzI2NzViNWRjZjU2YzQwM2YzNDI2N2M5OGQ0YmEzOWM5NGJkNjI5ODQ0MGIzZGJhZGU0Y2QyOGJmNyIsInZlcnNpb24iOjF9.ID9W1Kc8ZcUjkZUxbrxnqL_kY1jumySGlsauPma3JavTvh2z7_ay6kDKSz3pVP6pXHx02lAcXRLesBSZMnQ4DA
          - type: rouge
            value: 35.2859
            name: ROUGE-L
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjliMzExMTJhZGMwZWU0NTdlMjNhNjVhNjk0ZjNkMTE4MGQ5ZWJjYmFlNGJhY2Y2YmIxMzRkYmU1M2Y0MmJhYyIsInZlcnNpb24iOjF9.8QJSOo1Q5izRaWx1yfzupfjVxUcid-v5yH6w371BlWxTSjfhr8uJwXe7wSBZqRVfz5q5rJmnDGCJj5jITphlBQ
          - type: rouge
            value: 37.982
            name: ROUGE-LSUM
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDMyZGU3NzdhMjlkYjIzODgwZjU1N2M3NWUzYmEwYjc4ODI5ZjdmZTgwYTYyNjYzYmQxZTJiZDgyZjYwOTFhNCIsInZlcnNpb24iOjF9.fdqoEho2SXM_S_xpgpQPGbxnfzOGwRqqYVORLAqrAHN5lKFTmn6B1JjkIvLB66Tsp4Q4PZVZYdhWa2rD0GHsBw
          - type: loss
            value: 1.4534213542938232
            name: loss
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWU3ZjRjODU0NTQ4MzJkNjFlZDExNjNkZmExNGQ2MDZjZTFmM2ZhODk2OWJlYTk1ODlkYWNkNzk2ZDMwYzY4YiIsInZlcnNpb24iOjF9.qwUPJOndEx7kWgYPysXj1F5sYC_5DKlrVytPJDc1SWzCZl0mTau05IBCT7McHqr8lkOQuKS7035mjr3a_CfIAg
          - type: gen_len
            value: 16.6984
            name: gen_len
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzcyOWJlN2FmMjc0ZjU3NzdiMjQxNzIzZTNjZDIxOTQyNzg0NGM5MDI0NzQ3YjNiNDIzYzQ3NzM2Njc2ZjRiZCIsInZlcnNpb24iOjF9.8jECnJQer2QmY6AXCklaKrBB2NNfYAV_W1U8WK-pqz2bo2xgrIkvGlCS85MP3fc3yfrJZYcAgar_UnKnpCecBA

Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)


A fine-tuned version of google/flan-t5-xl trained on several summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR).

Goal: a general-purpose summarizer for academic and everyday use. The type of summary can be controlled by varying the instruction prepended to the source document. The model works well on a wide range of text, although it was trained with a maximum source length of 512 tokens and a maximum summary length of 150 tokens.


Usage

Check the Colab notebook for example usage. The model expects a prompt prepended to the source document to indicate the type of summary; examples of the prompts used to train the model are listed below. Prompts should end with a colon, so that the input to the model is formatted as, e.g., "Summarize the following: {input_text}". Note that this model was trained with far fewer prompts than models like jordiclive/flan-t5-11b-summarizer-filtered, so new prompts might not generalize as well.


prompts = {
    "article": "Produce an article summary of the following news article:",
    "one_sentence": "Given the following news article, summarize the article in one sentence:",
    "conversation": "Briefly summarize in third person the following conversation:",
    "scitldr": "Given the following scientific article, provide a TL;DR summary:",
    "bill": "Summarize the following proposed legislation (bill):",
    "outlines": "Produce an article summary including outlines of each paragraph of the following article:",
}
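
As a minimal sketch of how these prompts are meant to be combined with a source document (the `build_input` helper is hypothetical, not part of the model's API):

```python
# Map of summary types to the instruction prompts used during training.
prompts = {
    "article": "Produce an article summary of the following news article:",
    "one_sentence": "Given the following news article, summarize the article in one sentence:",
    "conversation": "Briefly summarize in third person the following conversation:",
    "scitldr": "Given the following scientific article, provide a TL;DR summary:",
    "bill": "Summarize the following proposed legislation (bill):",
    "outlines": "Produce an article summary including outlines of each paragraph of the following article:",
}

def build_input(summary_type: str, document: str) -> str:
    """Prepend the training prompt (which ends with a colon) to the document."""
    return f"{prompts[summary_type]} {document}"

print(build_input("one_sentence", "The city council voted to expand bike lanes."))
```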

After running pip install transformers, use the following code:

This pipeline will run more slowly and does not expose some of the tokenization parameters used in the Colab notebook.

import torch
from transformers import pipeline

# Load the model in bfloat16 to reduce memory usage.
summarizer = pipeline("summarization", "jordiclive/flan-t5-3b-summarizer", torch_dtype=torch.bfloat16)

raw_document = 'You must be 18 years old to live or work in New York State...'
prompt = "Produce an article summary of the following news article:"
results = summarizer(
        f"{prompt} {raw_document}",
        num_beams=5,
        min_length=5,
        no_repeat_ngram_size=3,
        truncation=True,
        max_length=512,
    )

Training procedure

  • Training was done in BF16 with DeepSpeed ZeRO Stage 2 for 6 epochs, with ROUGE-2 monitored on the validation set.

Hardware

  • GPU count: 8 × NVIDIA A100-SXM4-40GB
  • CPU count: 48

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 5
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 2
  • effective_train_batch_size: 80
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • warmup_steps: 2000
  • num_epochs: 10
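
These settings are internally consistent: with 8 GPUs (see Hardware above), a per-device batch size of 5, and 2 gradient-accumulation steps, the effective training batch size is 5 × 8 × 2 = 80. A quick check:

```python
# Effective batch size = per-device batch size * GPU count * gradient accumulation steps.
train_batch_size = 5              # per device
num_gpus = 8                      # A100 count from the Hardware section
gradient_accumulation_steps = 2

effective_train_batch_size = train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_train_batch_size)  # → 80
```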

Framework versions

  • Transformers 4.24.0
  • PyTorch 1.9.1+cu111
  • DeepSpeed 0.7.4
  • PyTorch Lightning 1.8.1