SimCLS

SimCLS is a framework for abstractive summarization presented in "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization". It is a two-stage approach consisting of a generator and a scorer. In the first stage, a large pre-trained abstractive summarization model (the generator) produces candidate summaries. In the second stage, the scorer assigns a score to each candidate given the source document. The final summary is the highest-scoring candidate.
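The two-stage selection described above can be sketched in a few lines. This is only an illustration of the generate-then-rerank control flow, not the code from the repository: `select_summary`, `toy_generator`, and `toy_scorer` are hypothetical names, and the word-overlap scorer is a stand-in for the learned scorer.

```python
from typing import Callable, List


def select_summary(
    document: str,
    generate_candidates: Callable[[str], List[str]],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate summary that the scorer ranks highest."""
    # Stage 1: the generator proposes several candidate summaries.
    candidates = generate_candidates(document)
    if not candidates:
        raise ValueError("the generator returned no candidates")
    # Stage 2: the scorer rates each candidate against the source document.
    scores = [score(document, candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


def toy_generator(document: str) -> List[str]:
    # Hypothetical stand-in for a pre-trained generator: fixed candidates.
    return ["The bill funds schools.", "The bill raises taxes.", "A bill."]


def toy_scorer(document: str, candidate: str) -> float:
    # Hypothetical stand-in for the learned scorer: Jaccard word overlap.
    doc_words = set(document.lower().split())
    cand_words = set(candidate.lower().split())
    return len(doc_words & cand_words) / len(doc_words | cand_words)


print(select_summary("The bill funds rural schools.", toy_generator, toy_scorer))
```

Passing the generator and scorer in as callables keeps the selection step independent of any particular model, which mirrors how the two stages are trained and swapped separately.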

This model is the scorer trained for summarization on BillSum (paper, datasets). It should be used in conjunction with google/pegasus-billsum. See our GitHub repository for details on training, evaluation, and usage.

Usage

git clone https://github.com/andrejmiscic/simcls-pytorch.git
cd simcls-pytorch
pip3 install torch torchvision torchaudio transformers sentencepiece

from src.model import SimCLS, GeneratorType

summarizer = SimCLS(generator_type=GeneratorType.Pegasus,
                    generator_path="google/pegasus-billsum",
                    scorer_path="andrejmiscic/simcls-scorer-billsum")

document = "This is a legal document."
summary = summarizer(document)
print(summary)

Results

All of our results are reported together with 95% confidence intervals computed using 10,000 bootstrap iterations. See the SimCLS paper for a description of the baselines. We believe the discrepancy in Rouge-L scores between the original Pegasus work and our evaluation is due to how the metric is computed: we use a summary-level Rouge-L score.

System	Rouge-1	Rouge-2	Rouge-L*
Pegasus	57.31	40.19	45.82
Our results	---	---	---
Origin	56.24, [55.74, 56.74]	37.46, [36.89, 38.03]	50.71, [50.19, 51.22]
Min	44.37, [43.85, 44.89]	25.75, [25.30, 26.22]	38.68, [38.18, 39.16]
Max	62.88, [62.42, 63.33]	43.96, [43.39, 44.54]	57.50, [57.01, 58.00]
Random	54.93, [54.43, 55.43]	35.42, [34.85, 35.97]	49.19, [48.68, 49.70]
SimCLS	57.49, [57.01, 58.00]	38.54, [37.98, 39.10]	51.91, [51.39, 52.43]
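For readers who want to reproduce intervals of this kind, here is a minimal sketch of a bootstrap confidence interval for a mean score. The model card does not say which bootstrap variant was used, so this assumes a percentile bootstrap over per-document scores. `bootstrap_ci` and the sample scores are hypothetical and are not taken from the evaluation code.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_iterations: int = 10000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)
    n = len(scores)
    resampled_means = []
    for _ in range(n_iterations):
        # Resample the per-document scores with replacement.
        sample = rng.choices(scores, k=n)
        resampled_means.append(sum(sample) / n)
    resampled_means.sort()
    # Take the empirical quantiles of the resampled means as interval bounds.
    alpha = (1.0 - confidence) / 2.0
    lower = resampled_means[int(alpha * n_iterations)]
    upper = resampled_means[min(n_iterations - 1, int((1.0 - alpha) * n_iterations))]
    return sum(scores) / n, lower, upper


# Hypothetical per-document Rouge scores, for illustration only.
example_scores = [0.52, 0.61, 0.48, 0.57, 0.66, 0.43, 0.59, 0.50, 0.63, 0.55]
mean, lower, upper = bootstrap_ci(example_scores)
print(f"{100 * mean:.2f}, [{100 * lower:.2f}, {100 * upper:.2f}]")
```

The final line prints the result in the same "mean, [lower, upper]" layout as the table rows.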

Citation of the original work

@inproceedings{liu-liu-2021-simcls,
    title = "{S}im{CLS}: A Simple Framework for Contrastive Learning of Abstractive Summarization",
    author = "Liu, Yixin  and
      Liu, Pengfei",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-short.135",
    doi = "10.18653/v1/2021.acl-short.135",
    pages = "1065--1072",
}
