---
library_name: transformers
tags:
- synthetic
- '16384'
license: apache-2.0
datasets:
- BEE-spoke-data/synthsumm-open-v1.0
language:
- en
base_model:
- google/pegasus-x-base
pipeline_tag: summarization
---
pegasus-x-base-synthsumm_open-16k
This is a text-to-text summarization model fine-tuned from pegasus-x-base on a dataset of long documents from various sources and domains, each paired with a synthetic summary.
It performs surprisingly well as a general-purpose summarization model for its size. More details, a larger model, and the dataset will be released as time permits.
Usage
Beam search decoding is recommended for this model. If you prefer, the textsum utility package abstracts most of this away for you:
```bash
pip install -U textsum
```

Then:

```python
from textsum.summarize import Summarizer

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = Summarizer(model_name)  # GPU auto-detected

text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
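If you would rather call transformers directly, here is a minimal sketch of beam-search summarization with the `summarization` pipeline. The decoding values (`num_beams=4`, `no_repeat_ngram_size=3`, `max_new_tokens=256`) and the helper names `build_generation_kwargs` and `summarize` are illustrative assumptions, not settings published for this model, so tune them for your inputs.

```python
from typing import Any, Dict

MODEL_NAME = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"


def build_generation_kwargs(num_beams: int = 4, max_new_tokens: int = 256) -> Dict[str, Any]:
    """Hypothetical beam-search settings passed through to generate(); values are guesses."""
    if num_beams < 2:
        raise ValueError("beam search needs num_beams >= 2")
    return {
        "num_beams": num_beams,          # hypotheses kept alive at each decoding step
        "early_stopping": True,          # stop once num_beams finished candidates exist
        "no_repeat_ngram_size": 3,       # forbid any trigram from appearing twice
        "max_new_tokens": max_new_tokens,
    }


def summarize(text: str, device: int = -1, **gen_overrides: Any) -> str:
    """Summarize `text` with beam search; device=-1 is CPU, 0 is the first GPU."""
    # Imported lazily so build_generation_kwargs() works without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=MODEL_NAME, device=device)
    gen_kwargs = {**build_generation_kwargs(), **gen_overrides}
    # truncation=True cuts inputs that exceed the tokenizer's maximum length.
    result = summarizer(text, truncation=True, **gen_kwargs)
    return result[0]["summary_text"]
```

For example, `summarize(long_text, device=0, num_beams=8)` would run on the first GPU with eight beams. More beams generally cost more time and memory per summary.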