---
library_name: transformers
tags:
- synthetic
- '16384'
license: apache-2.0
datasets:
- BEE-spoke-data/synthsumm-open-v1.0
language:
- en
base_model:
- google/pegasus-x-base
pipeline_tag: summarization
---
# pegasus-x-base-synthsumm_open-16k
<a href="https://colab.research.google.com/gist/pszemraj/230db7f3fef91ebe5e2957465198ea26/pegasus-x-base-synthsumm_open-16k-example.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
This is a text-to-text summarization model fine-tuned from [pegasus-x-base](https://hf.co/google/pegasus-x-base) on a dataset of long documents from a variety of sources and domains, paired with synthetic summaries.
It performs surprisingly well as a general-purpose summarization model for its size. More details, a larger model, and the dataset will be released (_as time permits_).
## Usage
It's recommended to use this model with [beam search decoding](https://huggingface.co/docs/transformers/generation_strategies#beamsearch-decoding). If interested, you can also use the [textsum](https://github.com/pszemraj/textsum) util package, which abstracts most of this away for you:
```bash
pip install -U textsum
```
then:
```python
from textsum.summarize import Summarizer
model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
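If you prefer to call `transformers` directly, here is a minimal sketch using the `summarization` pipeline; `num_beams=4` is an illustrative choice for the recommended beam search, not a tuned default:

```python
from transformers import pipeline

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = pipeline("summarization", model=model_name)

text = "put the text you don't want to read here"
# beam search decoding, as recommended above
result = summarizer(text, num_beams=4, truncation=True)
print(result[0]["summary_text"])
```

Note that unlike `textsum`, the bare pipeline does not chunk long inputs for you; anything beyond the model's context window will be truncated.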