---

library_name: transformers
tags:
- synthetic
- '16384'
license: apache-2.0
datasets:
- BEE-spoke-data/synthsumm-open-v1.0
language:
- en
base_model:
- google/pegasus-x-base
pipeline_tag: summarization
---


# pegasus-x-base-synthsumm_open-16k

<a href="https://colab.research.google.com/gist/pszemraj/230db7f3fef91ebe5e2957465198ea26/pegasus-x-base-synthsumm_open-16k-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is a text-to-text summarization model fine-tuned from [pegasus-x-base](https://hf.co/google/pegasus-x-base) on a dataset of long documents from various sources and domains, paired with synthetic summaries.


It performs surprisingly well as a general summarization model for its size. More details, a larger model, and the dataset will be released (_as time permits_).

## Usage 

It's recommended to use this model with [beam search decoding](https://huggingface.co/docs/transformers/generation_strategies#beamsearch-decoding). If you prefer, the [textsum](https://github.com/pszemraj/textsum) utility package abstracts most of this away for you:


```bash
pip install -U textsum
```

then:

```python
from textsum.summarize import Summarizer

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
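If you'd rather call the model directly through the `transformers` pipeline API, a minimal sketch looks like the following. Beam search is enabled via `num_beams`; the specific generation values shown are illustrative assumptions, not tuned recommendations:

```python
from transformers import pipeline

# load the fine-tuned checkpoint; pass device=0 to use the first GPU if available
summarizer = pipeline(
    "summarization",
    model="BEE-spoke-data/pegasus-x-base-synthsumm_open-16k",
)

text = "put the text you don't want to read here"
result = summarizer(
    text,
    num_beams=4,             # beam search, as recommended above (value is an assumption)
    max_length=128,          # illustrative cap on summary length in tokens
    no_repeat_ngram_size=3,  # illustrative repetition penalty
)
print(result[0]["summary_text"])
```

The pipeline returns a list with one dict per input; the generated text is under the `summary_text` key.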