---
library_name: transformers
tags:
- synthetic
- '16384'
license: apache-2.0
datasets:
- BEE-spoke-data/synthsumm-open-v1.0
language:
- en
base_model:
- google/pegasus-x-base
pipeline_tag: summarization
---

# pegasus-x-base-synthsumm_open-16k

<a href="https://colab.research.google.com/gist/pszemraj/230db7f3fef91ebe5e2957465198ea26/pegasus-x-base-synthsumm_open-16k-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is a text-to-text summarization model fine-tuned from [pegasus-x-base](https://hf.co/google/pegasus-x-base) on a dataset of long documents from various sources/domains paired with synthetic summaries. As the name and tags indicate, it supports inputs of up to 16,384 tokens.

It performs surprisingly well as a general summarization model for its size. More details, a larger model, and the dataset will be released (_as time permits_).

## Usage

It is recommended to use this model with [beam search decoding](https://huggingface.co/docs/transformers/generation_strategies#beamsearch-decoding). If interested, you can also use the [textsum](https://github.com/pszemraj/textsum) util package, which abstracts most of this away for you:

```bash
pip install -U textsum
```

then:

```python
from textsum.summarize import Summarizer

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = Summarizer(model_name)  # GPU auto-detected

text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
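
If you prefer to call the model directly with `transformers`, below is a minimal sketch using beam search as recommended above. The generation settings (`num_beams=4`, `max_new_tokens=256`, `no_repeat_ngram_size=3`) are illustrative assumptions, not tuned defaults:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# move to GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

text = "put the text you don't want to read here"
# truncate to the model's 16,384-token input limit
inputs = tokenizer(
    text, return_tensors="pt", truncation=True, max_length=16384
).to(device)

# beam search decoding; these parameter values are assumptions, adjust to taste
summary_ids = model.generate(
    **inputs,
    num_beams=4,
    max_new_tokens=256,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```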