---

library_name: transformers
tags:
- synthetic
- '16384'
license: apache-2.0
datasets:
- BEE-spoke-data/synthsumm-open-v1.0
language:
- en
base_model:
- google/pegasus-x-base
pipeline_tag: summarization
---


# pegasus-x-base-synthsumm_open-16k

<a href="https://colab.research.google.com/gist/pszemraj/230db7f3fef91ebe5e2957465198ea26/pegasus-x-base-synthsumm_open-16k-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is a text-to-text summarization model fine-tuned from [pegasus-x-base](https://hf.co/google/pegasus-x-base) on a dataset of long documents from various sources and domains, paired with synthetic summaries.


It performs surprisingly well as a general summarization model for its size. More details, a larger model, and the dataset will be released (_as time permits_).

## Usage 

It's recommended to use this model with [beam search decoding](https://huggingface.co/docs/transformers/generation_strategies#beamsearch-decoding). If you prefer, the [textsum](https://github.com/pszemraj/textsum) utility package abstracts most of this away for you:


```bash
pip install -U textsum
```

then:

```python
from textsum.summarize import Summarizer

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
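If you'd rather call the model directly through the `transformers` pipeline API, a minimal sketch looks like the following. Beam search is enabled via `num_beams`; the specific generation values shown are illustrative assumptions, not tuned recommendations:

```python
from transformers import pipeline

# load the fine-tuned checkpoint; pass device=0 to use the first GPU if available
summarizer = pipeline(
    "summarization",
    model="BEE-spoke-data/pegasus-x-base-synthsumm_open-16k",
)

text = "put the text you don't want to read here"
result = summarizer(
    text,
    num_beams=4,             # beam search, as recommended above (value is an assumption)
    max_length=128,          # illustrative cap on summary length in tokens
    no_repeat_ngram_size=3,  # illustrative repetition penalty
)
print(result[0]["summary_text"])
```

The pipeline returns a list with one dict per input; the generated text is under the `summary_text` key.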