---
language:
- en
tags: 
- summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: "National Commercial Bank (NCB), Saudi Arabia\u2019s largest lender by assets,\
    \ agreed to buy rival Samba Financial Group for $15 billion in the biggest banking\
    \ takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according\
    \ to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will\
    \ offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787\
    \ ratio the banks set when they signed an initial framework agreement in June.The\
    \ offer is a 3.5% premium to Samba\u2019s Oct. 8 closing price of 27.50 riyals\
    \ and about 24% higher than the level the shares traded at before the talks were\
    \ made public. Bloomberg News first reported the merger discussions.The new bank\
    \ will have total assets of more than $220 billion, creating the Gulf region\u2019\
    s third-largest lender. The entity\u2019s $46 billion market capitalization nearly\
    \ matches that of Qatar National Bank QPSC, which is still the Middle East\u2019\
    s biggest lender with about $268 billion of assets."
       
model-index:
- name: human-centered-summarization/financial-summarization-pegasus
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 35.2055
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.5689
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 30.1285
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 30.1706
      verified: true
    - name: loss
      type: loss
      value: 2.7092134952545166
      verified: true
    - name: gen_len
      type: gen_len
      value: 15.1414
      verified: true
---

### PEGASUS for Financial Summarization 

This model was fine-tuned on a novel financial news dataset, which consists of 2K articles from [Bloomberg](https://www.bloomberg.com/europe) on topics such as stocks, markets, currencies, rates and cryptocurrencies.

It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically the checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).

### How to use 
We provide a simple snippet of how to use this model for the task of financial summarization in PyTorch.

```Python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration

# Let's load the model and the tokenizer 
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
# This loads the PyTorch model; for TensorFlow, use TFPegasusForConditionalGeneration instead
model = PegasusForConditionalGeneration.from_pretrained(model_name)


# Some text to summarize here
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize our text
# To run the code in TensorFlow, pass return_tensors="tf" instead
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the output (Here, we use beam search but you can also use any other strategy you like)
output = model.generate(
    input_ids, 
    max_length=32, 
    num_beams=5, 
    early_stopping=True
)

# Finally, we can print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```
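
To summarize several articles at once, the same calls can be wrapped in a small helper. This is a minimal sketch, not part of the original model card: the function name `summarize_batch` and its defaults are hypothetical, and it assumes `tokenizer` and `model` have already been loaded as in the snippet above. Padding and truncation are enabled so that inputs of different lengths fit into one batch.

```python
from typing import List


def summarize_batch(texts: List[str], tokenizer, model,
                    max_length: int = 32, num_beams: int = 5) -> List[str]:
    """Summarize a list of articles in a single generate() call.

    `summarize_batch` is a hypothetical helper; `tokenizer` and `model`
    are expected to be the PyTorch objects loaded with from_pretrained.
    """
    # Pad to the longest article and truncate anything beyond the model limit
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

    # Same beam search settings as the single-article example
    output = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        max_length=max_length,
        num_beams=num_beams,
        early_stopping=True,
    )

    # One decoded summary per input article, in the same order
    return tokenizer.batch_decode(output, skip_special_tokens=True)
```

For example, `summarize_batch([text_to_summarize], tokenizer, model)` should return a list containing one summary string.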

## Evaluation Results
The results before and after the fine-tuning on our dataset are shown below:


| Fine-tuning |  R-1  |  R-2  |  R-L   |  R-S  |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes         | 23.55 |  6.99 | 18.14  | 21.36 | 
| No          | 13.8  |  2.4  | 10.63  | 12.03 |
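
For readers unfamiliar with the metric, the sketch below illustrates the idea behind a ROUGE-1 score as unigram overlap between a reference summary and a generated one. It is a simplified illustration under stated assumptions, not the scorer used to produce the table: it assumes R-1 denotes ROUGE-1 F1, tokenizes on whitespace, applies no stemming, and returns a value in [0, 1] rather than on the 0-100 scale shown above. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Published ROUGE implementations typically add normalization steps such as stemming, so scores from this sketch will not match the reported numbers exactly.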


## Citation

You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:

> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021. 
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

BibTeX entry:

```
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana  and Gidiotis, Alexios  and Chatzikyriakidis, Efstathios  and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```

## Support

Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!

More information about Medoid AI: 
- Website: [https://www.medoid.ai](https://www.medoid.ai)
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)