---
language:
- en
tags: 
- summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: "National Commercial Bank (NCB), Saudi Arabia\u2019s largest lender by assets,\
    \ agreed to buy rival Samba Financial Group for $15 billion in the biggest banking\
    \ takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according\
    \ to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will\
    \ offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787\
    \ ratio the banks set when they signed an initial framework agreement in June.The\
    \ offer is a 3.5% premium to Samba\u2019s Oct. 8 closing price of 27.50 riyals\
    \ and about 24% higher than the level the shares traded at before the talks were\
    \ made public. Bloomberg News first reported the merger discussions.The new bank\
    \ will have total assets of more than $220 billion, creating the Gulf region\u2019\
    s third-largest lender. The entity\u2019s $46 billion market capitalization nearly\
    \ matches that of Qatar National Bank QPSC, which is still the Middle East\u2019\
    s biggest lender with about $268 billion of assets."
       
model-index:
- name: human-centered-summarization/financial-summarization-pegasus
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 35.2055
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.5689
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 30.1285
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 30.1706
      verified: true
    - name: loss
      type: loss
      value: 2.7092134952545166
      verified: true
    - name: gen_len
      type: gen_len
      value: 15.1414
      verified: true
---

### PEGASUS for Financial Summarization 

This model was fine-tuned on a novel financial news dataset consisting of 2K articles from [Bloomberg](https://www.bloomberg.com/europe), covering topics such as stocks, markets, currencies, rates, and cryptocurrencies.

It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically on PEGASUS fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).

### How to use 
Below is a simple snippet showing how to use this model for financial summarization in PyTorch.

```Python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the model and the tokenizer
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)
# To use the TensorFlow model instead, import and use TFPegasusForConditionalGeneration


# Some text to summarize here
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the text
# To run the code in TensorFlow, pass return_tensors="tf" instead
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the output (here we use beam search, but any other decoding strategy can be used)
output = model.generate(
    input_ids, 
    max_length=32, 
    num_beams=5, 
    early_stopping=True
)

# Finally, we can print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```
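
As noted in the comments above, the model can also be used with TensorFlow via `TFPegasusForConditionalGeneration`. The snippet below is a minimal sketch of that path; the `from_pt=True` flag is an assumption for the case where the repository does not ship native TensorFlow weights, and can be dropped if it does.

```Python
from transformers import PegasusTokenizer, TFPegasusForConditionalGeneration

model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)

# from_pt=True converts the PyTorch weights on the fly if no native TensorFlow
# weights are available in the repository (drop it if they are)
model = TFPegasusForConditionalGeneration.from_pretrained(model_name, from_pt=True)

# Shortened version of the article used in the PyTorch example above
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year."

# Note the TensorFlow tensors
input_ids = tokenizer(text_to_summarize, return_tensors="tf").input_ids

# Same beam-search settings as in the PyTorch example
output = model.generate(input_ids, max_length=32, num_beams=5, early_stopping=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```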

## Evaluation Results
The results before and after fine-tuning on our dataset are shown below (a short sketch for computing ROUGE scores follows the table):


| Fine-tuning |  R-1  |  R-2  |  R-L   |  R-S  |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes         | 23.55 |  6.99 | 18.14  | 21.36 | 
| No          | 13.8  |  2.4  | 10.63  | 12.03 |
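
To compute ROUGE scores like the ones above on your own data, one option is the Hugging Face `evaluate` library. The snippet below is a minimal sketch; `my_predictions` and `my_references` are hypothetical placeholders rather than part of our dataset, and the numbers in the table were produced with our own evaluation setup.

```Python
import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")

# Hypothetical placeholders: model outputs and human reference summaries
my_predictions = ["Saudi bank to pay a 3.5% premium to Samba share price."]
my_references = ["NCB agrees to buy Samba Financial Group for $15 billion."]

scores = rouge.compute(predictions=my_predictions, references=my_references)
# Keys include rouge1, rouge2, rougeL, and rougeLsum, reported as F1 scores in [0, 1]
print({key: round(value * 100, 2) for key, value in scores.items()})
```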


## Citation

More details about this work can be found in the following workshop paper. If you use our model in your research, please consider citing it:

> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021. 
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

BibTeX entry:

```
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana  and Gidiotis, Alexios  and Chatzikyriakidis, Efstathios  and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```

## Support

Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!

More information about Medoid AI: 
- Website: [https://www.medoid.ai](https://www.medoid.ai)
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)