Commit 57659f7 by ydshieh (HF staff)
1 parent: fc24b76

Create README.md

Files changed (1)
  1. README.md +135 -0
README.md ADDED
---
language: en
license: apache-2.0
datasets:
- scientific_papers
tags:
- summarization
model-index:
- name: google/bigbird-pegasus-large-pubmed
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: scientific_papers
      type: scientific_papers
      config: pubmed
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 40.8966
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 18.1161
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 26.1743
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 34.2773
      verified: true
    - name: loss
      type: loss
      value: 2.1707184314727783
      verified: true
    - name: meteor
      type: meteor
      value: 0.3513
      verified: true
    - name: gen_len
      type: gen_len
      value: 221.2531
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: scientific_papers
      type: scientific_papers
      config: arxiv
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 40.3815
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 14.374
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 23.4773
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 33.772
      verified: true
    - name: loss
      type: loss
      value: 3.235051393508911
      verified: true
    - name: gen_len
      type: gen_len
      value: 186.2003
      verified: true
---

# BigBirdPegasus model (large)

BigBird is a sparse-attention-based transformer that extends Transformer-based models such as BERT to much longer sequences. Moreover, BigBird comes with a theoretical understanding of which capabilities of a full transformer the sparse model can retain.

BigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).

Disclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.

## Model description

BigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost than BERT. It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.
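
To make the cost argument concrete, here is a minimal, simplified sketch of a block sparse attention pattern in plain Python. It is not the `transformers` implementation: the function names `block_sparse_pattern` and `attention_cost` are hypothetical, and the layout (first and last blocks global, a sliding window of three blocks, plus a few random blocks per query block) is an assumption chosen for illustration. The defaults mirror the `block_size=64` and `num_random_blocks=3` mentioned in this card.

```python
import random


def block_sparse_pattern(seq_len, block_size=64, num_random_blocks=3, seed=0):
    """Return, for each query block, the list of key blocks it attends to.

    Illustrative assumption: the first and last blocks are global, every other
    block sees itself and its two neighbours, plus a few random blocks.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size in this sketch")
    num_blocks = seq_len // block_size
    rng = random.Random(seed)
    global_blocks = {0, num_blocks - 1}

    pattern = []
    for q in range(num_blocks):
        if q in global_blocks:
            # Global query blocks attend to every key block.
            pattern.append(list(range(num_blocks)))
            continue
        # Sliding window: the block itself and its immediate neighbours.
        window = {b for b in (q - 1, q, q + 1) if 0 <= b < num_blocks}
        attended = global_blocks | window
        # Random blocks are drawn from the blocks not already attended to.
        candidates = [b for b in range(num_blocks) if b not in attended]
        picks = rng.sample(candidates, min(num_random_blocks, len(candidates)))
        pattern.append(sorted(attended | set(picks)))
    return pattern


def attention_cost(pattern, block_size):
    """Count the query-key scores implied by a block-level pattern."""
    return sum(len(keys) for keys in pattern) * block_size * block_size


pattern = block_sparse_pattern(seq_len=4096, block_size=64, num_random_blocks=3)
sparse_cost = attention_cost(pattern, block_size=64)
full_cost = 4096 * 4096  # full attention scores every query against every key
print(f"block sparse: {sparse_cost:,} scores, full: {full_cost:,} scores")
print(f"ratio: {sparse_cost / full_cost:.3f}")
```

Under these assumptions, each non-global query block attends to a fixed, small number of key blocks regardless of sequence length, so the number of scores grows roughly linearly with the sequence length instead of quadratically.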

## How to use

Here is how to use this model to summarize a given text in PyTorch:

```python
from transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-pubmed")

# by default, encoder attention is `block_sparse` with num_random_blocks=3, block_size=64
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed")

# the decoder attention type can't be changed & will always be "original_full"
# alternatively, you can change `attention_type` (encoder only) to full attention like this:
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", attention_type="original_full")

# or you can change `block_size` & `num_random_blocks` like this:
# (each assignment above replaces the previous model; keep only the variant you need)
model = BigBirdPegasusForConditionalGeneration.from_pretrained("google/bigbird-pegasus-large-pubmed", block_size=16, num_random_blocks=2)

text = "Replace me by any text you'd like."
inputs = tokenizer(text, return_tensors='pt')
prediction = model.generate(**inputs)
# decode the generated token ids into text, dropping special tokens such as padding
prediction = tokenizer.batch_decode(prediction, skip_special_tokens=True)
```

## Training Procedure

This checkpoint was obtained by fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **pubmed** configuration of the [scientific_papers](https://huggingface.co/datasets/scientific_papers) dataset.
+
124
+ ## BibTeX entry and citation info
125
+
126
+ ```tex
127
+ @misc{zaheer2021big,
128
+ title={Big Bird: Transformers for Longer Sequences},
129
+ author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},
130
+ year={2021},
131
+ eprint={2007.14062},
132
+ archivePrefix={arXiv},
133
+ primaryClass={cs.LG}
134
+ }
135
+ ```