---
language:
- en
tags:
- summarization
datasets:
- scientific_papers
metrics:
- rouge
model-index:
- name: ccdv/lsg-bart-base-16384-arxiv
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
**Transformers >= 4.36.1**\
**This model relies on a custom modeling file; you need to pass `trust_remote_code=True`.**\
**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
LSG ArXiv [paper](https://arxiv.org/abs/2210.15497). \
The GitHub repository with the conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-16384-arxiv", trust_remote_code=True)
# trust_remote_code=True is required: the LSG attention lives in a custom modeling file
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-16384-arxiv", trust_remote_code=True)

text = "Replace with what you want."
# device=0 runs on the first GPU; use device=-1 to run on CPU
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,  # truncate inputs that exceed the tokenizer's maximum length
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True
)
print(generated_text[0]["generated_text"])
```
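
The call above sets `no_repeat_ngram_size=7`, which prevents the generated summary from containing the same 7-gram twice. The sketch below illustrates that constraint in plain Python: given the token ids generated so far, it returns the ids that would complete an n-gram already seen. It is a simplified illustration of the idea, not the `transformers` implementation, and the helper `banned_next_tokens` is hypothetical.

```python
def banned_next_tokens(tokens: list[int], n: int) -> set[int]:
    """Return token ids that would repeat an n-gram already present in `tokens`.

    Hypothetical helper illustrating the idea behind `no_repeat_ngram_size`.
    """
    # Not enough tokens yet to form a full n-gram together with the next token
    if n <= 0 or len(tokens) + 1 < n:
        return set()
    # The last n-1 tokens form the prefix that the next token would extend
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned = set()
    # Scan every earlier n-gram; if its first n-1 tokens match the prefix,
    # its last token is forbidden as the next token
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


# With n=3, after "1 2 3 1 2" the token 3 would repeat the trigram (1, 2, 3)
print(banned_next_tokens([1, 2, 3, 1, 2], 3))  # {3}
```

With the value used above (n=7), a token is only blocked once the last six generated tokens have already appeared in that order earlier in the output.
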
# ccdv/lsg-bart-base-16384-arxiv
This model is a fine-tuned version of [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv) on the [scientific_papers arxiv](https://huggingface.co/datasets/scientific_papers) dataset. \
The model is converted to handle sequences of up to 16,384 tokens and fine-tuned accordingly for 1 epoch. \
It achieves the following results on the test set:

| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connections | R1 | R2 | RL | RLsum |
|:------ |:------------- |:----------- |:---------- |:-------- |:----------- |:----- |:----- |:----- |:----- |
| 16384 | 64 | Full | 256 | 0 | 768 | 48.74 | 20.88 | 28.50 | 44.23 |
| 16384 | 1 | Full | 256 | 0 | 768 | 48.66 | 20.92 | 28.50 | 44.18 |
| 16384 | 64 | Global only | 256 | 0 | 768 | 48.08 | 20.42 | 28.00 | 43.65 |
| 16384 | 1 | None | 256 | 0 | 768 | 47.03 | 20.19 | 28.26 | 42.69 |

Reference model:

| Length | Global tokens | Fine-tuning | Block Size | Sparsity | Connections | R1 | R2 | RL | RLsum |
|:------ |:------------- |:----------- |:---------- |:-------- |:----------- |:----- |:----- |:----- |:----- |
| 4096 | 1 | - | 256 | 0 | 768 | 46.65 | 18.91 | 26.90 | 42.18 |
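
The R1 and R2 columns are ROUGE-1 and ROUGE-2 scores, which measure unigram and bigram overlap between generated and reference abstracts (RL and RLsum are longest-common-subsequence variants, not covered here). As a rough illustration, the sketch below computes a ROUGE-N F-measure with clipped n-gram counts. It assumes plain lowercase whitespace tokenization with no stemming, so it will not reproduce the exact scores above, which were presumably computed with a standard ROUGE package and reported as F-measures scaled by 100. The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F-measure (whitespace tokens, no stemming)."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each n-gram, i.e. clipped matches
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


summary = "the cat sat on the mat"
abstract = "the cat is on the mat"
print(round(100 * rouge_n(summary, abstract, 1), 2))  # 83.33
print(round(100 * rouge_n(summary, abstract, 2), 2))  # 60.0
```
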
## Model description
The model relies on Local-Sparse-Global attention to handle long sequences:

![attn](attn.png)

The model has about 145 million parameters (6 encoder layers, 6 decoder layers). \
The model is warm-started from [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv), converted to handle long sequences (encoder only), and fine-tuned.
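
To get a feel for why LSG scales to 16,384 tokens, the sketch below counts the keys a single query attends to under block-local attention. It assumes that, with sparsity 0, each block of queries attends to its own block and the two neighbouring blocks, which is consistent with the value 768 = 3 × 256 in the connections column of the results table, plus the global tokens. The sparse component of LSG is not modelled, edge blocks are simply clipped (the actual implementation may pad instead), and all function names are hypothetical.

```python
def local_key_range(query_pos: int, seq_len: int, block_size: int) -> range:
    """Keys visible to a query: its own block plus one neighbouring block on each side (assumed)."""
    block = query_pos // block_size
    start = max(0, (block - 1) * block_size)
    end = min(seq_len, (block + 2) * block_size)
    return range(start, end)


def connections(query_pos: int, seq_len: int, block_size: int, num_global: int) -> int:
    """Local keys plus the global tokens that every query also attends to."""
    return len(local_key_range(query_pos, seq_len, block_size)) + num_global


SEQ_LEN, BLOCK = 16384, 256
print(SEQ_LEN // BLOCK)                       # 64 blocks
print(connections(8000, SEQ_LEN, BLOCK, 0))   # 768 local keys for an interior query
print(connections(8000, SEQ_LEN, BLOCK, 64))  # 832 with 64 global tokens

# Rough cost comparison, ignoring edge blocks and global tokens
full_pairs = SEQ_LEN ** 2
local_pairs = SEQ_LEN * 3 * BLOCK
print(full_pairs // local_pairs)              # 21 (about 21x fewer query-key pairs)
```

Under these assumptions the per-query cost stays fixed at roughly 768 keys however long the input is, so total attention cost grows linearly rather than quadratically with sequence length.
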
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
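
These hyperparameters fit together as follows: `total_train_batch_size` equals `train_batch_size × gradient_accumulation_steps` (8 × 4 = 32, which suggests training on a single device), and the linear scheduler warms up over the first 10% of steps before decaying to zero. The sketch below reproduces that arithmetic. The learning-rate function follows the linear-warmup-then-linear-decay shape used by the Hugging Face `Trainer`, the function names are hypothetical, and the total number of steps is a made-up value, since it depends on the dataset size, which is not stated here.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 8e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(effective_batch_size(8, 4))  # 32, matching total_train_batch_size

TOTAL_STEPS = 1000  # hypothetical; the real value depends on the dataset size
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With 1,000 hypothetical steps, the learning rate peaks at 8e-05 at step 100 and is back to half that value by step 550.
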
### Generation hyperparameters
The following hyperparameters were used during generation:
- dataset_name: scientific_papers
- dataset_config_name: arxiv
- eval_batch_size: 4
- eval_samples: 6440
- early_stopping: True
- ignore_pad_token_for_loss: True
- length_penalty: 2.0
- max_length: 320
- min_length: 32
- num_beams: 5
- no_repeat_ngram_size: None
- seed: 123
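
`length_penalty: 2.0` changes how finished beam-search hypotheses are ranked. In Hugging Face generation, a hypothesis score is its summed log-probability divided by its length raised to the power `length_penalty`; because log-probabilities are negative, a larger exponent favours longer summaries. The sketch below illustrates this with made-up log-probabilities for two hypothetical candidates; the helper `beam_score` is hypothetical, and `min_length` and `max_length` (32 and 320 tokens) act as hard bounds on top of this ranking.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised score used to rank finished beam hypotheses."""
    return sum_logprobs / (length ** length_penalty)


# Hypothetical finished hypotheses: (name, summed log-probability, length in tokens)
candidates = [("short", -5.0, 10), ("long", -8.0, 20)]

for penalty in (0.0, 1.0, 2.0):
    best = max(candidates, key=lambda c: beam_score(c[1], c[2], penalty))
    print(penalty, best[0])
# 0.0 short
# 1.0 long
# 2.0 long
```

With no penalty the raw sum wins and the shorter hypothesis is chosen; with 2.0 the longer hypothesis wins by a wider margin (-0.02 versus -0.05) than with 1.0.
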
### Framework versions
- Transformers 4.18.0
- Pytorch 1.10.1+cu102
- Datasets 2.1.0
- Tokenizers 0.11.6