metadata
license: mit
tags:
- generated_from_trainer
datasets:
- multi_news
metrics:
- rouge
model-index:
- name: bart-large-cnn-finetuned-multi-news
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: multi_news
      type: multi_news
      args: default
    metrics:
    - name: Rouge1
      type: rouge
      value: 42.0423
bart-large-cnn-finetuned-multi-news
This model is a fine-tuned version of facebook/bart-large-cnn on the multi_news dataset. It achieves the following results on the evaluation set:
- Loss: 2.0950
- Rouge1: 42.0423
- Rouge2: 14.8812
- Rougel: 23.3412
- Rougelsum: 36.2613
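To make the Rouge1 figure easier to interpret, here is a simplified sketch of a unigram-overlap F-measure. It is an illustration only: it assumes whitespace tokenization with no stemming, and the function name `rouge1_f1` is hypothetical. The scores reported above were presumably produced by a standard ROUGE implementation and scaled by 100, so this sketch will not reproduce them exactly.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F-measure based on unigram overlap.

    Assumption: lowercase whitespace tokens, no stemming or
    punctuation handling.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Two of the four unigrams match on each side, so precision and
# recall are both 0.5.
score = rouge1_f1("a b c d", "a b x y")
```

Multiplying the returned value by 100 puts it on the same scale as the table, so a Rouge1 of 42.0423 corresponds to an F-measure of about 0.42.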
Model description
facebook/bart-large-cnn fine-tuned on a sample of the multi_news dataset.
Intended uses & limitations
The model is intended for downstream summarization tasks. Input is limited to 1024 tokens, the maximum sequence length of BART; any longer text is truncated.
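A minimal usage sketch with explicit truncation might look like the following. The repository id in `MODEL_ID` is a hypothetical placeholder, since the card does not state the namespace the checkpoint is published under, and the decoding settings (`num_beams`, `max_summary_tokens`) are assumptions rather than values taken from this card. Note that the limit applies to tokenizer tokens, which usually means fewer than 1024 words of input.

```python
# Hypothetical repository id: replace with the namespace the checkpoint
# is actually published under.
MODEL_ID = "your-namespace/bart-large-cnn-finetuned-multi-news"

# BART's positional embeddings cover 1024 token positions.
MAX_INPUT_TOKENS = 1024


def summarize(text, model_id=MODEL_ID, max_summary_tokens=256):
    """Summarize `text`, truncating the input to the model's token limit."""
    # Imported here so the constants above can be used without loading
    # transformers and torch.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # truncation=True drops every token past max_length.
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=MAX_INPUT_TOKENS,
        return_tensors="pt",
    )
    summary_ids = model.generate(
        **inputs,
        num_beams=4,                    # assumed decoding setting
        max_length=max_summary_tokens,  # assumed summary length budget
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` downloads the checkpoint on first use. For multi-document input that exceeds the limit, everything past the first 1024 tokens is silently ignored, so the order in which source articles are concatenated affects the summary.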
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
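The list above could map onto `Seq2SeqTrainingArguments` roughly as sketched below. This is a reconstruction under assumptions, not the original training script: it assumes a single device (so `train_batch_size` equals the per-device batch size), `predict_with_generate=True` is assumed because ROUGE needs generated text, and the helper names `effective_batch_size` and `build_training_args` as well as the `output_dir` value are hypothetical.

```python
# Values copied from the hyperparameter list; key names follow the
# TrainingArguments fields they most plausibly correspond to.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,  # assumes a single device
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def build_training_args(output_dir="bart-large-cnn-finetuned-multi-news"):
    """Build training arguments from HYPERPARAMS (requires transformers)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,       # hypothetical output directory
        predict_with_generate=True,  # assumption: needed to score ROUGE
        **HYPERPARAMS,
    )


# 4 examples per device x 4 accumulation steps x 1 device = 16
total = effective_batch_size(
    HYPERPARAMS["per_device_train_batch_size"],
    HYPERPARAMS["gradient_accumulation_steps"],
)
```

The computed `total` matches the reported `total_train_batch_size` of 16, which is consistent with the single-device assumption.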
Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|
| 2.2037 | 1.0 | 750 | 2.0950 | 42.0423 | 14.8812 | 23.3412 | 36.2613 |
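The step count gives a rough idea of how large the training sample was. The arithmetic below assumes that each of the 750 steps is one optimizer update over the total train batch size of 16; the result is approximate because the last batch of an epoch may be incomplete.

```python
optimizer_steps = 750        # "Step" column after one epoch
total_train_batch_size = 16  # from the hyperparameter list

# Approximate number of training examples seen in the single epoch.
examples_per_epoch = optimizer_steps * total_train_batch_size
print(examples_per_epoch)  # 12000
```

About 12,000 training examples fits the description of the model as fine-tuned on a sample of multi_news rather than on the full training split.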
Framework versions
- Transformers 4.18.0
- Pytorch 1.10.0+cu111
- Datasets 2.0.0
- Tokenizers 0.11.6