---
license: mit
tags:
- generated_from_trainer
datasets:
- multi_news
metrics:
- rouge
base_model: facebook/bart-large-cnn
model-index:
- name: bart-large-cnn-finetuned-multi-news
  results:
  - task:
      type: text2text-generation
      name: Sequence-to-sequence Language Modeling
    dataset:
      name: multi_news
      type: multi_news
      args: default
    metrics:
    - type: rouge
      value: 42.0423
      name: Rouge1
---
# bart-large-cnn-finetuned-multi-news
This model is a fine-tuned version of [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn) on the multi_news dataset. It achieves the following results on the evaluation set:
- Loss: 2.0950
- Rouge1: 42.0423
- Rouge2: 14.8812
- RougeL: 23.3412
- RougeLsum: 36.2613
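These scores come from the `rouge` metric of the Datasets library listed under Framework versions. A minimal sketch of the computation (the strings are placeholders, and reporting mid F-measure multiplied by 100 is an assumption based on the standard summarization fine-tuning scripts):

```python
from datasets import load_metric

# Placeholder inputs; in practice these come from model.generate() output
# and the reference summaries of the evaluation set.
rouge = load_metric("rouge")
scores = rouge.compute(
    predictions=["model-generated summary"],
    references=["reference summary"],
)

# Assumption: the card reports the mid F-measure scaled by 100.
print({name: agg.mid.fmeasure * 100 for name, agg in scores.items()})
```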
## Model description

bart-large-cnn fine-tuned on a sample of the multi_news dataset.
## Intended uses & limitations

The model is intended for downstream summarization tasks. Input is limited to 1,024 tokens; any longer text is truncated.
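A minimal usage sketch with the `transformers` summarization pipeline. The model id below is a placeholder for wherever this checkpoint is hosted on the Hub, and the generation lengths are illustrative (they match the bart-large-cnn defaults), not values recorded in this card:

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual Hub path of this checkpoint.
summarizer = pipeline("summarization", model="bart-large-cnn-finetuned-multi-news")

article = "..."  # a long news article; anything past 1,024 tokens is cut off
result = summarizer(article, truncation=True, max_length=142, min_length=56)
print(result[0]["summary_text"])
```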
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
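For reference, a sketch of how these settings map onto `Seq2SeqTrainingArguments`; the `output_dir` and `predict_with_generate` values are assumptions, not settings recorded in this card (the Adam betas and epsilon above are the Trainer defaults):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-cnn-finetuned-multi-news",  # hypothetical
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # total train batch size: 4 * 4 = 16
    lr_scheduler_type="linear",
    num_train_epochs=1,
    predict_with_generate=True,  # assumed, to generate summaries for ROUGE
)
```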
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|
| 2.2037        | 1.0   | 750  | 2.0950          | 42.0423 | 14.8812 | 23.3412 | 36.2613   |
### Framework versions
- Transformers 4.18.0
- Pytorch 1.10.0+cu111
- Datasets 2.0.0
- Tokenizers 0.11.6