---
license: apache-2.0
base_model: google/flan-t5-base
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: t5-summarization-base-zero-shot-headers-and-better-prompt
  results: []
---

# t5-summarization-base-zero-shot-headers-and-better-prompt

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9932
- Rouge1: 0.4431
- Rouge2: 0.2212
- RougeL: 0.2154
- RougeLsum: 0.2154
- Bert Score: 0.884
- Bleurt 20: -0.6941
- Gen Len: 14.385

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bert Score | Bleurt 20 | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:------:|:---------:|:----------:|:---------:|:-------:|
| 2.368         | 1.0   | 601   | 2.0543          | 0.4357 | 0.19   | 0.2033 | 0.2033    | 0.8765     | -0.7806   | 14.975  |
| 2.009         | 2.0   | 1202  | 1.9097          | 0.4239 | 0.1934 | 0.2218 | 0.2218    | 0.8815     | -0.7203   | 14.46   |
| 1.7996        | 3.0   | 1803  | 1.8539          | 0.4129 | 0.2041 | 0.2144 | 0.2144    | 0.8809     | -0.7453   | 13.68   |
| 1.5575        | 4.0   | 2404  | 1.8461          | 0.4259 | 0.2083 | 0.2101 | 0.2101    | 0.8839     | -0.7329   | 14.015  |
| 1.3425        | 5.0   | 3005  | 1.8792          | 0.4138 | 0.2006 | 0.2223 | 0.2223    | 0.8871     | -0.7175   | 13.655  |
| 1.1198        | 6.0   | 3606  | 1.9615          | 0.4304 | 0.2097 | 0.2128 | 0.2128    | 0.8826     | -0.7148   | 14.105  |
| 1.0207        | 7.0   | 4207  | 2.0121          | 0.4379 | 0.2189 | 0.2228 | 0.2228    | 0.8854     | -0.6751   | 14.11   |
| 0.9197        | 8.0   | 4808  | 2.1143          | 0.4394 | 0.2145 | 0.2139 | 0.2139    | 0.8852     | -0.6924   | 14.25   |
| 0.807         | 9.0   | 5409  | 2.1749          | 0.4572 | 0.2265 | 0.2199 | 0.2199    | 0.8839     | -0.6967   | 14.465  |
| 0.7784        | 10.0  | 6010  | 2.2013          | 0.4451 | 0.2195 | 0.2152 | 0.2152    | 0.8849     | -0.6766   | 14.485  |
| 0.6285        | 11.0  | 6611  | 2.3428          | 0.4367 | 0.2126 | 0.2157 | 0.2157    | 0.8846     | -0.7113   | 14.265  |
| 0.595         | 12.0  | 7212  | 2.5554          | 0.4373 | 0.2161 | 0.2202 | 0.2202    | 0.8844     | -0.6867   | 14.36   |
| 0.611         | 13.0  | 7813  | 2.4775          | 0.4416 | 0.2218 | 0.2151 | 0.2151    | 0.8833     | -0.7119   | 14.51   |
| 0.4811        | 14.0  | 8414  | 2.6892          | 0.4412 | 0.2242 | 0.2223 | 0.2223    | 0.8848     | -0.6574   | 14.65   |
| 0.4211        | 15.0  | 9015  | 2.7409          | 0.4471 | 0.2165 | 0.2141 | 0.2141    | 0.8843     | -0.6566   | 14.655  |
| 0.4611        | 16.0  | 9616  | 2.8461          | 0.4363 | 0.2117 | 0.2101 | 0.2101    | 0.8835     | -0.6921   | 14.37   |
| 0.402         | 17.0  | 10217 | 2.8848          | 0.4505 | 0.2204 | 0.2119 | 0.2119    | 0.8825     | -0.6888   | 14.615  |
| 0.4101        | 18.0  | 10818 | 2.9057          | 0.4453 | 0.2216 | 0.2103 | 0.2103    | 0.8824     | -0.6881   | 14.505  |
| 0.3483        | 19.0  | 11419 | 2.9700          | 0.4456 | 0.221  | 0.2139 | 0.2139    | 0.8836     | -0.6888   | 14.415  |
| 0.36          | 20.0  | 12020 | 2.9932          | 0.4431 | 0.2212 | 0.2154 | 0.2154    | 0.884      | -0.6941   | 14.385  |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0
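## How to use

A minimal inference sketch with the `transformers` library. The checkpoint identifier below is a placeholder (the card does not state where the fine-tuned weights are published, so `google/flan-t5-base` is used as a stand-in), and the `summarize:` prefix is an assumption based on the usual T5 summarization prompt format, not something this card documents.

```python
# Minimal summarization sketch. Assumptions: the checkpoint path is a
# placeholder for the fine-tuned weights, and the "summarize:" prefix
# follows the conventional T5 prompt format (not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/flan-t5-base"  # replace with the fine-tuned checkpoint path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

text = (
    "summarize: The city council met on Tuesday to discuss the new "
    "transit plan, which would add three bus routes and extend "
    "service hours on weekends."
)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# max_new_tokens ~ the average generation length reported above (Gen Len ≈ 14)
output_ids = model.generate(**inputs, max_new_tokens=32)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```

Since the evaluation reports an average generation length around 14 tokens, a small `max_new_tokens` budget is usually sufficient; adjust it for longer inputs.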