	---
	license: apache-2.0
	datasets:
	- stacked-summaries/stacked-samsum-1024
	language:
	- en
	metrics:
	- rouge
	tags:
	- stacked summaries
	- samsum
	pipeline_tag: summarization
	---


	# flan-t5-small-stacked-samsum-1024

	This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the `stacked-summaries/stacked-samsum-1024` dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.7573
	- Rouge1: 46.6072
	- Rouge2: 19.9754
	- Rougel: 35.2715
	- Rougelsum: 43.3599
	- Gen Len: 72.64

	## Model Description

	This model was trained on a summarization task in which _potentially_ multiple doc-summary pairs are stacked on top of each other.

	You can separate its predictions by splitting the output on its special token `[NEXT_CONCEPT]`, which divides the summary into "separate topics".
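
As a minimal sketch of that splitting step, the helper below breaks a decoded summary into one string per topic. The function name `split_concepts` and the sample text are hypothetical, made up for illustration. The sketch also assumes the decoded output still contains the literal `[NEXT_CONCEPT]` marker; depending on how the token is registered in the tokenizer, this may require decoding without skipping special tokens.

```python
from typing import List

# Separator token named in the model card.
NEXT_CONCEPT = "[NEXT_CONCEPT]"


def split_concepts(summary: str, sep: str = NEXT_CONCEPT) -> List[str]:
    """Split a decoded summary into per-topic strings (hypothetical helper)."""
    # Strip whitespace around each piece and drop empty pieces, e.g. from a
    # trailing separator.
    parts = (part.strip() for part in summary.split(sep))
    return [part for part in parts if part]


# Made-up example of decoded model output, for illustration only.
decoded = (
    "Tom will pick up Ann at 6 pm. [NEXT_CONCEPT] "
    "Kate is bringing pizza to the party."
)

for i, topic in enumerate(split_concepts(decoded), start=1):
    print(f"{i}. {topic}")
```

Outputs without the separator come back as a single-element list, so the same code path handles stacked and unstacked summaries.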

	## Intended use & limitations

	- This model is intended as a baseline/reference for comparison with the larger models.

	## Training and evaluation data

	See `stacked-summaries/stacked-samsum-1024`.

	## Training Procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 22138
	- distributed_type: multi-GPU
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 128
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.05
	- num_epochs: 3.0
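
To make the interaction between these settings concrete, the sketch below recomputes the effective batch size and traces the learning rate over training. It assumes the common schedule shape of linear warmup followed by a half-cosine decay to zero; the card does not state which scheduler implementation was used, so `lr_at` is an illustrative approximation and not the author's training code. The total of 690 steps is taken from the last row of the training results table, and the card does not list the number of devices.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.05
TOTAL_STEPS = 690  # last step in the training results table (3 epochs x 230)

# 16 samples per step x 8 accumulation steps matches the listed total of 128.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: warmup length is the warmup ratio times total steps, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (assumed warmup + cosine shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / warmup_steps
    # Half-cosine decay from the peak down to 0 at the final step.
    progress = (step - warmup_steps) / (TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"warmup steps: {warmup_steps}")
for step in (0, warmup_steps, 230, 460, TOTAL_STEPS):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions the peak rate of 0.0001 is reached early in the first epoch and decays for the remainder of training.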

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
	| 1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16   |
	| 1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56   |
	| 1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64   |