flan-t5-xl-summary-map-reduce-1024

A larger text2text model trained to perform the "reduce" (consolidation) step of map-reduce summarization.
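As a rough illustration of where this model sits in the pipeline, here is a minimal map-reduce sketch. It assumes you already have some chunk-level summarizer for the "map" step and that this model (or any other reducer) is wrapped as a plain Python callable for the "reduce" step. The helper names, the word-based chunking, and the blank-line separator between chunk summaries are all illustrative assumptions, not the exact format used to build the training data.

```python
from typing import Callable, List


def chunk_words(text: str, chunk_size: int = 512) -> List[str]:
    """Split text into consecutive chunks of at most `chunk_size` whitespace-delimited words.

    Word-based chunking is a simplification; a real pipeline would usually chunk by tokens.
    """
    words = text.split()
    return [" ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)]


def map_reduce_summarize(
    text: str,
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[str], str],
    chunk_size: int = 512,
) -> str:
    """Summarize each chunk independently (map), then consolidate the chunk summaries (reduce)."""
    chunk_summaries = [map_fn(chunk) for chunk in chunk_words(text, chunk_size)]
    # The reduce step sees all chunk summaries joined into a single input string
    # (the separator is an assumption; see the wiki/dataset for the real format).
    return reduce_fn("\n\n".join(chunk_summaries))


if __name__ == "__main__":
    # Toy stand-ins: "map" keeps the first word of each chunk, "reduce" joins them with " | ".
    result = map_reduce_summarize(
        "one two three four five six seven",
        map_fn=lambda chunk: chunk.split()[0],
        reduce_fn=lambda joined: joined.replace("\n\n", " | "),
        chunk_size=3,
    )
    print(result)  # one | four | seven
```

In practice `reduce_fn` would call this model on the joined summaries, and `map_fn` would call a chunk-level summarizer.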

About

Refer to this wiki page or the smaller BART model card for explanations and usage examples.

Compared to the BART version, this model seems to:

  • produce more eloquent final reduced summaries
  • be more "gullible", i.e. more sensitive to noise in the input summaries
    • for example, a hallucinated one-off term, name, or entity in a single input summary is likely to appear in the reduced summary
  • be agnostic to whitespace in the input (by design, since the T5 tokenizer normalizes whitespace)

Therefore, it's recommended to compare sample outputs of this model and the BART version on your data to see which is better for your use case.
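When comparing outputs, one crude way to spot the "gullible" behavior described above is to flag terms in the reduced summary that are supported by at most one input summary (or by none). The sketch below is a hypothetical heuristic, not part of this model or its training. It treats capitalized words as a stand-in for names and entities, which is a deliberate simplification: it will also flag legitimate one-off mentions, so treat the output as a list to review manually.

```python
import re
from collections import Counter
from typing import List, Set


def capitalized_terms(text: str) -> Set[str]:
    """Crude proxy for names/entities: the set of capitalized words in the text."""
    return set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text))


def flag_one_off_terms(input_summaries: List[str], reduced: str) -> List[str]:
    """Return capitalized terms in `reduced` that occur in at most one input summary."""
    support: Counter = Counter()
    for summary in input_summaries:
        # Each term is counted at most once per input summary.
        support.update(capitalized_terms(summary))
    return sorted(term for term in capitalized_terms(reduced) if support[term] <= 1)


if __name__ == "__main__":
    inputs = [
        "Alice met Bob in Paris.",
        "Alice and Bob discussed the budget.",
        "Bob mentioned Zorblax.",
    ]
    reduced = "Alice and Bob met in Paris and discussed the budget with Zorblax."
    print(flag_one_off_terms(inputs, reduced))  # ['Paris', 'Zorblax']
```

Running the same check on outputs from both models for the same inputs gives a rough, comparable signal of how often each one carries single-source terms into the final summary.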

Details

This model is a fine-tuned version of google/flan-t5-xl on the pszemraj/summary-map-reduce-v1 dataset, with a context length of 1024 tokens for both input and output.
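A minimal inference sketch with the Hugging Face `transformers` library, assuming the model ID shown on this card and respecting the 1024-token input/output budget. The blank-line separator between summaries and the beam-search setting are assumptions (the card does not specify generation parameters), and the `transformers` import is deferred into `load_reducer` only so the helpers can be imported without it. Note that the weights are stored in F32, so at 2.85B parameters expect roughly 11 GB of memory just to load the model.

```python
from typing import List, Tuple

MODEL_ID = "pszemraj/flan-t5-xl-summary-map-reduce-1024"
MAX_TOKENS = 1024  # context length used during fine-tuning, for both input and output


def load_reducer(model_id: str = MODEL_ID) -> Tuple[object, object]:
    """Load the tokenizer and seq2seq model (import deferred so this module loads without transformers)."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def build_reduce_input(summaries: List[str]) -> str:
    """Join chunk summaries into one input string (the separator is an assumption)."""
    return "\n\n".join(s.strip() for s in summaries)


def reduce_summaries(summaries: List[str], tokenizer, model, num_beams: int = 4) -> str:
    """Consolidate several chunk summaries into one final summary."""
    text = build_reduce_input(summaries)
    # Truncate the input to the 1024-token training context and move it to the model's device.
    inputs = tokenizer(text, max_length=MAX_TOKENS, truncation=True, return_tensors="pt").to(model.device)
    # Cap the generated length at the same budget; num_beams is an arbitrary choice here.
    output_ids = model.generate(**inputs, max_new_tokens=MAX_TOKENS, num_beams=num_beams)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer, model = load_reducer()
    print(reduce_summaries(["Summary of part one ...", "Summary of part two ..."], tokenizer, model))
```

Swapping `MODEL_ID` for the BART version's ID makes it easy to run both models on the same summaries and compare results.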

It achieves the following results on the evaluation set:

  • Loss: 0.6039
  • Num Input Tokens Seen: 7138765

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 17868
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 64
  • optimizer: paged AdamW (OptimizerNames.PAGED_ADAMW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 2.0
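The batch-size and schedule settings above can be sanity-checked with a little arithmetic. The sketch below assumes a single device (so 2 × 32 = 64 matches total_train_batch_size) and reproduces, to our understanding, the shape of the `transformers` cosine schedule with warmup: linear warmup over ceil(total_steps × warmup_ratio) steps, then a half-cosine decay to zero. The total number of optimizer steps is not stated on this card, so the 1000 used below is purely hypothetical.

```python
import math

LEARNING_RATE = 8e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 32
WARMUP_RATIO = 0.05

# Examples per optimizer step, assuming one device: 2 x 32 = 64.
EFFECTIVE_BATCH = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def warmup_steps(total_steps: int, ratio: float = WARMUP_RATIO) -> int:
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE, ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay from base_lr to 0."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return base_lr * step / max(1, warm)
    progress = (step - warm) / max(1, total_steps - warm)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    TOTAL_STEPS = 1000  # hypothetical; the real step count is not given on this card
    print(EFFECTIVE_BATCH, warmup_steps(TOTAL_STEPS))  # 64 50
    for s in (0, 25, 50, 525, 1000):
        print(s, cosine_lr(s, TOTAL_STEPS))
```

With these settings the learning rate peaks at 8e-05 at the end of warmup, halves at the midpoint of the decay phase, and reaches zero at the final step.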

Model size: 2.85B params (Safetensors weights, F32 tensors)
