---
datasets:
- multi_news
metrics:
- bleu
- rouge
pipeline_tag: summarization
---

# Hyperparameters

- learning_rate=2e-5
- per_device_train_batch_size=14
- per_device_eval_batch_size=14
- weight_decay=0.01
- save_total_limit=3
- num_train_epochs=3
- predict_with_generate=True
- fp16=True

# Training Output

- global_step=7710
- training_loss=2.1297076629757417
- metrics={'train_runtime': 6059.0418, 'train_samples_per_second': 17.813, 'train_steps_per_second': 1.272, 'total_flos': 2.3389776681055027e+17, 'train_loss': 2.1297076629757417, 'epoch': 3.0}

# Training Results

| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bleu | Gen Len |
|:------|:--------------|:----------------|:-------|:-------|:-------|:----------|:-----|:--------|
| 1 | 2.223100 | 2.038599 | 0.147400 | 0.054800 | 0.113500 | 0.113500 | 0.001400 | 20.000000 |
| 2 | 2.078100 | 2.009619 | 0.152900 | 0.057800 | 0.117000 | 0.117000 | 0.001600 | 20.000000 |
| 3 | 1.989000 | 2.006006 | 0.152900 | 0.057300 | 0.116700 | 0.116700 | 0.001700 | 20.000000 |
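# Training Configuration Sketch

The hyperparameters above map directly onto `Seq2SeqTrainingArguments` from Hugging Face Transformers. The snippet below is a minimal sketch, assuming the model was fine-tuned with `Seq2SeqTrainer` on the `multi_news` dataset; the base checkpoint, `output_dir`, and `evaluation_strategy` are not stated in this card and are illustrative assumptions only.

```python
from datasets import load_dataset
from transformers import Seq2SeqTrainingArguments

# Dataset named in the card metadata (columns: "document", "summary").
dataset = load_dataset("multi_news")

# Training arguments as reported in the card; anything marked "assumed"
# is an illustrative placeholder, not taken from the original run.
training_args = Seq2SeqTrainingArguments(
    output_dir="multi_news-summarization",  # assumed output path
    evaluation_strategy="epoch",            # assumed; per-epoch validation metrics are reported
    learning_rate=2e-5,
    per_device_train_batch_size=14,
    per_device_eval_batch_size=14,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=True,
)
```

With these arguments, the remaining steps follow the standard summarization fine-tuning recipe: tokenize the `document`/`summary` pairs, build a `DataCollatorForSeq2Seq`, and pass model, datasets, and arguments to `Seq2SeqTrainer` before calling `train()`.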