Evaluation scores differ from the Google paper

by zhongwei - opened Nov 23, 2022

I just evaluated the model using run_summarization.py on the Hugging Face dataset ccdv/arxiv-summarization and got a ROUGE-1 score of 41.68.
The Google paper ( https://arxiv.org/pdf/2208.04347.pdf ) reports a ROUGE-1 score of 49.4 for PEGASUS-X Base on the arXiv evaluation.
What could explain such a large difference? How can we reproduce the paper's score with the Hugging Face model?
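
For anyone comparing numbers, here is a minimal sketch of what a ROUGE-1 F1 score measures: unigram overlap between a generated summary and the reference. It is a simplified illustration, not the scorer used by run_summarization.py or by the paper. The function name `rouge1_f1`, the lowercasing, and the whitespace tokenization are my own assumptions; real scorers may tokenize and stem differently. It returns a value in [0, 1], whereas the scores quoted above are on a 0-100 scale.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper, whitespace tokens, no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Multiply by 100 to compare with scores reported on a 0-100 scale.
    print(100 * rouge1_f1("the cat sat", "the cat sat on the mat"))
```

Because the score depends on how both texts are tokenized and normalized, two evaluation pipelines can give different numbers for the same model outputs.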
