arxiv:2406.15252

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Published on Jun 21 · Submitted by wenhu on Jun 24

Authors: Max Ku, et al.
Abstract

Recent years have witnessed great advances in video generation, yet the development of automatic video metrics lags significantly behind. None of the existing metrics can provide reliable scores for generated videos; the main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos from 11 existing video generative models. We train MantisScore (initialized from Mantis) on VideoFeedback to enable automatic video quality assessment. Experiments show that the Spearman correlation between MantisScore and human ratings reaches 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points. Further results on the held-out EvalCrafter, GenAI-Bench, and VBench benchmarks show that MantisScore consistently correlates much more strongly with human judges than other metrics. Given these results, we believe MantisScore can serve as a strong proxy for human raters to (1) rate different video models to track progress and (2) simulate fine-grained human feedback in Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.
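For context, the human-correlation figures above are Spearman rank correlations. Below is a minimal sketch of how such a correlation between automatic metric scores and human ratings is computed with `scipy.stats.spearmanr`; the score arrays here are hypothetical, not taken from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical per-video scores: one array from the automatic metric,
# one from human raters (e.g., averaged multi-aspect ratings).
metric_scores = [3.2, 1.5, 4.8, 2.9, 4.1]
human_scores = [3.5, 1.0, 4.5, 3.0, 2.5]

# Spearman's rho measures monotonic agreement between the two rankings;
# the 77.1 reported above corresponds to a rho of 0.771 on this scale.
rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman correlation: {rho:.3f} (p={p_value:.3g})")
```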

Community

Paper author and submitter · edited 4 days ago

VideoScore is the first fine-grained video reward model. The metric simulates human judgement of generated videos, and its correlation with human raters is high, outperforming other baselines by a wide margin.
https://tiger-ai-lab.github.io/VideoScore/
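
Below is a minimal, hedged sketch of pulling the VideoFeedback annotations from the Hugging Face Hub with the `datasets` library. The repository id `TIGER-Lab/VideoFeedback` and the split name are assumptions based on this page, not confirmed by the paper text.

```python
from datasets import load_dataset

# Assumed Hub repository id; check the project page above for the
# actual dataset location and configuration names.
ds = load_dataset("TIGER-Lab/VideoFeedback")

# Each record should pair a synthesized video with multi-aspect human
# scores; the "train" split and field layout are illustrative only.
print(ds)
print(ds["train"][0])
```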


Models citing this paper: 2

Datasets citing this paper: 2

Spaces citing this paper: 1

Collections including this paper: 1