---
title: README
emoji: 🌖
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

# MJ-Bench Team

The MJ-Bench team is co-founded by researchers from Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance reliability, safety, and performance.



## Recent News


### 😎 MJ-Video: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

We release MJ-Bench-Video, a comprehensive fine-grained video preference benchmark, and MJ-Video, a powerful MoE-based multi-dimensional video reward model!

*Figure: MJ-Video overview.*


πŸ‘©β€βš–οΈ MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Text-to-image models like DALL·E 3 and Stable Diffusion are proliferating rapidly, but they often suffer from hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge.

*Figure: MJ-Bench dataset overview.*

However, current multimodal judges are often under-evaluated, which can lead to misalignment and safety issues during fine-tuning. To tackle this, we introduce MJ-Bench, a new benchmark featuring a comprehensive preference dataset that evaluates multimodal judges along four critical dimensions (a loading sketch follows the list):

  1. Alignment
  2. Safety
  3. Image Quality
  4. Bias
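
The preference data is distributed through the Hugging Face Hub. Below is a minimal loading sketch, assuming the dataset lives at `MJ-Bench/MJ-Bench` and exposes one config per dimension; the config names are illustrative, so check the dataset card for the exact identifiers.

```python
# A minimal loading sketch, assuming the dataset is hosted at
# MJ-Bench/MJ-Bench with one config per evaluation dimension.
# The config names below are assumptions, not confirmed identifiers.
from datasets import load_dataset

for config in ["alignment", "safety", "quality", "bias"]:  # assumed config names
    ds = load_dataset("MJ-Bench/MJ-Bench", config)          # assumed repo id
    print(config, ds)
```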

We evaluate a wide range of multimodal judges, including:

  - 6 smaller-sized CLIP-based scoring models (a scoring sketch follows this list)
  - 11 open-source VLMs (e.g., the LLaVA family)
  - 4 closed-source VLMs (e.g., GPT-4, Claude 3)
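
For concreteness, here is a minimal sketch of how a CLIP-based scoring judge can express a preference between two candidate images for one prompt. The checkpoint `openai/clip-vit-base-patch32` is purely illustrative and is not necessarily one of the six judges evaluated in MJ-Bench.

```python
# Minimal sketch of a CLIP-based scoring judge: given a prompt and two
# candidate images, prefer the image with the higher image-text similarity.
# The checkpoint below is illustrative, not an MJ-Bench-evaluated judge.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_preference(prompt: str, image_a: Image.Image, image_b: Image.Image) -> int:
    """Return 0 if image_a better matches the prompt, else 1."""
    inputs = processor(text=[prompt], images=[image_a, image_b],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_texts); column 0 holds
    # each image's similarity to the single prompt.
    scores = outputs.logits_per_image[:, 0]
    return int(torch.argmax(scores).item())
```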

🔥 We are actively updating the leaderboard!
You are welcome to submit your multimodal judge’s evaluation results on our dataset to the Hugging Face leaderboard; a minimal evaluation sketch follows.
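
As a hypothetical end-to-end sketch, a submission score could be computed by running a judge over every preference pair and reporting accuracy. The split and column names (`caption`, `image0`, `image1`, `label`) are assumptions and must be matched against the actual dataset schema.

```python
# Hypothetical evaluation sketch reusing clip_preference() from above.
# Repo id, config, split, and column names are all assumptions; align
# them with the real dataset schema before running.
from datasets import load_dataset

ds = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="test")
correct = sum(
    clip_preference(row["caption"], row["image0"], row["image1"]) == row["label"]
    for row in ds
)
print(f"alignment preference accuracy: {correct / len(ds):.3f}")
```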