---
title: README
emoji: π
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---
# MJ-Bench Team

The MJ-Bench Team is co-founded by Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance their reliability, safety, and performance.
## Recent News
- 🔥 We have released MJ-Video! All datasets and model checkpoints are now available.
- MJ-PreferGen has been accepted to ICLR 2025! Check out the paper: MJ-PreferGen: An Automatic Framework for Preference Data Synthesis.
## MJ-Video: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation
- Project page: https://aiming-lab.github.io/MJ-VIDEO.github.io/
- Code repository: https://github.com/aiming-lab/MJ-Video
We release MJ-Bench-Video, a comprehensive, fine-grained video preference benchmark, and MJ-Video, a powerful Mixture-of-Experts (MoE)-based multi-dimensional video reward model!
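To give a rough, purely illustrative sense of what an MoE-based, multi-dimensional reward model means, the sketch below shows a gating network weighting per-dimension expert scores into an overall reward. This is not MJ-Video's actual architecture; every module name, size, and dimension label here is hypothetical.

```python
# Illustrative sketch only: a tiny mixture-of-experts (MoE) reward head that maps
# a pooled video embedding to per-dimension preference scores. This is NOT the
# MJ-Video architecture; names, sizes, and dimension labels are hypothetical.
import torch
import torch.nn as nn


class MoERewardHead(nn.Module):
    def __init__(self, feat_dim: int = 768,
                 dimensions=("alignment", "safety", "quality", "bias")):
        super().__init__()
        self.dimensions = dimensions
        # One small expert MLP per preference dimension.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 256), nn.GELU(), nn.Linear(256, 1))
            for _ in dimensions
        )
        # Gating network produces soft weights over the experts.
        self.gate = nn.Linear(feat_dim, len(dimensions))

    def forward(self, video_feat: torch.Tensor) -> dict:
        # video_feat: (batch, feat_dim) pooled representation of a generated video.
        per_dim = torch.cat([expert(video_feat) for expert in self.experts], dim=-1)
        weights = torch.softmax(self.gate(video_feat), dim=-1)
        overall = (weights * per_dim).sum(dim=-1)  # weighted aggregate reward
        return {"per_dimension": per_dim, "overall": overall}


if __name__ == "__main__":
    head = MoERewardHead()
    scores = head(torch.randn(2, 768))
    print(scores["per_dimension"].shape, scores["overall"].shape)  # (2, 4), (2,)
```

A head of this shape exposes both fine-grained per-dimension feedback and a single aggregated score that can drive preference optimization.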
## 👩‍⚖️ MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
- Project page: https://mj-bench.github.io/
- Code repository: https://github.com/MJ-Bench/MJ-Bench
Text-to-image models like DALL·E 3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a multimodal judge.
However, current multimodal judges are often under-evaluated, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce MJ-Bench, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:
- Alignment
- Safety
- Image Quality
- Bias
We evaluate a wide range of multimodal judges (see the evaluation sketch after this list), including:
- 6 smaller-sized CLIP-based scoring models
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
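As a minimal sketch of how a CLIP-based scoring model can be evaluated as a judge on a pairwise preference dataset, the snippet below scores the preferred and rejected image for each prompt and reports how often the judge agrees with the human label. The dataset id, split, and column names (`caption`, `image0`, `image1`) are assumptions for illustration; consult the MJ-Bench dataset card for the actual schema.

```python
# Minimal evaluation sketch: treat a CLIP-based scoring model as a multimodal judge
# on a pairwise preference dataset. The dataset id, split, and column names are
# assumptions for illustration; check the MJ-Bench dataset card for the real schema.
import torch
from datasets import load_dataset
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(caption: str, image) -> float:
    """Image-text similarity score for one caption/image pair."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return model(**inputs).logits_per_image.item()

# Hypothetical schema: each row holds a prompt, a preferred image, and a rejected image.
dataset = load_dataset("MJ-Bench/MJ-Bench", split="train")  # assumed dataset id/split
n = min(100, len(dataset))                                  # small subsample for speed
correct = 0
for row in dataset.select(range(n)):
    chosen = clip_score(row["caption"], row["image0"])      # assumed: preferred image
    rejected = clip_score(row["caption"], row["image1"])    # assumed: rejected image
    correct += int(chosen > rejected)
print(f"Preference accuracy on {n} examples: {correct / n:.2%}")
```

A VLM judge (open- or closed-source) can be plugged into the same loop by swapping `clip_score` for a call that returns a scalar rating from the model.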
🔥 We are actively updating the leaderboard!
You are welcome to submit your multimodal judge's evaluation results on our dataset to the Hugging Face leaderboard.