---
title: README
emoji: ⚖️
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

# MJ-Bench Team

[MJ-Bench-Team](https://mj-bench.github.io/) is co-founded by researchers from Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models using feedback from multimodal judges to enhance their reliability, safety, and performance.

<p align="center">
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/Stanford-logo.jpg"
    alt="Stanford University"
    width="180"
    style="display:inline-block; margin:0 -20px; vertical-align:middle;"/>
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UNC-logo.png"
    alt="UNC Chapel Hill"
    width="190"
    style="display:inline-block; margin:0 93px; vertical-align:middle;"/>
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UChicago-logo.jpg"
    alt="University of Chicago"
    width="140"
    style="display:inline-block; margin:0 17px; vertical-align:middle;"/>
</p>

---

## Recent News
- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).

---

## 🎬 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful Mixture-of-Experts (MoE) based multi-dimensional video reward model!

<p align="center">
  <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>

---

## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)

- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)

Text-to-image models such as DALL-E 3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.

<p align="center">
  <img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**

We evaluate a wide range of multimodal judges, including:
- 6 smaller-sized CLIP-based scoring models (see the scoring sketch below)
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
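
To make this concrete, here is a minimal sketch of how a CLIP-based scoring model can act as a preference judge, ranking two candidate images for one prompt. The checkpoint `openai/clip-vit-base-patch32` and the image file names are illustrative placeholders, not necessarily the exact judges or data used in MJ-Bench:

```python
# Minimal sketch: a generic CLIP model used as an image-text scoring judge.
# The checkpoint and file names are placeholders, not the exact MJ-Bench judges.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a photo of an astronaut riding a horse on the moon"
images = [Image.open("candidate_a.png"), Image.open("candidate_b.png")]

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image has shape (num_images, num_texts); a higher score means
    # a better image-text match, which we treat as the judge's preference.
    scores = model(**inputs).logits_per_image.squeeze(-1)

preferred = int(scores.argmax())  # index of the image this judge prefers
print(f"scores={scores.tolist()}, preferred image: {preferred}")
```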
🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**

You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
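
If you want to build such a submission, the preference data can be pulled straight from the Hub with the `datasets` library. A minimal sketch, assuming per-dimension configurations; the configuration and split names below are assumptions, so check the dataset card for the exact ones:

```python
# Minimal sketch: load the MJ-Bench preference data from the Hugging Face Hub.
# The config/split names are assumptions; see the dataset card at
# https://huggingface.co/datasets/MJ-Bench/MJ-Bench for the exact names.
from datasets import load_dataset

dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="train")

# Inspect one preference example before scoring it with your own judge.
print(dataset[0])
```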