---
title: README
emoji: π
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---
# MJ-Bench Team
[MJ-Bench-Team](https://mj-bench.github.io/) was co-founded by researchers from Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to improve their reliability, safety, and performance.
<p align="center">
<img
src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/Stanford-logo.jpg"
alt="Stanford University"
width="180"
    style="display:inline-block; margin:0 -20px; vertical-align:middle;"/>
<img
src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UNC-logo.png"
alt="UNC Chapel Hill"
width="190"
style="display:inline-block; margin:0 93px; vertical-align:middle;"/>
<img
src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UChicago-logo.jpg"
alt="University of Chicago"
width="140"
style="display:inline-block; margin:0 17px; vertical-align:middle;"/>
</p>
---
## Recent News
- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).
---
## 🎬 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)
We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!
<p align="center">
<img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>
---
## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)
- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)
Text-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.
<p align="center">
<img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>
However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:
1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**
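The full preference dataset is hosted on the Hugging Face Hub at [MJ-Bench/MJ-Bench](https://huggingface.co/datasets/MJ-Bench/MJ-Bench). Below is a minimal sketch of how it could be inspected with the `datasets` library; the per-dimension subset names and column names are not spelled out here, so the sketch discovers them at runtime rather than assuming them. See the dataset card for the authoritative schema.

```python
# Minimal sketch: inspecting the MJ-Bench preference data with the
# Hugging Face `datasets` library. Subset (config) and column names are
# discovered at runtime, not assumed; consult the dataset card for details.
from datasets import get_dataset_config_names, load_dataset

repo_id = "MJ-Bench/MJ-Bench"

# Each evaluation dimension (alignment, safety, quality, bias) is expected
# to live in its own subset/configuration (assumption).
configs = get_dataset_config_names(repo_id)
print("available subsets:", configs)

# Load one subset and look at the columns of a single preference example.
ds = load_dataset(repo_id, configs[0])
split = next(iter(ds))              # first available split
example = ds[split][0]
print("columns:", list(example.keys()))
```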
We evaluate a wide range of multimodal judges, including:
- 6 smaller-sized CLIP-based scoring models
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
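For intuition about how a CLIP-based scoring model can act as a judge on such preference pairs, here is a minimal illustrative sketch: given a prompt and two candidate images, it prefers the image with the higher CLIP image-text similarity. This is only a simple baseline, not one of the exact scoring models evaluated in MJ-Bench, and the checkpoint and file names are placeholders.

```python
# Illustrative CLIP-style "judge": score two candidate images against a
# prompt and prefer the higher image-text similarity. Baseline sketch only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_preference(prompt: str, image_a: Image.Image, image_b: Image.Image) -> int:
    """Return 0 if image_a better matches the prompt, else 1."""
    inputs = processor(text=[prompt], images=[image_a, image_b],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits_per_image = model(**inputs).logits_per_image  # shape (2, 1)
    scores = logits_per_image.squeeze(1)
    return int(torch.argmax(scores).item())

# Hypothetical usage with two locally stored candidate images:
# choice = clip_preference("a red bicycle leaning against a brick wall",
#                          Image.open("candidate_0.png"),
#                          Image.open("candidate_1.png"))
```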
🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).