---
title: README
emoji: 🌖
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

# MJ-Bench Team

[MJ-Bench-Team](https://mj-bench.github.io/) is co-founded by researchers from Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance reliability, safety, and performance.

<p align="center">
  <img 
       src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/Stanford-logo.jpg" 
       alt="Stanford University" 
       width="180" 
       style="display:inline-block; margin:0 -20px; vertical-align:left;"/>
  <img 
       src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UNC-logo.png" 
       alt="UNC Chapel Hill" 
       width="190" 
       style="display:inline-block; margin:0 93px; vertical-align:middle;"/>
  <img 
       src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UChicago-logo.jpg" 
       alt="University of Chicago" 
       width="140" 
       style="display:inline-block; margin:0 17px; vertical-align:middle;"/>
</p>

---

## Recent News
- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).

---

## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful Mixture-of-Experts (MoE)-based multi-dimensional video reward model!

<p align="center">
  <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>
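
The MJ-Video checkpoints and MJ-Bench-Video data are hosted under the [MJ-Bench organization](https://huggingface.co/MJ-Bench) on Hugging Face, so they can be fetched with `huggingface_hub`. A minimal sketch follows; the `MJ-Bench/MJ-Video` repo id below is a hypothetical placeholder, so list the organization's repos first to find the real names:

```python
# Sketch: discover and download MJ-Bench artifacts from the Hugging Face Hub.
# Assumption: repos live under the "MJ-Bench" org (per the link above); the
# repo id passed to snapshot_download is hypothetical -- use a listed id.
from huggingface_hub import HfApi, snapshot_download

api = HfApi()
for model in api.list_models(author="MJ-Bench"):  # enumerate the org's model repos
    print(model.id)

local_dir = snapshot_download(repo_id="MJ-Bench/MJ-Video")  # hypothetical repo id
print("Checkpoint downloaded to", local_dir)
```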

---

## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)

- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)

Text-to-image models such as DALL·E 3 and Stable Diffusion are proliferating rapidly, but they often suffer from hallucination, bias, and unsafe or low-quality outputs. To address these issues effectively, it is crucial to align these models with desired behaviors using feedback from a **multimodal judge**.

<p align="center">
  <img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**  
2. **Safety**  
3. **Image Quality**  
4. **Bias**  

We evaluate a wide range of multimodal judges, including:
- 6 smaller CLIP-based scoring models (see the scoring sketch below)  
- 11 open-source VLMs (e.g., the LLaVA family)  
- 4 closed-source VLMs (e.g., GPT-4V, Claude 3)
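
The CLIP-based judges score prompt-image compatibility directly. Below is a minimal sketch of that scoring scheme using a generic `openai/clip-vit-base-patch32` checkpoint (an illustrative choice, not necessarily one of the six evaluated models), assuming the judge simply prefers the candidate image with the higher image-text similarity:

```python
# Sketch: a CLIP-style judge that picks the image better aligned with a prompt.
# The checkpoint is a generic public CLIP model, used here only for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(prompt: str, image: Image.Image) -> float:
    """Image-text similarity score (higher means better prompt alignment)."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        return model(**inputs).logits_per_image.item()

def judge(prompt: str, image_a: Image.Image, image_b: Image.Image) -> str:
    """Return "A" or "B" for whichever candidate the judge prefers."""
    return "A" if clip_score(prompt, image_a) >= clip_score(prompt, image_b) else "B"

# Usage (file paths are placeholders):
# winner = judge("a red bicycle leaning against a wall",
#                Image.open("candidate_a.png"), Image.open("candidate_b.png"))
```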


🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**  
You are welcome to submit your multimodal judge’s evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
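
To evaluate a judge locally first, the preference data can be loaded with the `datasets` library. A minimal sketch; the repo id comes from the dataset link above, while the `"alignment"` config name is an assumption, so check the dataset card for the actual subset and split names:

```python
# Sketch: load the MJ-Bench preference data from the Hugging Face Hub.
# Assumption: the "alignment" config mirrors one of the four evaluated
# dimensions -- verify the real config/split names on the dataset card.
from datasets import load_dataset

dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment")  # config name assumed
split = next(iter(dataset))          # first available split name
print(dataset)                       # dataset dict with splits and features
print(dataset[split][0])             # inspect one preference example
```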