---
title: README
emoji: ⚖️
colorFrom: green
colorTo: pink
sdk: static
pinned: false
---

# MJ-Bench Team

[MJ-Bench-Team](https://mj-bench.github.io/) is co-founded by researchers from Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models using feedback from multimodal judges to enhance their reliability, safety, and performance.

<p align="center">
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/Stanford-logo.jpg"
    alt="Stanford University"
    width="180"
    style="display:inline-block; margin:0 -20px; vertical-align:middle;"/>
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UNC-logo.png"
    alt="UNC Chapel Hill"
    width="190"
    style="display:inline-block; margin:0 93px; vertical-align:middle;"/>
  <img
    src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench.github.io/main/static/images/UChicago-logo.jpg"
    alt="University of Chicago"
    width="140"
    style="display:inline-block; margin:0 17px; vertical-align:middle;"/>
</p>

---

## Recent News
- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).

---

## 🎬 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful Mixture-of-Experts (MoE) based multi-dimensional video reward model!

<p align="center">
  <img src="https://raw.githubusercontent.com/aiming-lab/MJ-Video/main/asserts/overview.png" alt="MJ-Video Overview" width="80%"/>
</p>

---

## 👩‍⚖️ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)

- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)
- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)

Text-to-image models such as DALL-E 3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it's crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.

<p align="center">
  <img src="https://raw.githubusercontent.com/MJ-Bench/MJ-Bench/main/assets/overview_new.png" alt="MJ-Bench Dataset Overview" width="80%"/>
</p>

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**

We evaluate a wide range of multimodal judges, including:
- 6 smaller-sized CLIP-based scoring models (see the scoring sketch below)
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)
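
To make this concrete, here is a minimal sketch of how a CLIP-based scoring model can act as a preference judge, ranking two candidate images for one prompt. The checkpoint `openai/clip-vit-base-patch32` and the image file names are illustrative placeholders, not necessarily the exact judges or data used in MJ-Bench:

```python
# Minimal sketch: a generic CLIP model used as an image-text scoring judge.
# The checkpoint and file names are placeholders, not the exact MJ-Bench judges.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a photo of an astronaut riding a horse on the moon"
images = [Image.open("candidate_a.png"), Image.open("candidate_b.png")]

inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image has shape (num_images, num_texts); a higher score means
    # a better image-text match, which we treat as the judge's preference.
    scores = model(**inputs).logits_per_image.squeeze(-1)

preferred = int(scores.argmax())  # index of the image this judge prefers
print(f"scores={scores.tolist()}, preferred image: {preferred}")
```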
🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**

You are welcome to submit your multimodal judge's evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
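
If you want to build such a submission, the preference data can be pulled straight from the Hub with the `datasets` library. A minimal sketch, assuming per-dimension configurations; the configuration and split names below are assumptions, so check the dataset card for the exact ones:

```python
# Minimal sketch: load the MJ-Bench preference data from the Hugging Face Hub.
# The config/split names are assumptions; see the dataset card at
# https://huggingface.co/datasets/MJ-Bench/MJ-Bench for the exact names.
from datasets import load_dataset

dataset = load_dataset("MJ-Bench/MJ-Bench", "alignment", split="train")

# Inspect one preference example before scoring it with your own judge.
print(dataset[0])
```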