metadata

title: MJ Bench Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0

Start the configuration

Most of the variables you need to change for a default leaderboard are in src/env.py (replace the paths with those of your leaderboard) and src/about.py (for tasks).

Results files should have the following format and be stored as JSON files:

{
    "config": {
        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
        "model_name": "path of the model on the hub: org/model",
        "model_sha": "revision on the hub",
    },
    "results": {
        "task_name": {
            "metric_name": score,
        },
        "task_name2": {
            "metric_name": score,
        }
    }
}
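
As a rough illustration of the format above, the following sketch builds a results dictionary and writes it as a JSON file. The helper names (`build_result`, `write_result`), the output file name, and the score values are assumptions made for this example and are not part of the leaderboard code. Note that strict JSON allows neither comments nor trailing commas, so the written file contains only the keys and values.

```python
import json
import tempfile
from pathlib import Path


def build_result(model_name, model_sha, model_dtype, scores):
    """Assemble a results dict in the layout described above.

    `scores` maps task name -> {metric name -> numeric score}.
    (Hypothetical helper, not part of the leaderboard code.)
    """
    return {
        "config": {
            "model_dtype": model_dtype,  # e.g. "torch.float16", "torch.bfloat16", "8bit", "4bit"
            "model_name": model_name,    # path of the model on the hub: org/model
            "model_sha": model_sha,      # revision on the hub
        },
        "results": {task: dict(metrics) for task, metrics in scores.items()},
    }


def write_result(result, out_dir):
    """Write the result as a JSON file and return its path.

    The file name pattern is an arbitrary choice for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    safe_name = result["config"]["model_name"].replace("/", "_")
    path = out_dir / f"results_{safe_name}.json"
    path.write_text(json.dumps(result, indent=4), encoding="utf-8")
    return path


# Placeholder values, not real evaluation results.
example = build_result(
    model_name="my-org/my-model",
    model_sha="main",
    model_dtype="torch.float16",
    scores={"task_name": {"metric_name": 0.5}, "task_name2": {"metric_name": 0.25}},
)
example_path = write_result(example, tempfile.mkdtemp())
```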

Request files are created automatically by this tool.

If you encounter a problem on the Space, don't hesitate to restart it to remove the created eval-queue, eval-queue-bk, eval-results and eval-results-bk folders.
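
When running the app locally rather than on the Space, a similar cleanup could be done by hand. The sketch below removes the four folders named above from a given base directory. Where the app actually creates these folders is not stated here, so the `base_dir` argument and the `clear_local_folders` helper are assumptions for illustration only.

```python
import shutil
import tempfile
from pathlib import Path

# Folder names taken from the paragraph above.
LOCAL_FOLDERS = ("eval-queue", "eval-queue-bk", "eval-results", "eval-results-bk")


def clear_local_folders(base_dir):
    """Delete any of the known local folders found under base_dir.

    Returns the names of the folders that were actually removed.
    (Hypothetical helper, not part of the leaderboard code.)
    """
    removed = []
    for name in LOCAL_FOLDERS:
        folder = Path(base_dir) / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(name)
    return removed


# Demo in a throwaway directory so that nothing real is deleted.
demo_base = Path(tempfile.mkdtemp())
(demo_base / "eval-queue").mkdir()
(demo_base / "eval-results").mkdir()
removed = clear_local_folders(demo_base)
```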

Code logic for more complex edits

You'll find:

the main table's column names and properties in src/display/utils.py
the logic to read all results and request files, then convert them into dataframe rows, in src/leaderboard/read_evals.py and src/populate.py
the logic to allow or filter submissions in src/submission/submit.py and src/submission/check_validity.py
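
To give a feel for the second item, here is a minimal sketch of turning results files into flat rows, one per model, using only the standard library. It is not the code in src/leaderboard/read_evals.py or src/populate.py. The function names and the `task.metric` column naming are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def result_to_row(result):
    """Flatten one results dict into a single row (a plain dict)."""
    config = result.get("config", {})
    row = {
        "model_name": config.get("model_name"),
        "model_sha": config.get("model_sha"),
        "model_dtype": config.get("model_dtype"),
    }
    # One column per (task, metric) pair, named "task.metric" in this sketch.
    for task, metrics in result.get("results", {}).items():
        for metric, score in metrics.items():
            row[f"{task}.{metric}"] = score
    return row


def load_rows(results_dir):
    """Read every .json file under results_dir and return a list of rows."""
    rows = []
    for path in sorted(Path(results_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            rows.append(result_to_row(json.load(handle)))
    return rows


# Demo with a single placeholder file (not real evaluation results).
demo_dir = Path(tempfile.mkdtemp())
demo_result = {
    "config": {
        "model_dtype": "torch.float16",
        "model_name": "my-org/my-model",
        "model_sha": "main",
    },
    "results": {"task_name": {"metric_name": 0.5}},
}
(demo_dir / "demo.json").write_text(json.dumps(demo_result), encoding="utf-8")
rows = load_rows(demo_dir)
```

A list of such rows can be passed directly to a dataframe constructor if a table is needed.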

Citation

@misc{chen2024mjbenchmultimodalrewardmodel,
      title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?}, 
      author={Zhaorun Chen and Yichao Du and Zichen Wen and Yiyang Zhou and Chenhang Cui and Zhenzhen Weng and Haoqin Tu and Chaoqi Wang and Zhengwei Tong and Qinglan Huang and Canyu Chen and Qinghao Ye and Zhihong Zhu and Yuqing Zhang and Jiawei Zhou and Zhuokai Zhao and Rafael Rafailov and Chelsea Finn and Huaxiu Yao},
      year={2024},
      eprint={2407.04842},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.04842}, 
}