InterleavedBench (EMNLP'24 Main Conference)

This is the official Hugging Face repo for the paper "Holistic Evaluation for Interleaved Text-and-Image Generation", accepted to the EMNLP 2024 Main Conference.

Paper: https://arxiv.org/abs/2406.14643

Website: https://vt-nlp.github.io/InterleavedEval/

How to use InterleavedBench

Repo hierarchy

  • interleaved_bench.json is the main JSON file of the dataset.
  • zipped_images is the directory containing the zipped images for each subset, including both the context images and the ground-truth images.
  • src/interleavedeval_gpt4o.py is the Python script that runs InterleavedEval with GPT-4o. It takes the model prediction file as input.

To get started

  • Unzip the image files under zipped_images.
  • Run inference on interleaved_bench.json with your model to obtain its outputs (both text and images).
  • Run the script src/interleavedeval_gpt4o.py to evaluate those outputs.
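
The first step and the loading of the benchmark file could look roughly like the sketch below. It rests on assumptions that this README does not confirm: that the archives sit directly under zipped_images with a .zip extension, and that interleaved_bench.json holds a JSON list of examples. The helper names unzip_all and load_benchmark, and the output directory "images", are hypothetical.

```python
import json
import zipfile
from pathlib import Path


def unzip_all(zip_dir, out_dir):
    """Extract every .zip archive found directly under zip_dir into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        extracted.append(archive.name)
    return extracted


def load_benchmark(json_path):
    """Load the benchmark file (assumed here to be a JSON list of examples)."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)
```

For instance, unzip_all("zipped_images", "images") followed by load_benchmark("interleaved_bench.json") would prepare the inputs for inference, provided the layout matches these assumptions.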

Important notes

  • For image editing and subject-driven generation tasks, the scores on text-related aspects (text quality, text-image coherence) are directly set to 0. Please skip those scores when you compute the overall performance.
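
One way to apply this note when aggregating results is sketched below. The record layout, the task identifiers in IMAGE_ONLY_TASKS, and the aspect keys (including "image_quality") are hypothetical and may not match what src/interleavedeval_gpt4o.py actually writes, so adjust them to the real output before use. The sketch only averages each aspect separately; how the per-aspect averages are combined into a single overall number is left open.

```python
from collections import defaultdict

# Hypothetical identifiers: check them against the real task names and
# aspect keys in your evaluation output.
IMAGE_ONLY_TASKS = {"image_editing", "subject_driven_generation"}
TEXT_ASPECTS = {"text_quality", "text_image_coherence"}


def aspect_averages(records):
    """Average each aspect, skipping text aspects on image-only tasks."""
    collected = defaultdict(list)
    for rec in records:
        for aspect, score in rec["scores"].items():
            if rec["task_name"] in IMAGE_ONLY_TASKS and aspect in TEXT_ASPECTS:
                continue  # this 0 is a placeholder, not a real judgement
            collected[aspect].append(score)
    return {aspect: sum(vals) / len(vals) for aspect, vals in collected.items()}
```

Skipping the placeholder zeros rather than averaging them in keeps the text-related averages from being pulled down by tasks that produce no text.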

One example in interleaved_bench.json is as follows:

{
    "id": "wikihow_next_step_0_489157",
    "image": [
        "wiki_images_test/489157_0_0.png",
        "wiki_images_test/489157_0_1.png",
        "wiki_images_test/489157_0_2.png",
        "wiki_images_test/489157_0_3.png",
        "wiki_images_test/489157_0_4.png"
    ],
    "task_name": "wikihow_next_step",
    "conversations": [
        {
            "from": "human",
            "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n  You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n <BEGIN>  Put 2 to 3 bananas in a bowl. <image>\n"
        },
        {
            "from": "gpt",
            "value": "Now put in a liter of kefir and a teaspoon of sugar. <image>\n Put 1 cup milk into the mix. <image>\n Using a blender, blend all ingredients together. <image>\n Relax with your fresh banana smoothie! <image>\n"
        }
    ],
    "goal": "How to Make a Banana Shake",
    "category": [
        "Food and Entertaining",
        "Drinks",
        "Smoothies Shakes and Milk",
        "Fruit Based Shakes"
    ],
    "dataset_id": "wikihow_selected_test_uni"
}
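
In the example above, the "image" list has five paths and the conversation contains five <image> placeholders (one in the "human" turn, four in the "gpt" turn). A minimal sketch for pairing each text segment with its image follows, assuming the paths map to placeholders in order across turns. That ordering is inferred from this single example rather than documented, and the helper name pair_segments_with_images is hypothetical.

```python
IMAGE_TOKEN = "<image>"


def pair_segments_with_images(example):
    """Return (speaker, text, image_path) triples in conversation order."""
    images = iter(example["image"])
    triples = []
    for turn in example["conversations"]:
        parts = turn["value"].split(IMAGE_TOKEN)
        # Every part except the last is followed by exactly one placeholder.
        for text in parts[:-1]:
            # image_path is None if there are more placeholders than paths.
            triples.append((turn["from"], text.strip(), next(images, None)))
    return triples
```

Filtering the result for triples whose speaker is "gpt" would give the ground-truth steps together with their reference images.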

Reference

If you find our work useful or interesting, please cite:

@article{liu_holistic_2024,
  author       = {Minqian Liu and
                  Zhiyang Xu and
                  Zihao Lin and
                  Trevor Ashby and
                  Joy Rimchala and
                  Jiaxin Zhang and
                  Lifu Huang},
  title        = {Holistic Evaluation for Interleaved Text-and-Image Generation},
  journal      = {CoRR},
  volume       = {abs/2406.14643},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2406.14643},
  doi          = {10.48550/ARXIV.2406.14643},
  eprinttype    = {arXiv},
  eprint       = {2406.14643},
  timestamp    = {Tue, 16 Jul 2024 16:17:50 +0200}
}