diff --git a/dataset/3d_effect_generation_single_reference_0001/auto_eval.jsonl b/dataset/3d_effect_generation_single_reference_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..d5749152a4f2363571147ea820ace79957836c71 --- /dev/null +++ b/dataset/3d_effect_generation_single_reference_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nDoes the generated 3D rendering accurately retain every detail of the shapes and outlines of the line drawing? 0 points: The shapes and outlines in the 3D rendering show noticeable deviations or distortions compared to the line drawing.
1 point: The 3D rendering accurately preserves the shapes and outlines from the line drawing in every detail.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nDoes the generated 3D rendering maintain the overall structure and proportions of the line drawing, ensuring consistency between the line drawing and the generated image? 0 points: The object's structure in the 3D rendering has been noticeably altered, with unbalanced proportions. 
1 point: The structure and proportions of the object in the 3D rendering are consistent with the line drawing and are well-balanced.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nDoes the generated image effectively convey a sense of depth and three-dimensionality, presenting as a realistic 3D rendered image? 0 points: The image lacks a convincing 3D appearance, with minimal or no sense of depth. Key elements like lighting, shadows, and material effects are either absent or insufficiently applied, resulting in a flat or two-dimensional look. 1 point: The image successfully conveys a realistic 3D appearance, with well-applied lighting, shadows, and material effects that contribute to a sense of depth and dimensionality. 
The rendering effectively represents a believable 3D object with spatial volume.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nIf there are multiple angles, does the generated 3D rendering maintain consistency of the object across multiple angles in every detail? 0 points: The object appears inconsistent across different angles, with noticeable discrepancies in shape, proportions, or details between views.
1 point: There are no multiple angles, or the object remains consistent across all angles, with uniform shape, proportions, and details, creating a coherent and accurate representation from every perspective.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nDoes the rendering display realistic and well-defined shadows that enhance the 3D appearance of the object? 0 points: Shadows are poorly rendered, with inconsistent or unrealistic positioning, depth, or softness.
1 point: Shadows are accurately rendered, with appropriate depth, softness, and alignment to the light source.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two images, with the top image as the reference picture for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided design sketch and text requirements.\nThe text requirement is:\n\"Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism.\"\nYour review question is:\nDoes the generated 3D rendering have an overall aesthetic appeal, with high-quality visual detailing and a cohesive style that meets professional standards? 0 points: The 3D rendering lacks aesthetic appeal, with poor visual detailing and an incohesive style. 
1 point: The 3D rendering demonstrates strong aesthetic appeal, with high-quality visual detailing and a cohesive style.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} diff --git a/dataset/3d_effect_generation_single_reference_0001/eval.json b/dataset/3d_effect_generation_single_reference_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..1b0aeafe55764fe3e33948432da3429cd9c16a21 --- /dev/null +++ b/dataset/3d_effect_generation_single_reference_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated 3D rendering accurately preserve every detail of the line drawing, including shapes and contours?", + "0_point_standard": "There are noticeable deviations or distortions in the shapes and contours in the 3D rendering compared to the line drawing.", + "1_point_standard": "The 3D rendering accurately preserves every detail of the shapes and contours in the line drawing." + }, + { + "question": "Does the generated 3D rendering maintain the overall structure and proportions of the line drawing, ensuring consistency between the line drawing and the generated image?", + "0_point_standard": "There are significant changes in the structure of objects in the 3D rendering, with imbalanced proportions.", + "1_point_standard": "The structure and proportions of objects in the 3D rendering are consistent with the line drawing, and the proportions are balanced." + }, + { + "question": "Does the generated image effectively convey depth and a sense of three-dimensionality, presenting a realistic 3D rendering effect?", + "0_point_standard": "The image lacks a convincing 3D appearance, with almost no sense of depth. 
Key elements such as lighting and material effects are missing or insufficiently applied, making the image appear flat.", + "1_point_standard": "The image successfully conveys a realistic 3D appearance, with well-applied lighting and material effects that enhance depth and three-dimensionality, presenting credible spatial volume." + }, + { + "question": "If the generated 3D rendering includes multiple angles, does it maintain consistency of the object across angles, ensuring no deviation in details?", + "0_point_standard": "The object appears inconsistent across different angles, with noticeable differences in shape, proportion, or detail between perspectives.", + "1_point_standard": "There are no multiple angles, or the object maintains consistency across all angles, with uniform shape, proportion, and details, ensuring accurate consistency in each perspective." + }, + { + "question": "Does the rendering display realistic and clear shadow effects, enhancing the 3D appearance of the object?", + "0_point_standard": "The shadow effects are poorly rendered, with inaccurate or unrealistic position, depth, or softness.", + "1_point_standard": "The shadow effects are accurately rendered, with appropriate depth, softness, and alignment with the light source." + }, + { + "question": "Does the generated 3D rendering have an overall aesthetic appeal, with high-quality visual details and a unified style that meets professional standards?", + "0_point_standard": "The 3D rendering lacks aesthetic appeal, with poor visual details and an inconsistent style.", + "1_point_standard": "The 3D rendering exhibits strong aesthetic appeal, with high-quality visual details and a unified style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/3d_effect_generation_single_reference_0001/images.txt b/dataset/3d_effect_generation_single_reference_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..9b0f9bf3ef93bd6f4a5220bc3029ed1352c0362b --- /dev/null +++ b/dataset/3d_effect_generation_single_reference_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01sdobeF1zREcpWHb46_!!6000000006710-0-tps-3508-2192.jpg diff --git a/dataset/3d_effect_generation_single_reference_0001/instruction.txt b/dataset/3d_effect_generation_single_reference_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3226aee4f2614d24398f458b426a57b6f4377db7 --- /dev/null +++ b/dataset/3d_effect_generation_single_reference_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a corresponding 3D rendering based on the provided design sketch. The task objective is to infer a plausible 3D form from the product design sketch, displaying the appearance, structure, and functional details of the product. The model should integrate the different perspectives shown in the sketch to create a complete and consistent 3D model, rendering realistic lighting, shadows, and material effects. Ensure that the generated 3D image adheres to the proportions and design intentions depicted in the sketch, accurately showcasing the product from all angles and details, resulting in a high-quality 3D rendering with depth and realism. 
\ No newline at end of file diff --git a/dataset/3d_effect_generation_single_reference_0001/meta.json b/dataset/3d_effect_generation_single_reference_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..02219529b7df0b65c06009d56a5d03114a9d61ce --- /dev/null +++ b/dataset/3d_effect_generation_single_reference_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "3D effect generation with single reference", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0072", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/3d_effect_generation_three-view_reference_0002/auto_eval.jsonl b/dataset/3d_effect_generation_three-view_reference_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..31e38af6f0c19503eec7b04ad1196da65869a197 --- /dev/null +++ b/dataset/3d_effect_generation_three-view_reference_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided.
The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item. The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nLogical Consistency with Input Reference: 0 points: The output 3D rendering lacks logical consistency with the input three-view sketches, showing significant discrepancies in shape or structure that contradict the references. 1 point: The output 3D rendering logically aligns with the input three-view sketches, maintaining structural coherence and faithfully representing the object's intended form.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nCompleteness of Detail Representation: 0 points: The 3D rendering lacks certain intricate details or specific design elements present in the input sketches, resulting in a simplified or incomplete representation. 1 point: The 3D rendering fully captures all details and intricate features from the input sketches, providing a complete and precise depiction of the object.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nConsistency in Material and Texture Representation: 0 points: The 3D rendering fails to accurately represent materials or textures specified or implied in the input sketches, with inconsistencies that detract from realism. 1 point: The 3D rendering maintains consistent and realistic material and texture representations across all views, effectively enhancing the appearance and believability of the object.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nAdherence to Text Description Instructions: 0 points: The model-generated 3D rendering does not fulfill the specific requirements outlined in the text description, failing to incorporate directed changes or enhancements. 1 point: The model effectively incorporates and adheres to the specific instructions provided in the text description, accurately reflecting all requested modifications or features.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nIntegration of Realistic Lighting and Shadows: 0 points: The 3D rendering shows unrealistic or poorly implemented lighting and shadow effects, with unnatural or inconsistent lighting that detracts from the image’s overall quality. 1 point: The lighting and shadows are realistically integrated into the rendering, creating a natural sense of depth and enhancing the three-dimensional effect in a way that complements the object’s design.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg", "0002.jpg", "0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference three-view pictures for the design task and the bottom image as the response provided by a student. The task objective is to generate a realistic 3D rendering based on the provided three-view pictures and text requirements.\nThe text requirement is:\n\"The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item.\"\nYour review question is:\nAesthetic Cohesion and Visual Appeal: 0 points: The 3D rendering lacks visual harmony, with elements that appear disjointed or poorly composed. The design may have awkward proportions, distracting artifacts, or an inconsistent style, resulting in a rendering that feels visually unbalanced or unpolished. 1 point: The 3D rendering demonstrates strong aesthetic cohesion, with a harmonious composition and well-balanced proportions. The style is consistent throughout, and the image has a polished, visually appealing look that aligns with professional design standards.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} diff --git a/dataset/3d_effect_generation_three-view_reference_0002/eval.json b/dataset/3d_effect_generation_three-view_reference_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..35598a543f7a868e702cb2750e1d8944ba626180 --- /dev/null +++ b/dataset/3d_effect_generation_three-view_reference_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Consistency with the input reference image:", + "0_point_standard": "The output 3D rendering lacks logical consistency with the input three-view sketch, showing noticeable discrepancies in shape or structure, deviating from the reference image.", + "1_point_standard": "The output 3D rendering logically aligns with the input three-view sketch, maintaining structural coherence and faithfully representing the intended form of the object." 
+ }, + { + "question": "Completeness of detail representation:", + "0_point_standard": "The 3D rendering misses certain fine details or specific design elements from the input sketch, resulting in a simplified or incomplete representation.", + "1_point_standard": "The 3D rendering fully captures all details and refined features from the input sketch, providing a complete and accurate representation of the object." + }, + { + "question": "Consistency of material and texture representation:", + "0_point_standard": "The 3D rendering fails to accurately depict the materials or textures specified or implied by the input sketch, showing inconsistencies that affect realism.", + "1_point_standard": "The 3D rendering maintains consistent and realistic material and texture representation across all views, effectively enhancing the object's appearance and credibility." + }, + { + "question": "Adherence to text description instructions:", + "0_point_standard": "The 3D rendering generated by the model fails to meet specific requirements from the text description, omitting changes or enhancements mentioned in the instructions.", + "1_point_standard": "The model effectively adheres to and incorporates specific instructions provided in the text description, accurately reflecting all required modifications or features." + }, + { + "question": "Realistic integration of lighting and shadows:", + "0_point_standard": "The lighting and shadow effects in the 3D rendering are unrealistic or poorly handled, with unnatural or inconsistent light sources affecting the overall image quality.", + "1_point_standard": "The lighting and shadow effects are realistically integrated into the rendering, creating a natural sense of depth and enhancing the three-dimensional effect, complementing the object design." + }, + { + "question": "Aesthetic unity and visual appeal:", + "0_point_standard": "The 3D rendering lacks visual harmony, with elements appearing disjointed or poorly combined. 
The design may show disproportionate elements, distracting artifacts, or inconsistent styles, leading to an image that seems unbalanced or unrefined visually.", + "1_point_standard": "The 3D rendering exhibits strong aesthetic unity, with harmonious combinations and balanced proportions. The style is consistent, and the image has a refined, appealing appearance, meeting professional design standards." + } + ] +} \ No newline at end of file diff --git a/dataset/3d_effect_generation_three-view_reference_0002/images.txt b/dataset/3d_effect_generation_three-view_reference_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..39ae79f428670576d65162b481f185f25d955b5d --- /dev/null +++ b/dataset/3d_effect_generation_three-view_reference_0002/images.txt @@ -0,0 +1,3 @@ +https://img.alicdn.com/imgextra/i4/O1CN01lTEych1PcO6MkHln5_!!6000000001861-0-tps-909-459.jpg +https://img.alicdn.com/imgextra/i3/O1CN015eFw5l1pzuSBLBnzQ_!!6000000005432-0-tps-384-479.jpg +https://img.alicdn.com/imgextra/i2/O1CN01hAdyLY1KIcw1en0V8_!!6000000001141-0-tps-939-387.jpg diff --git a/dataset/3d_effect_generation_three-view_reference_0002/instruction.txt b/dataset/3d_effect_generation_three-view_reference_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f474bf65fce1535814f6ecbd0c8c6453a407ef4f --- /dev/null +++ b/dataset/3d_effect_generation_three-view_reference_0002/instruction.txt @@ -0,0 +1 @@ +The picture provided is of a shoe with a cloth upper and a rubber sole. Please generate a 3D rendering of the item based on the three views (front view, side view, top view) and text description of the item provided. The 3D form of the item should be accurately constructed by combining the information from all angles in the three views and refining the details of the material, colour, light and shadow effects of the item.
The final 3D rendering should show the complete three-dimensional appearance of the item, conform to the scale and structure provided in the three-view drawings, and be consistent with the features of the item. \ No newline at end of file diff --git a/dataset/3d_effect_generation_three-view_reference_0002/meta.json b/dataset/3d_effect_generation_three-view_reference_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..6c03112e44abc72794731dfa26f778cd4431e9a9 --- /dev/null +++ b/dataset/3d_effect_generation_three-view_reference_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "3D effect generation with three-view reference", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0073", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_hair_editing_0002/eval.json b/dataset/animal_attribute_editing_hair_editing_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..3b43e2296062ed7d540cd35ab46edd5804382bf3 --- /dev/null +++ b/dataset/animal_attribute_editing_hair_editing_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the edited image accurately retain the unchanged portions of the original animal image, ensuring no unintended alterations were made?", + "0_point_standard": "The edited image shows noticeable changes in areas that should not have been altered.", + "1_point_standard": "The edited image accurately retains the unchanged portions, with no unintended modifications outside the specified areas." 
+ }, + { + "question": "Does the edited image maintain the overall content and identity of the original animal, ensuring consistency between the input and output images?", + "0_point_standard": "The edited image significantly changes the animal's identity or main features, making it look different from the original image.", + "1_point_standard": "The edited image retains the animal's identity and main features, ensuring consistency with the original image." + }, + { + "question": "Does the hair editing meet the specific requirements provided in the text description, such as style, color, or length modifications?", + "0_point_standard": "The hair editing does not meet the specific requirements described in the text, showing noticeable differences in style, color, or length.", + "1_point_standard": "The hair editing accurately meets the specified requirements in the text description." + }, + { + "question": "Does the edited image effectively integrate the hair modifications, providing a seamless and natural appearance consistent with the animal's existing features?", + "0_point_standard": "The hair modifications appear unnatural or poorly integrated, disrupting the overall appearance of the animal.", + "1_point_standard": "The hair modifications are seamlessly integrated, maintaining a natural cohesive appearance with the animal's existing features." + }, + { + "question": "Are the texture and details of the edited hair realistic, enhancing the visual quality of the animal image?", + "0_point_standard": "The texture and details of the edited hair look unrealistic or lack depth, reducing the visual quality of the image.", + "1_point_standard": "The texture and details of the edited hair are realistic, enhancing the overall visual quality of the image." 
+ }, + { + "question": "Does the edited image have a strong aesthetic appeal, providing a visually attractive and professionally executed final result?", + "0_point_standard": "The edited image lacks aesthetic appeal, appearing unprofessional or visually unpleasing.", + "1_point_standard": "The edited image exhibits strong aesthetic appeal, with high visual attractiveness and professional execution." + } + ] +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_hair_editing_0002/images.txt b/dataset/animal_attribute_editing_hair_editing_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..6b082acfc708ed80e75646b6963fe576ac69ddc0 --- /dev/null +++ b/dataset/animal_attribute_editing_hair_editing_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i2/O1CN01Ge9no81W06IqBaGUI_!!6000000002725-0-tps-2480-1550.jpg diff --git a/dataset/animal_attribute_editing_hair_editing_0002/instruction.txt b/dataset/animal_attribute_editing_hair_editing_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3f9cd04c235e7febbbc52e59be4d3c0baff91540 --- /dev/null +++ b/dataset/animal_attribute_editing_hair_editing_0002/instruction.txt @@ -0,0 +1 @@ +Please modify the hair color of the white rabbit in the input image from white to gray. The goal is to keep the rabbit's posture and background unchanged but adjust the hair color to gray, ensuring the texture and lighting effects of the fur remain natural and realistic. The generated image should maintain the overall style of the original, but the fur color should clearly change to gray. 
\ No newline at end of file diff --git a/dataset/animal_attribute_editing_hair_editing_0002/meta.json b/dataset/animal_attribute_editing_hair_editing_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..bc3ea2818ded2f517d7b6c6b6ed9d4a7d22506fa --- /dev/null +++ b/dataset/animal_attribute_editing_hair_editing_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "animal hair editing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0087", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_posture_editing_0001/eval.json b/dataset/animal_attribute_editing_posture_editing_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..ad081bb9ce5cd956d4ff5d042437192b31e802ad --- /dev/null +++ b/dataset/animal_attribute_editing_posture_editing_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the edited image accurately preserve the original state of unmodified parts of the animal, ensuring consistency with the original image without specified changes?", + "0_point_standard": "There are noticeable changes or inconsistencies in the unmodified parts of the animal compared to the original image.", + "1_point_standard": "The unmodified parts of the animal remain consistent and unchanged, accurately reflecting the original image." + }, + { + "question": "Does the edited image maintain the overall style and characteristics of the animal, ensuring that the edited posture does not alter its recognizable features or species-specific traits?", + "0_point_standard": "The edited posture has altered the animal's recognizable features or species-specific traits, making it difficult to identify.", + "1_point_standard": "The edited posture retains the animal's style and characteristics, maintaining its recognizable features and species-specific traits intact." 
+ }, + { + "question": "Does the edited animal posture accurately reflect the requirements in the text description, effectively conveying the intended posture adjustments?", + "0_point_standard": "The edited posture does not reflect the changes specified in the text description or misrepresents the intended posture.", + "1_point_standard": "The edited posture accurately reflects the changes specified in the text description, conveying the intended posture adjustments." + }, + { + "question": "Does the edited image include the modifications specified in the text without introducing inconsistencies or unrealistic elements in the animal's anatomy and position?", + "0_point_standard": "The modifications have introduced inconsistencies or unrealistic elements in the animal's anatomy and position.", + "1_point_standard": "The modifications are consistent with realistic anatomy and position, without introducing unrealistic elements." + }, + { + "question": "Does the edited image exhibit high-quality texture details, maintaining the natural appearance and texture of the animal's fur, skin, or scales?", + "0_point_standard": "The texture details are poor, leading to an unnatural appearance of the animal's fur, skin, or scales.", + "1_point_standard": "The texture details are of high quality, maintaining the natural appearance of the animal's fur, skin, or scales." + }, + { + "question": "Does the edited image have an overall aesthetic appeal, with visually pleasing composition that meets professional standards and user expectations?", + "0_point_standard": "The edited image lacks aesthetic appeal and has poor visual composition.", + "1_point_standard": "The edited image displays strong aesthetic appeal with high-quality visual composition that meets professional standards." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_posture_editing_0001/images.txt b/dataset/animal_attribute_editing_posture_editing_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..a64aeb114b1d2f22ac9cedc99bc1f7e4176af158 --- /dev/null +++ b/dataset/animal_attribute_editing_posture_editing_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN0116QVew1CSEJUFZfHs_!!6000000000079-0-tps-1279-1706.jpg diff --git a/dataset/animal_attribute_editing_posture_editing_0001/instruction.txt b/dataset/animal_attribute_editing_posture_editing_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..a4135b618c9c374f9e3cc78e67de6fe98ac7e616 --- /dev/null +++ b/dataset/animal_attribute_editing_posture_editing_0001/instruction.txt @@ -0,0 +1 @@ +Please modify the posture of the Corgi in the input image from a sitting position to a standing position. The goal is to keep the dog's facial expression and background unchanged, but adjust its posture to a natural standing position, ensuring that the body proportions and limb positioning are consistent with the physical characteristics of a Corgi. The generated image should look natural and realistic, with a smooth and accurate posture transition. 
\ No newline at end of file diff --git a/dataset/animal_attribute_editing_posture_editing_0001/meta.json b/dataset/animal_attribute_editing_posture_editing_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..0748552139fd2a9102989a82b6eaa5719e29e2fd --- /dev/null +++ b/dataset/animal_attribute_editing_posture_editing_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "animal posture editing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0088", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_species_editing_0002/eval.json b/dataset/animal_attribute_editing_species_editing_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..19c6dde647e1f33809b97b94c941bf2eca703b90 --- /dev/null +++ b/dataset/animal_attribute_editing_species_editing_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the edited image accurately retain the basic shape and outline of the original animal, except for the specified species modifications?", + "0_point_standard": "There are noticeable deviations or distortions in the animal's shape and outline, apart from the specified species modifications.", + "1_point_standard": "The edited image accurately retains the original animal's shape and outline, except for the specified species modifications." 
+ }, + { + "question": "Aside from the specified modifications, does the edited image maintain the overall structure and proportion of the animal, ensuring consistency between the original and edited images?", + "0_point_standard": "The structure of the animal in the edited image has undergone significant changes in unintended areas, with disproportionate balance.", + "1_point_standard": "The structure and proportion of the animal in the edited image are consistent with the original image (except for the intended modifications), and the proportions are balanced." + }, + { + "question": "Does the edited image accurately reflect the specific species changes described in the text input?", + "0_point_standard": "The species changes are not accurately represented or are inconsistent with the description provided in the text input.", + "1_point_standard": "The species changes are accurately represented and consistent with the description provided in the text input." + }, + { + "question": "Does the edited image correctly implement any additional features specified in the text description, such as colors, patterns, or unique characteristics?", + "0_point_standard": "The edited image fails to include the specified additional features or includes them inaccurately.", + "1_point_standard": "The edited image correctly includes all specified additional features, such as colors, patterns, or unique characteristics." + }, + { + "question": "Does the edited image seamlessly integrate the modified species features with the rest of the image, ensuring a natural appearance?", + "0_point_standard": "The integration of the modified species features is poor, resulting in a disjointed or unnatural appearance.", + "1_point_standard": "The modified species features seamlessly integrate with the rest of the image, presenting a natural and harmonious appearance." 
+ }, + { + "question": "Does the edited image possess overall aesthetic appeal, being visually attractive and meeting professional expectations for digital image editing?", + "0_point_standard": "The edited image lacks aesthetic appeal and has poor visual quality.", + "1_point_standard": "The edited image exhibits strong aesthetic appeal, with high visual attractiveness, meeting professional standards." + } + ] +} \ No newline at end of file diff --git a/dataset/animal_attribute_editing_species_editing_0002/images.txt b/dataset/animal_attribute_editing_species_editing_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e1e66993929767dfaae179c97bf79b0bb08f5c06 --- /dev/null +++ b/dataset/animal_attribute_editing_species_editing_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01dKgsjj1ta2XrUM1gu_!!6000000005917-0-tps-1952-1464.jpg diff --git a/dataset/animal_attribute_editing_species_editing_0002/instruction.txt b/dataset/animal_attribute_editing_species_editing_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..d6cad226f21c39a52d2413e83579355cdb36b85e --- /dev/null +++ b/dataset/animal_attribute_editing_species_editing_0002/instruction.txt @@ -0,0 +1 @@ +Please transform the input image of a bear into a lion. The goal is to keep the bear's posture and background unchanged but edit its species characteristics to resemble a lion. Ensure the edited image features the lion's facial traits, mane, body proportions, and fur characteristics while maintaining the same pose and background as the original image. The resulting image should look natural and realistic, accurately reflecting the lion's species traits. 
\ No newline at end of file diff --git a/dataset/animal_attribute_editing_species_editing_0002/meta.json b/dataset/animal_attribute_editing_species_editing_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..70c56881a7549b2d72adc0d4f2102b0e4076b6da --- /dev/null +++ b/dataset/animal_attribute_editing_species_editing_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "animal species editing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0089", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/animal_growth_process_generation_with_reference_0001/eval.json b/dataset/animal_growth_process_generation_with_reference_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..eae024783f820f0c2bdeba88adfcd454aaab863d --- /dev/null +++ b/dataset/animal_growth_process_generation_with_reference_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image originate from the input image and maintain a clear association in terms of content and style?", + "0_point_standard": "The output image does not have a clear association with the input image, and its elements do not match the content or style of the original animal.", + "1_point_standard": "The output image clearly originates from the input image, maintains consistent content and style, and retains recognizable features of the original animal." + }, + { + "question": "Does the generated growth process image follow the specified timeline and show reasonable progression?", + "0_point_standard": "The images do not follow a reasonable timeline, with growth stages appearing inconsistent or in a jumbled order.", + "1_point_standard": "The images clearly demonstrate reasonable progression through the animal's growth stages and follow the specified timeline." 
+ }, + { + "question": "Has the model accurately implemented any specific modifications mentioned in the text description, such as changes in size or color?", + "0_point_standard": "The specified modifications have not been accurately implemented, with changes not matching the text description.", + "1_point_standard": "The model has accurately implemented the specified modifications in the text description, such as changes in size or color." + }, + { + "question": "Do the unspecified parts of the image remain unchanged and maintain integrity with the input image?", + "0_point_standard": "Unnecessary alterations or distortions have occurred in parts of the image that should not have changed, affecting overall content.", + "1_point_standard": "The unspecified parts of the image remain unchanged, retaining the integrity and structure of the input image." + }, + { + "question": "Is the image style consistent throughout the whole sequence of growth process images?", + "0_point_standard": "The image style is inconsistent, with variations disrupting the visual coherence of the growth stages.", + "1_point_standard": "The images maintain a consistent style throughout the entire sequence, ensuring visual coherence and continuity." + }, + { + "question": "Does each image in the growth process sequence retain the original animal's key details and recognizable features, ensuring ID consistency?", + "0_point_standard": "The growth process images lack the original animal's key details and recognizable features, making it difficult to identify them as the same animal.", + "1_point_standard": "Each image retains the original animal's key details and recognizable features, ensuring identity consistency throughout the growth process." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/animal_growth_process_generation_with_reference_0001/images.txt b/dataset/animal_growth_process_generation_with_reference_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..298e041a76d231748c7fced7dfe8019bcd373b80 --- /dev/null +++ b/dataset/animal_growth_process_generation_with_reference_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i2/O1CN010Vlkzq1VzduIHyO3P_!!6000000002724-0-tps-1920-1080.jpg diff --git a/dataset/animal_growth_process_generation_with_reference_0001/instruction.txt b/dataset/animal_growth_process_generation_with_reference_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f193a39492ed5627c907e316b1b663386f25ef08 --- /dev/null +++ b/dataset/animal_growth_process_generation_with_reference_0001/instruction.txt @@ -0,0 +1 @@ +Please generate 3 images showing the intermediate growth stages of this adult male lion based on the provided picture. The first image should depict the lion in its cub stage, with shorter fur, a smaller body, and undeveloped facial features. The second image should show the lion in its juvenile stage, where the body is growing larger, the fur is starting to thicken, but the mane is not yet fully developed. The third image should depict the lion in its sub-adult stage, with a body close to adult size, though the mane is still not fully grown. Ensure that all the generated images clearly show the same lion, with visible continuity in the appearance as the lion progresses through different growth stages, making it feel like the same animal at different points in time. 
\ No newline at end of file diff --git a/dataset/animal_growth_process_generation_with_reference_0001/meta.json b/dataset/animal_growth_process_generation_with_reference_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b72fdb0a14528913a13347624cb2a0df3c23d11e --- /dev/null +++ b/dataset/animal_growth_process_generation_with_reference_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "animal growth process generation with reference", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0048", + "output_image_count": 3, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0002/auto_eval.jsonl b/dataset/brand_merchandise_generation_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..908f7d804ab801912563580785e1d066e541e781 --- /dev/null +++ b/dataset/brand_merchandise_generation_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. 
The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nBrand Visual Pattern Consistency: 0 points: The brand’s unique visual elements are missing or altered, making it hard to recognize the brand in the merchandise. 1 point: The merchandise clearly reflects every detail of the visual elements in the referenced pattern image, accurately transferring the brand’s pattern and style as specified. The visual pattern in the two images is exactly the same. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. 
The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nProduct Type Accuracy: 0 points: The generated product does not match the specified type (e.g., a mug instead of a tote bag), or the structure is inconsistent with the description. 1 point: The merchandise accurately aligns with the specified product type, showing the correct form and structure as outlined.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. 
The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nFidelity to Visual Details: 0 points: Key visual details requested, such as specific color adjustments or logo placements, are missing or incorrectly applied. 1 point: All specified visual details, including color adjustments and logo placement, are accurately and thoughtfully applied as per the description.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. 
The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nPositioning and Proportion of the Visual Pattern: 0 points: The brand’s visual pattern is positioned awkwardly or disproportionately, failing to integrate naturally with the product’s shape and surface. 1 point: The brand’s visual pattern is applied with correct positioning and proportion, fitting naturally onto the product’s surface and enhancing its visual appeal.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. 
The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nClarity and Quality of Text or Graphics: 0 points: The text or graphics appear blurry, pixelated, or washed out, reducing the professional quality and clarity of the brand. 1 point: The text and graphics are rendered sharply and vividly, enhancing both readability and the brand’s visual appeal.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Create an image of a white baseball cap featuring a playful, abstract character embroidered on the front. The character should appear as a lively, hand-drawn figure composed of swirling purple lines forming a loose, scribble-like shape. It has simple black line-drawn arms and legs, a wide, open mouth, and small dots for eyes, giving it a fun and expressive look. The character’s face has a joyful, animated expression, with a touch of whimsical hair drawn as short, spiky black lines on top. The cap is displayed on a wooden stand in a well-lit, minimalist studio setup with a soft gray background, lending a professional product photo appearance. 
The design should have a unique, artistic style that makes the cap feel like part of a distinct and recognizable brand collection.\"\nYour review question is:\nOverall Aesthetic and Professional Quality: 0 points: The merchandise lacks aesthetic cohesion or professionalism, with issues like poor composition, lighting, or unrealistic presentation, making it unfit for brand representation. 1 point: The merchandise displays high aesthetic appeal and professional quality, with balanced composition, effective lighting, and a realistic appearance suitable for showcasing as branded merchandise.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} diff --git a/dataset/brand_merchandise_generation_0002/eval.json b/dataset/brand_merchandise_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b5df889ddf5b836569f63a4a619543ba3460b406 --- /dev/null +++ b/dataset/brand_merchandise_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Brand Visual Pattern Consistency:", + "0_point_standard": "The unique visual elements of the brand are missing or altered, making it difficult to recognize the brand in the product.", + "1_point_standard": "The product clearly reflects every visual element detail from the reference pattern image, accurately conveying the brand's patterns and style. The visual patterns in both images are identical." + }, + { + "question": "Product Type Accuracy:", + "0_point_standard": "The generated product does not match the specified type (e.g., a cup is generated instead of a handbag), or its structure is inconsistent with the description.", + "1_point_standard": "The product accurately matches the specified product type, displaying the correct form and structure described." 
+ }, + { + "question": "Fidelity of Visual Details:", + "0_point_standard": "Key visual details required by the description, such as specific color adjustments or logo placement, are missing or applied incorrectly.", + "1_point_standard": "All specified visual details, including color adjustments and logo placement, are applied accurately and thoughtfully according to the description." + }, + { + "question": "Positioning and Proportion of Visual Patterns:", + "0_point_standard": "The brand's visual pattern is improperly positioned or out of proportion, failing to naturally integrate with the product's shape and surface.", + "1_point_standard": "The brand's visual pattern is applied with correct positioning and proportion, naturally adapting to the product's surface and enhancing visual appeal." + }, + { + "question": "Clarity and Quality of Text or Graphics:", + "0_point_standard": "Text or graphics appear blurry, pixelated, or faded, reducing the professional quality and clarity of the brand.", + "1_point_standard": "Text and graphics are presented clearly and vividly, enhancing readability and the brand's visual appeal." + }, + { + "question": "Overall Aesthetic and Professional Quality:", + "0_point_standard": "The product lacks aesthetic unity or professionalism, with issues such as poor composition, insufficient lighting, or unrealistic representation, making it unsuitable for brand presentation.", + "1_point_standard": "The product displays high aesthetic and professional quality, with balanced composition, good lighting effects, and a realistic appearance suitable for brand product presentation." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0002/images.txt b/dataset/brand_merchandise_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..9765fe4c5c858c790e31a9e96db4584e93ed9fd5 --- /dev/null +++ b/dataset/brand_merchandise_generation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01O6hbDm27Scki1rVvi_!!6000000007796-0-tps-564-705.jpg diff --git a/dataset/brand_merchandise_generation_0002/instruction.txt b/dataset/brand_merchandise_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..fd2b168005863307b538d188db069bbdaa98fa84 --- /dev/null +++ b/dataset/brand_merchandise_generation_0002/instruction.txt @@ -0,0 +1 @@ +Create an image of a white baseball cap with the purple scribble character embroidered on the front, keeping the same fun and playful expression from the original design. The character may be slightly simplified to fit embroidery details but should retain its unique style. The cap is displayed on a wooden stand in a well-lit studio setting with a soft gray background, creating a professional product photo feel. The design should clearly reflect the original character, making it easily identifiable as part of the same brand. 
\ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0002/meta.json b/dataset/brand_merchandise_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..61fd377a209a7a3dac0eba3a80bb80be24056bb3 --- /dev/null +++ b/dataset/brand_merchandise_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "brand merchandise generation", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0063", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0003/auto_eval.jsonl b/dataset/brand_merchandise_generation_0003/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..cbdbb7f8cbfb3b68ae247c4df550fd1de4385144 --- /dev/null +++ b/dataset/brand_merchandise_generation_0003/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. 
The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nBrand Visual Pattern Consistency: 0 points: The brand’s unique visual elements are missing or altered, making it hard to recognize the brand in the merchandise. 1 point: The merchandise clearly reflects every detail of the visual elements in the referenced pattern image, accurately transferring the brand’s pattern and style as specified. The visual pattern in the two images is exactly the same. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nProduct Type Accuracy: 0 points: The generated product does not match the specified type (e.g., a mug instead of a tote bag), or the structure is inconsistent with the description. 
1 point: The merchandise accurately aligns with the specified product type, showing the correct form and structure as outlined.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nFidelity to Visual Details: 0 points: Key visual details requested, such as specific color adjustments or logo placements, are missing or incorrectly applied. 1 point: All specified visual details, including color adjustments and logo placement, are accurately and thoughtfully applied as per the description.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. 
The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nPositioning and Proportion of the Visual Pattern: 0 points: The brand’s visual pattern is positioned awkwardly or disproportionately, failing to integrate naturally with the product’s shape and surface. 1 point: The brand’s visual pattern is applied with correct positioning and proportion, fitting naturally onto the product’s surface and enhancing its visual appeal.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. 
The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nClarity and Quality of Text or Graphics: 0 points: The text or graphics appear blurry, pixelated, or washed out, reducing the professional quality and clarity of the brand. 1 point: The text and graphics are rendered sharply and vividly, enhancing both readability and the brand’s visual appeal.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of two rows of images, with the top row row as the reference brand visual pattern for the design task and the bottom image as the response provided by a student. The task objective is to generate brand peripheral products containing the given brand visual pattern based on the text requirements.\nThe text requirement is:\n\"Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. 
The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition.\"\nYour review question is:\nOverall Aesthetic and Professional Quality: 0 points: The merchandise lacks aesthetic cohesion or professionalism, with issues like poor composition, lighting, or unrealistic presentation, making it unfit for brand representation. 1 point: The merchandise displays high aesthetic appeal and professional quality, with balanced composition, effective lighting, and a realistic appearance suitable for showcasing as branded merchandise.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}\nReturn: Evaluation"} diff --git a/dataset/brand_merchandise_generation_0003/eval.json b/dataset/brand_merchandise_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..0128a1645af487402edbf3e51f9af1b794672f2a --- /dev/null +++ b/dataset/brand_merchandise_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Brand Visual Pattern Consistency:", + "0_point_standard": "The unique visual elements of the brand are missing or altered, making it difficult to recognize the brand in the product.", + "1_point_standard": "The product clearly reflects every detail of the visual elements in the reference pattern image, accurately conveying the brand's pattern and style. The visual patterns in both images are identical." 
+ }, + { + "question": "Product Type Accuracy:", + "0_point_standard": "The generated product does not match the specified type (e.g., a cup is generated instead of a handbag), or its structure is inconsistent with the description.", + "1_point_standard": "The product accurately matches the specified product type, displaying the correct form and structure as described." + }, + { + "question": "Fidelity of Visual Details:", + "0_point_standard": "Key visual details required, such as specific color adjustments or logo placement, are missing or incorrectly applied.", + "1_point_standard": "All specified visual details, including color adjustments and logo placement, are accurately and thoughtfully applied as described." + }, + { + "question": "Positioning and Proportion of Visual Pattern:", + "0_point_standard": "The brand's visual pattern is improperly positioned or disproportionate, failing to naturally integrate with the product's shape and surface.", + "1_point_standard": "The brand's visual pattern is applied with correct positioning and proportion, naturally adapting to the product's surface, enhancing visual appeal." + }, + { + "question": "Clarity and Quality of Text or Graphics:", + "0_point_standard": "Text or graphics appear blurry, pixelated, or faded, reducing the professional quality and clarity of the brand.", + "1_point_standard": "Text and graphics are presented clearly and vividly, enhancing readability and the brand's visual appeal." + }, + { + "question": "Overall Aesthetic and Professional Quality:", + "0_point_standard": "The product lacks aesthetic unity or professionalism, with poor composition, inadequate lighting, or unrealistic representation, unsuitable for brand display.", + "1_point_standard": "The product exhibits high aesthetic and professional quality, with balanced composition, good lighting effects, and a realistic appearance suitable for brand product display." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0003/images.txt b/dataset/brand_merchandise_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..63c6ceaf18361d3bc2692c2ec336a917af5699f0 --- /dev/null +++ b/dataset/brand_merchandise_generation_0003/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01Cks11j1v9auHqr20g_!!6000000006130-0-tps-564-805.jpg diff --git a/dataset/brand_merchandise_generation_0003/instruction.txt b/dataset/brand_merchandise_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3701396e32a37a2cec38907ce199d8b589a6b5b2 --- /dev/null +++ b/dataset/brand_merchandise_generation_0003/instruction.txt @@ -0,0 +1 @@ +Generate an image of a large, eco-friendly water bottle with the colorful noodle graphic wrapping around the bottom third of the bottle. The design should remain vibrant and closely match the original image’s colors and playful style. The bottle is placed on a marble countertop with a few fresh vegetables like tomatoes and basil beside it, suggesting a healthy, eco-conscious theme. The background is a bright kitchen scene. The design should make it clear that this water bottle is part of the same brand as the original noodle graphic, ensuring brand recognition. 
\ No newline at end of file diff --git a/dataset/brand_merchandise_generation_0003/meta.json b/dataset/brand_merchandise_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..f01b3ed8deb24f4c6939f3b42a7427713426b644 --- /dev/null +++ b/dataset/brand_merchandise_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "brand merchandise generation", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0063", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/business_card_generation_0002/.DS_Store b/dataset/business_card_generation_0002/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 Binary files /dev/null and b/dataset/business_card_generation_0002/.DS_Store differ diff --git a/dataset/business_card_generation_0002/eval.json b/dataset/business_card_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..1dbc40a3566aa7ad348d5caa00a82ed72450c131 --- /dev/null +++ b/dataset/business_card_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the business card design match the textual description and include all key information (e.g., name, position, contact details)?", + "0_point_standard": "The business card design does not match the description, with missing or incorrect key information.", + "1_point_standard": "The business card design matches the description, accurately displaying all key information." 
+ }, + { + "question": "Is the text on the business card clear and easy to read, and do the font style and layout meet design requirements?", + "0_point_standard": "The text is unclear, and the font style or layout does not meet requirements, affecting overall readability.", + "1_point_standard": "The text is clear and easy to read, and the font style and layout meet design requirements." + }, + { + "question": "Does the overall color scheme and visual style of the business card align with the style requirements described in the text (e.g., simple, modern)?", + "0_point_standard": "The color scheme and visual style do not match the textual description and fail to convey the intended style.", + "1_point_standard": "The color scheme and visual style align with the textual description, conveying the intended design style." + }, + { + "question": "Has the model accurately implemented the special design requirements mentioned in the text (e.g., logo, icon, or background pattern)?", + "0_point_standard": "The special design requirements mentioned in the text have not been accurately implemented, or lack detail.", + "1_point_standard": "The special design requirements mentioned in the text are accurately implemented, with precise details." + }, + { + "question": "Is the layout of the business card clear and logical, with reasonably organized information that is easy to understand?", + "0_point_standard": "The layout is chaotic, with poorly organized information and a cluttered visual effect.", + "1_point_standard": "The layout is clear and logical, with reasonably organized information that is easy to understand and read." 
+ }, + { + "question": "Does the overall aesthetic and design quality of the business card meet professional standards, and does it have strong visual appeal?", + "0_point_standard": "The overall aesthetic of the business card is lacking, with weak design sense and insufficient visual appeal.", + "1_point_standard": "The business card has high aesthetic quality, a strong design sense, and good visual appeal." + } + ] +} \ No newline at end of file diff --git a/dataset/business_card_generation_0002/images.txt b/dataset/business_card_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/business_card_generation_0002/instruction.txt b/dataset/business_card_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..bb130a834241bfbe4c81251a8ae91701c594159d --- /dev/null +++ b/dataset/business_card_generation_0002/instruction.txt @@ -0,0 +1 @@ +This business card design has a nautical, seafood-themed aesthetic with a vintage-style illustration of octopus tentacles in dark blue on a cream background. The front side of the card features intricate tentacle sketches that curl and fill the space in an organic pattern. In the center, there is a red-bordered rectangle containing the word “RESTAURANT” in bold, uppercase red letters. Beneath this, the text “Sea Food” appears in smaller, black, uppercase letters, indicating the restaurant’s specialty. The tentacle illustrations wrap around the text box, creating a dynamic frame. On the back side, the tentacle pattern continues around the edges, forming a decorative border. The central area is left blank for contact information, which is displayed in a clean, minimalist layout. 
The top left features a phone icon followed by “+0123 456 7890,” an email icon followed by “Seafood@Restaurant.com,” and a location icon followed by “12 Harbour Lane, Restaurant, AB1 2CD.” A red line rectangle frames the contact information section, mirroring the front side’s design. The overall style is elegant, combining a marine theme with classic design elements, making it ideal for a high-end seafood restaurant. At the bottom, small text reads “designed by freepik,” indicating the design source. The only generated image contains both sides of the business card. \ No newline at end of file diff --git a/dataset/business_card_generation_0002/meta.json b/dataset/business_card_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..8cc065143ebd0a9c1caf8869038ab3909ef575a7 --- /dev/null +++ b/dataset/business_card_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "business card generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0032", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/business_card_generation_0003/eval.json b/dataset/business_card_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..348404b22a2e73aafc48b243aaf465352bc0872c --- /dev/null +++ b/dataset/business_card_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the business card design match the text description and include all key information (e.g., name, position, contact information)?", + "0_point_standard": "The business card design does not match the description, key information is missing or displayed incorrectly.", + "1_point_standard": "The business card design matches the description and accurately displays all key information." 
+ }, + { + "question": "Is the text on the business card clear and easy to read, and do the font style and layout meet the design requirements?", + "0_point_standard": "The text is unclear, or the font style or layout does not meet the requirements, affecting overall readability.", + "1_point_standard": "The text is clear and easy to read, and the font style and layout meet the design requirements." + }, + { + "question": "Do the overall color scheme and visual style of the business card align with the style requirements described in the text (e.g., minimalist, modern)?", + "0_point_standard": "The color scheme and visual style do not match the text description and fail to convey the intended style.", + "1_point_standard": "The color scheme and visual style match the text description and convey the intended design style." + }, + { + "question": "Has the model accurately implemented the special design requirements mentioned in the text (e.g., logos, icons, or background patterns)?", + "0_point_standard": "The special design requirements mentioned in the text have not been accurately implemented, or details are insufficient.", + "1_point_standard": "The special design requirements mentioned in the text are accurately implemented with precise details." + }, + { + "question": "Is the layout of the business card clear and logical, and is the information organized reasonably and easy to understand?", + "0_point_standard": "The layout is chaotic, the information is poorly organized, and the visual effect is cluttered.", + "1_point_standard": "The layout is clear and logical, and the information is organized reasonably and is easy to understand and read." 
+ }, + { + "question": "Does the overall aesthetics and design quality of the business card meet professional standards, and does it have strong visual appeal?", + "0_point_standard": "The overall aesthetics of the business card are insufficient, the design is weak, and it lacks visual appeal.", + "1_point_standard": "The business card has high aesthetics, strong design sense, and good visual appeal." + } + ] +} \ No newline at end of file diff --git a/dataset/business_card_generation_0003/images.txt b/dataset/business_card_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/business_card_generation_0003/instruction.txt b/dataset/business_card_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..cb1902d50b4d445d042cd3900f4d84ec1202dac2 --- /dev/null +++ b/dataset/business_card_generation_0003/instruction.txt @@ -0,0 +1 @@ +This business card design has a delicate, botanical theme in grayscale. The front side has a white background with detailed, hand-drawn floral illustrations in soft gray that cover the entire surface, giving it a vintage, elegant feel. Centered on this side is a rectangular frame with thin borders and decorative corners, within which the name “Emma Williams” is written in a refined serif font. This frame adds a touch of sophistication, highlighting the name as the main focal point. On the back side, the design is split, with floral illustrations continuing on the right side, while the left side is a blank white section where the contact information is displayed. The name “Emma Williams” appears at the top in the same serif font, followed by the title “(Graphic Designer)” in smaller, italicized text beneath it. Below, three contact details are listed with bullet points: “instagram,” “telegram,” and a website URL “www.emmawilliams.com,” each in lowercase letters and simple fonts. 
The overall design is clean, stylish, and professional, suitable for a graphic designer who values minimalism and classic aesthetics. The only generated image contains both sides of the business card. \ No newline at end of file diff --git a/dataset/business_card_generation_0003/meta.json b/dataset/business_card_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..2b5333669f169d789d925c300feea4858d4e90ca --- /dev/null +++ b/dataset/business_card_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "business card generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0032", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/childrens_book_generation_0003/eval.json b/dataset/childrens_book_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4d8737012633145584cc9019bc049dea86b08578 --- /dev/null +++ b/dataset/childrens_book_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the sequence of images present a coherent narrative in a logical order?", + "0_point_standard": "The sequence of images is not arranged in chronological order or lacks logical flow, disrupting the narrative.", + "1_point_standard": "The sequence of images clearly presents a coherent narrative in a logical chronological order." + }, + { + "question": "Does the content of the images match the text descriptions provided in the children's book?", + "0_point_standard": "The images do not accurately reflect the text descriptions and deviate significantly from the story.", + "1_point_standard": "The images perfectly match the text descriptions, accurately depicting the story elements." 
+ }, + { + "question": "Is the illustration style consistent throughout the entire book?", + "0_point_standard": "The illustration style is inconsistent, leading to a disjointed visual experience.", + "1_point_standard": "All illustrations maintain a consistent style, creating a harmonious visual effect throughout the book." + }, + { + "question": "Is the depiction of main characters or objects consistent across all illustrations?", + "0_point_standard": "The depiction of main characters or objects is inconsistent across different images, making them difficult to recognize as the same characters or objects.", + "1_point_standard": "The depiction of main characters or objects is consistent, clearly recognizable as the same characters or objects across all illustrations." + }, + { + "question": "Is the portrayal of narrative and characters logically accurate and suitable for the children's age group?", + "0_point_standard": "The portrayal is illogical, inaccurate, or unsuitable for the target age group, with noticeable errors.", + "1_point_standard": "The portrayal is logically accurate, suitable for the target age group, and reflects the expected narrative standards." + }, + { + "question": "Do the illustrations exhibit a professional level of detail and aesthetic appeal, enhancing the book's visual attraction?", + "0_point_standard": "The illustrations lack detail and aesthetic appeal, falling short of the visual standards for children's books.", + "1_point_standard": "The illustrations are rich in detail and have excellent aesthetic appeal, meeting professional standards and significantly enhancing visual attraction." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/childrens_book_generation_0003/images.txt b/dataset/childrens_book_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/childrens_book_generation_0003/instruction.txt b/dataset/childrens_book_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..4224d0e09b27fe55999d86d02fe0020aabae12be --- /dev/null +++ b/dataset/childrens_book_generation_0003/instruction.txt @@ -0,0 +1 @@ +This is a children's picture book illustration generation task consisting of 8 pages, titled “The Talking Cloud.” Scene and character IDs need to remain consistent throughout the book to ensure stylistic uniformity and character continuity. The main characters include a little girl named Amy (ID: Amy) and a talking cloud (ID: Cloudy). Each page will depict the daily life of children in different countries. Page 1: The story begins on a clear afternoon with Amy playing alone in the park. She looks up at the sky and suddenly notices a cloud drifting down. To her surprise, the cloud starts talking and greets her warmly, inviting her to explore the world together. The scene is set in an open park, with children playing in the distance and a few fluffy clouds in the sky. Amy stands on the grass, looking up at the cloud, which is smiling as it approaches her from the sky. Character IDs: Amy, Cloudy. Page 2: The cloud takes Amy up into the sky, and they arrive at an African savanna, where they see a girl named Kara (ID: Kara) helping her family water the cattle. Amy is amazed to see how Kara and her family rely on water to care for their animals. The scene is set in a vast savanna, with a few large trees in the distance. Kara is by a river, drawing water for the cattle, and in the background, other animals like giraffes and zebras can be seen. Character IDs: Amy, Cloudy, Kara. 
Page 3: Next, Amy and the cloud fly to Japan, where they see a boy named Kenta (ID: Kenta) preparing to participate in his school's sports day with his friends. The cloud explains to Amy that sports day is an important event in Japanese schools, where students show their teamwork through competitions. The scene is set on a Japanese school playground, with flags flying overhead. Kenta and his friends are dressed in sports uniforms, ready for a race, and the background features the school buildings and cherry blossom trees around the field. Character IDs: Amy, Cloudy, Kenta. Page 4: Then, the cloud takes Amy to Paris, France, where they see a girl named Sophie (ID: Sophie) enjoying lunch with her parents. The cloud tells Amy that French people love to enjoy meals outdoors, especially on warm days. The scene is set on a Parisian street café, with the Eiffel Tower faintly visible in the background. Sophie and her family are seated at a table, enjoying bread and cheese, with a warm and relaxed atmosphere. Character IDs: Amy, Cloudy, Sophie. Page 5: Amy and the cloud continue flying and arrive in a snowy landscape, where they meet a boy named Ivan (ID: Ivan), who is building a large snowman with his family. The cloud tells Amy that Ivan lives in Russia, where they enjoy playing in the snow during the winter. The scene is set in a snowy Russian village, with Ivan and his family building a snowman. In the background, wooden cottages and pine trees are covered in snow, and snowflakes are gently falling. Character IDs: Amy, Cloudy, Ivan. Page 6: Afterward, Amy and the cloud fly to India, where they see a girl named Anya (ID: Anya) creating colorful patterns outside her house. The cloud tells Amy that this is called “Rangoli,” a traditional Indian decoration that symbolizes good luck and happiness. The scene is set in the courtyard of an Indian home, with Anya drawing beautiful patterns on the ground with colored powders. 
In the background, the house is decorated with lanterns and flower garlands. Character IDs: Amy, Cloudy, Anya. Page 7: Finally, Amy and the cloud return to her hometown, where the sky is painted with the colors of the setting sun. Amy thanks the cloud for taking her on a journey around the world and learning about the lives of children from different cultures. The cloud smiles and slowly disappears into the sky, leaving Amy sitting on the grass, reminiscing about her wonderful adventure. The scene is set in the park at sunset, with the golden glow of the sun lighting up the sky. Trees sway gently in the breeze, and Amy smiles as she watches the cloud fade away. Character IDs: Amy, Cloudy. Page 8: In the final page, Amy is back at home, and she takes out her drawing tools to illustrate her adventure with the cloud. Her walls are covered with pictures of the friends she met around the world, and she decides to keep this journey in her heart forever. The scene is set in Amy's bedroom, with pictures of the children she met on her travels decorating the walls. On her desk, she is working on a colorful drawing, and the room is filled with a warm, cozy atmosphere. Character IDs: Amy. 
\ No newline at end of file
diff --git a/dataset/childrens_book_generation_0003/meta.json b/dataset/childrens_book_generation_0003/meta.json
new file mode 100644
index 0000000000000000000000000000000000000000..d6f54268bd666db0acd9c51bd805159d0eeb2c0b
--- /dev/null
+++ b/dataset/childrens_book_generation_0003/meta.json
@@ -0,0 +1,10 @@
+{
+    "task_name": "childrens book generation without reference",
+    "num_of_cases": 4,
+    "image_reference": false,
+    "multi_image_reference": false,
+    "multi_image_output": true,
+    "uid": "0019",
+    "output_image_count": 8,
+    "case_id": "0003"
+}
\ No newline at end of file
diff --git a/dataset/childrens_book_generation_0004/eval.json b/dataset/childrens_book_generation_0004/eval.json
new file mode 100644
index 0000000000000000000000000000000000000000..1175f8a49fab8a3a20242bdb69d4c21437ecf27a
--- /dev/null
+++ b/dataset/childrens_book_generation_0004/eval.json
@@ -0,0 +1,34 @@
+{
+    "questions": [
+        {
+            "question": "Does the sequence of images present a coherent narrative in logical order?",
+            "0_point_standard": "The sequence of images is not arranged in chronological order or lacks logical flow, disrupting the narrative.",
+            "1_point_standard": "The sequence of images clearly presents a coherent narrative in logical chronological order."
+        },
+        {
+            "question": "Do the image contents match the textual descriptions provided in the children's book?",
+            "0_point_standard": "The images do not accurately reflect the textual descriptions, with noticeable deviations from the story.",
+            "1_point_standard": "The images completely match the textual descriptions, accurately depicting the story elements."
+        },
+        {
+            "question": "Is the illustration style consistent throughout the book?",
+            "0_point_standard": "The illustration style is inconsistent, leading to a disjointed visual effect.",
+            "1_point_standard": "All illustrations maintain a consistent style, creating a harmonious visual effect throughout the book."
+ }, + { + "question": "Are the images of main characters or objects consistent across all illustrations?", + "0_point_standard": "The images of main characters or objects are inconsistent across different images, making it difficult to recognize them as the same character or object.", + "1_point_standard": "The images of main characters or objects are consistent, clearly recognizable as the same character or object in all illustrations." + }, + { + "question": "Is the depiction of narrative and characters logically accurate and suitable for the children's age group?", + "0_point_standard": "The depiction is illogical, inaccurate, or unsuitable for the target age group, with evident errors.", + "1_point_standard": "The depiction is logically accurate, suitable for the target age group, and reflects the expected narrative standards." + }, + { + "question": "Do the illustrations exhibit professional-level detail and aesthetic, enhancing the book's visual appeal?", + "0_point_standard": "The illustrations lack detail, have poor aesthetic quality, and do not meet the visual standards of children's books.", + "1_point_standard": "The illustrations are rich in detail and have excellent aesthetic quality, meeting professional standards with significant visual appeal." 
+        }
+    ]
+}
\ No newline at end of file
diff --git a/dataset/childrens_book_generation_0004/images.txt b/dataset/childrens_book_generation_0004/images.txt
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/dataset/childrens_book_generation_0004/instruction.txt b/dataset/childrens_book_generation_0004/instruction.txt
new file mode 100644
index 0000000000000000000000000000000000000000..af36f5cef662f5fe8e546f53902f530e5cb62893
--- /dev/null
+++ b/dataset/childrens_book_generation_0004/instruction.txt
@@ -0,0 +1 @@
+This is a children's picture book illustration generation task consisting of 7 pages, titled “The Secret Garden of the Moon.” Scene and character IDs need to remain consistent throughout the book to ensure stylistic uniformity and character continuity. The main characters include the little protagonist Luna (ID: Luna) and the mysterious guardian Lucy (ID: Lucy) from the Moon. The following are detailed descriptions for each page: Page 1: The story begins at night on Earth, where Luna is lying in bed, gazing at the full moon through her window. She notices what seems to be a garden on the Moon, with flowers softly glowing. Suddenly, a gentle light appears in her room, and a mysterious guardian named Lucy steps in, inviting Luna to visit the secret garden on the Moon. The scene is set in Luna's bedroom, with moonlight streaming through the window, and Luna sitting on her bed, watching Lucy, who is glowing softly as she enters. Character IDs: Luna, Lucy. Page 2: Luna follows Lucy into the night sky. They fly through the sparkling stars and wispy clouds, finally arriving at the garden on the Moon. The garden is bathed in silvery moonlight, and the flowers shimmer like jewels, each glowing with different colors. The scene is set on the surface of the Moon, surrounded by a dreamy night sky, with stars all around, and the garden filled with glowing flowers. Character IDs: Luna, Lucy. 
Page 3: Luna curiously steps into the garden, and each flower emits a unique colored light. Lucy explains that the flowers are not only beautiful but also magical; every night, their light can guide people through the sky into different dreamlands. The scene is set in the center of the garden, where the flowers glow in all colors, and Luna and Lucy stand among them as Lucy softly reveals the secrets of the flowers. Character IDs: Luna, Lucy. Page 4: Lucy takes Luna to a glowing blue flower, gently touches its petals, and in an instant, Luna is surrounded by light, flying into a magical city made of stars. In this city, stars dance, and the buildings seem to be made of stardust, twinkling brightly. The scene is set in the starry city, with Luna floating among the stars, surrounded by stardust buildings and dancing stars. Character IDs: Luna, Lucy. Page 5: Next, they visit a garden made of clouds, where the flowers are soft and misty, glowing with a gentle purple light. Luna sits in the cloud garden, feeling a peaceful energy wash over her, as she closes her eyes and listens to the beautiful music drifting through the clouds. The scene is set in the cloud garden, with soft clouds surrounding Luna, and the flowers glowing softly, illuminating the night sky, with other floating clouds visible in the distance. Character IDs: Luna, Lucy. Page 6: Finally, Lucy brings Luna back to the secret garden on the Moon and tells her that every flower holds a different story and power. Luna feels reluctant to leave but knows she can return anytime. The scene is set at the entrance to the garden, where Luna and Lucy stand at the edge, and the garden glows softly behind them, with Earth and the stars in the distance. Character IDs: Luna, Lucy. Page 7: In the final page, Luna is back in her room, with the Moon still hanging in the sky. She smiles as she looks out the window, her heart filled with excitement for her next adventure in the Moon's garden. 
The scene is set in Luna's bedroom, with the moonlight spilling onto her bed. Luna closes her eyes, ready to drift into a sweet dream. Character IDs: Luna. \ No newline at end of file diff --git a/dataset/childrens_book_generation_0004/meta.json b/dataset/childrens_book_generation_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..abd868ef0497e3c73142fb16e02155796d809908 --- /dev/null +++ b/dataset/childrens_book_generation_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "childrens book generation without reference", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0019", + "output_image_count": 7, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/childrens_book_generation_role_definition_0001/auto_eval.jsonl b/dataset/childrens_book_generation_role_definition_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..eb54cbb153245563cff0503c26f64ef6bea84038 --- /dev/null +++ b/dataset/childrens_book_generation_role_definition_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg", "0005.jpg"], "question": "Is the number in the image the digit 5? 0 points: The number in the image is not the digit 5; 1 point: The number in the image is the digit 5. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0005.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first input image and fifth output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided character definitions and text descriptions. 
\nThe text requirement is:\nPlease generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. 
The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together.\nYour review question is:\nDoes the first character in the Page 5 output image accurately reflect the features, attire, and style of the given character reference from Page 1? 0 points: The first character in the output image significantly deviates from the reference, with noticeable inconsistencies in features, clothing, or style. 1 point: The first character in the output image closely matches the input character reference, maintaining consistency in features, clothing, and overall style.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0004.jpg"], "output_images": ["0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the fourth input image and fourth output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided character definitions and text descriptions. \nThe text requirement is:\nPlease generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. 
She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. 
The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together.\nYour review question is:\nDoes the fourth character in the Page 4 output image accurately reflect the features, attire, and style of the given character reference? 0 points: The fourth character in the output image significantly deviates from the reference, with noticeable inconsistencies in features, clothing, or style. 1 point: The fourth character in the output image closely matches the input character reference, maintaining consistency in features, clothing, and overall style.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and second output images of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided character definitions and text descriptions. \nThe text requirement is:\nPlease generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. 
The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. 
The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together.\nYour review question is:\nDoes the generated image accurately depict the specific scene described in the text (e.g., Page 1’s city street, Page 2’s park setting)? 0 points: The scene elements, such as background, setting, or key objects, do not match the text description, making it difficult to identify the specified environment. 1 point: The scene accurately reflects the details provided in the text, with appropriate background elements and settings that clearly represent the described environment.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0005.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third and fifth output images of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided character definitions and text descriptions. \nThe text requirement is:\nPlease generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. 
Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together.\nYour review question is:\nIs the illustration style consistent across different pages, maintaining a cohesive visual theme throughout the story? 
0 points: The illustration style varies noticeably between images, with differences in color tones, line thickness, or shading that disrupt the cohesion of the story. 1 point: The illustration style is consistent across images, with cohesive colors, shading, and artistic elements that unify the visual presentation.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first output image and original input image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided character definitions and text descriptions. \nThe text requirement is:\nPlease generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. 
The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together.\nYour review question is:\nAre the interactions and expressions of the characters accurately portrayed, matching the actions and emotions described in the text? 0 points: The characters’ interactions or expressions do not match the actions or emotions described in the text, making the scene appear disconnected from the narrative. 
1 point: The characters’ interactions and expressions align with the text, effectively conveying the intended actions and emotions, such as smiles, waves, or focus.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/childrens_book_generation_role_definition_0001/eval.json b/dataset/childrens_book_generation_role_definition_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..397434908999f473780251260d0af79377264e2b --- /dev/null +++ b/dataset/childrens_book_generation_role_definition_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements described in the text?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "Does the first character in the output image on Page 5 accurately reflect the features, clothing, and style of the character reference provided on Page 1?", + "0_point_standard": "There is a significant deviation in the first character of the output image from the reference image, with noticeable inconsistencies in features, clothing, or style.", + "1_point_standard": "The first character in the output image closely matches the input character reference, maintaining consistency in features, clothing, and overall style." + }, + { + "question": "Does the fourth character in the output image on Page 4 accurately reflect the features, clothing, and style of the given character reference?", + "0_point_standard": "There is a significant deviation in the fourth character of the output image from the reference image, with noticeable inconsistencies in features, clothing, or style.", + "1_point_standard": "The fourth character in the output image closely matches the input character reference, maintaining consistency in features, clothing, and overall style." 
+ }, + { + "question": "Does the generated image accurately depict the specific scene described in the text (e.g., city street on Page 1, park scene on Page 2)?", + "0_point_standard": "Scene elements (such as background, environment, or key objects) do not match the text description, making it difficult to recognize the specified environment.", + "1_point_standard": "The scene accurately reflects the details provided in the text, with appropriate background elements and settings, clearly representing the described environment." + }, + { + "question": "Is the illustration style consistent across different pages, maintaining a coherent visual theme for the story?", + "0_point_standard": "There are noticeable differences in illustration style between images, such as tone, line thickness, or shading, disrupting the story's coherence.", + "1_point_standard": "The illustration style is consistent across the images, with coherent colors, shading, and artistic elements, providing a unified visual presentation." + }, + { + "question": "Do the interactions and expressions of the characters accurately depict the actions and emotions described in the text?", + "0_point_standard": "The interactions or expressions of the characters do not match the actions or emotions described in the text, making the scene appear disconnected from the narrative.", + "1_point_standard": "The interactions and expressions of the characters are consistent with the text, effectively conveying the intended actions and emotions, such as smiling, waving, or focusing." 
+        }
+    ]
+}
\ No newline at end of file
diff --git a/dataset/childrens_book_generation_role_definition_0001/images.txt b/dataset/childrens_book_generation_role_definition_0001/images.txt
new file mode 100644
index 0000000000000000000000000000000000000000..0cde537648eb8de7a0f0d3b4855f16f2de388636
--- /dev/null
+++ b/dataset/childrens_book_generation_role_definition_0001/images.txt
@@ -0,0 +1,4 @@
+https://img.alicdn.com/imgextra/i4/O1CN01VV9euz1I1Ywjjs48J_!!6000000000833-0-tps-1920-1920.jpg
+https://img.alicdn.com/imgextra/i3/O1CN01MQZSNJ1uDt0E0rq4t_!!6000000006004-0-tps-1920-1920.jpg
+https://img.alicdn.com/imgextra/i3/O1CN01VZbznT1vCLHduzktl_!!6000000006136-0-tps-1920-1920.jpg
+https://img.alicdn.com/imgextra/i2/O1CN011LfxIL1IWhsWyKdXQ_!!6000000000901-0-tps-1920-1920.jpg
diff --git a/dataset/childrens_book_generation_role_definition_0001/instruction.txt b/dataset/childrens_book_generation_role_definition_0001/instruction.txt
new file mode 100644
index 0000000000000000000000000000000000000000..746dad45d408ea82df571cc0506d448b4a1bf7f1
--- /dev/null
+++ b/dataset/childrens_book_generation_role_definition_0001/instruction.txt
@@ -0,0 +1 @@
+Please generate images based on the description of each page, ensuring that all images reflect the story's narrative and maintain the consistency of the characters, with all characters matching the given definitions of the four characters. Page 1: Lina (character 1) is happily walking down a bright city street. She's wearing a green outfit, looking cheerful, with her hand raised slightly as if waving to the world. The background features a warm orange sky, with simple modern buildings on both sides of the street, and light-colored brick paving. The entire city exudes energy and vibrancy. The model should reflect Lina's lively personality, with the bright tones of the street complementing the city's energetic atmosphere. Page 2: Lina meets Ajay (character 2), a boy with blue hair and a floral shirt, at the corner of the street. 
They stand in a candy-colored park, surrounded by dense trees and benches, with colorful flowers dotting the grass. Sunlight filters through the leaves, casting light on them as Ajay shares some recent fun stories, and Lina listens with a bright smile. The model should depict their friendly interaction, with the warm sunlight and natural background enhancing the harmony of the scene. Page 3: As they walk further, they meet Kyle (character 3), a cool boy wearing gray sportswear and sunglasses, performing a high-energy skateboarding trick in the skate park. Kyle is airborne, his skateboard spinning beneath him. The background is a modern skate park, with a dark backdrop highlighting his impressive moves. The model should capture Kyle's dynamic action in the air, emphasizing his personality and the movement of the skateboard. Page 4: Lina, Ajay, and Kyle arrive at the beach, where they meet Xiaohai (character 4), a lively boy holding a surfboard, getting ready to surf. He's wearing a T-shirt with eye patterns, with the backdrop of a vast ocean. The blue sky and white waves meet at the horizon, with the water gently lapping at the shore. The model should convey the freshness of the beach and Xiaohai's excitement, showcasing his anticipation for surfing. Page 5: The four friends play together on the beach under the setting sun, holding hands and forming a circle as laughter fills the air. The background is the golden sunlight reflecting off the sea, with waves gently hitting the shore. The entire scene radiates warmth and the bond of friendship. The model should use the warm glow of the sunset and the friends' smiles to convey the joy and beauty of their time together. 
\ No newline at end of file diff --git a/dataset/childrens_book_generation_role_definition_0001/meta.json b/dataset/childrens_book_generation_role_definition_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..37e98d7fd4243b6360029fbe61186d63238d082e --- /dev/null +++ b/dataset/childrens_book_generation_role_definition_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "childrens book generation with role definition", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0044", + "output_image_count": 5, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/childrens_book_generation_scenario_definition_0002/auto_eval.jsonl b/dataset/childrens_book_generation_scenario_definition_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..e411edd1fd10d650fbc3e8216f009ae946861efe --- /dev/null +++ b/dataset/childrens_book_generation_scenario_definition_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg", "0005.jpg", "0006.jpg", "0007.jpg"], "question": "Is the number in the image the digit 7? 0 points: The number in the image is not the digit 7; 1 point: The number in the image is the digit 7. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and first output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided scene definitions and text descriptions. 
\nThe text requirement is:\nPlease generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. 
Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. The night is peaceful and beautiful.\nYour review question is:\nDoes the scene on Page 1 accurately represent the living room as described, with the sunlight streaming through the window and a cozy setup with two animal friends, the rabbit and the squirrel, sitting by the sofa? 0 points: The scene lacks key elements such as sunlight streaming through the window or the presence of both animal characters, failing to represent the described cozy atmosphere. 1 point: The scene includes all essential elements, including sunlight, both animal characters, and the cozy atmosphere with the living room setup as described.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and first output image of the response provided by a student. 
The task objective is to generate a children's storybook with a sequence of images based on the provided scene definitions and text descriptions. \nThe text requirement is:\nPlease generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. 
The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. The night is peaceful and beautiful.\nYour review question is:\nDoes the output on Page 2 depict the rabbit and squirrel packing for their journey in alignment with the entranceway setup, including items like the shoe rack, boots, and umbrellas? 0 points: The image lacks key elements such as the shoe rack, umbrellas, or other specified items, or does not accurately show the characters preparing for their adventure. 
1 point: The image faithfully represents the entranceway with all specified items, showing the characters actively preparing for their adventure, as described.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and first output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided scene definitions and text descriptions. \nThe text requirement is:\nPlease generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). 
 The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. The night is peaceful and beautiful.\nYour review question is:\nDoes the cabin scene on Page 3 (stormy, abandoned setting) effectively transition to a warmer, cozier environment on Page 4 as the sunlight filters in?
0 points: The cabin scene on Page 4 does not reflect the described transformation from a dark, abandoned environment to a warm and cozy setting with sunlight. 1 point: The cabin scene on Page 4 clearly transitions to a cozy atmosphere, with sunlight filtering in as described, showing the change from Page 3.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0006.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and first output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided scene definitions and text descriptions. \nThe text requirement is:\nPlease generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. 
Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. 
The night is peaceful and beautiful.\nYour review question is:\nDo the scenes on Page 6 and the original input scene of the living room maintain consistency in style and overall atmosphere? 0 points: The output image on Page 6 displays significant inconsistencies in style, color, or atmosphere compared to the original living room scene. 1 point: The output image on Page 6 maintains a consistent style and atmosphere, accurately representing the living room as per the original scene definition.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0006.jpg", "0007.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and first output image of the response provided by a student. The task objective is to generate a children's storybook with a sequence of images based on the provided scene definitions and text descriptions. \nThe text requirement is:\nPlease generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. 
Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). 
A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. The night is peaceful and beautiful.\nYour review question is:\nDoes the image on Page 7 accurately depict the nighttime setting with soft lamp lighting, stars outside, and a cozy atmosphere that allows the characters to gaze at the night sky? 0 points: The nighttime scene lacks key elements such as soft lighting, visible stars outside, or a cozy atmosphere that fits the end-of-day setting. 1 point: The nighttime scene includes soft lighting, stars visible outside, and a cozy ambiance that accurately captures the end-of-day mood as described.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/childrens_book_generation_scenario_definition_0002/eval.json b/dataset/childrens_book_generation_scenario_definition_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..34183616058e470bc9651b944e544c8e0d7fafa4 --- /dev/null +++ b/dataset/childrens_book_generation_scenario_definition_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements of the text description?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." 
+ }, + { + "question": "Does the scene on Page 1 accurately depict the described living room, with sunlight streaming through the window, and two animal friends, the rabbit and the squirrel, sitting by the sofa, creating a cozy atmosphere?", + "0_point_standard": "The scene lacks key elements such as sunlight through the window or the presence of the two animal characters, failing to convey the cozy atmosphere described.", + "1_point_standard": "The scene includes all necessary elements, such as sunlight, the two animal characters, and a cozy living room setup, matching the description." + }, + { + "question": "Does the output on Page 2 depict the scene of the rabbit and the squirrel preparing for a journey at the entrance, including items like a shoe rack, boots, and an umbrella?", + "0_point_standard": "The image lacks key elements such as a shoe rack, umbrella, or other specified items, or fails to accurately depict the characters preparing for an adventure.", + "1_point_standard": "The image faithfully recreates the entrance scene with all specified items, portraying the characters actively preparing for an adventure as described." + }, + { + "question": "Does the cabin scene on Page 3 (stormy abandoned environment) effectively transition to a warmer, cozier environment with sunlight streaming in on Page 4?", + "0_point_standard": "The cabin scene on Page 4 fails to showcase the transition from a dark, abandoned environment to a warm and cozy one, lacking sunlight.", + "1_point_standard": "The cabin scene on Page 4 clearly transitions to a cozy atmosphere, with sunlight streaming in as described, showing the change from Page 3." 
+ }, + { + "question": "Does the scene on Page 6 maintain consistency in style and overall atmosphere with the initial living room input scene?", + "0_point_standard": "The output image on Page 6 has significant inconsistencies in style, color, or atmosphere compared to the original living room scene.", + "1_point_standard": "The output image on Page 6 maintains a consistent style and atmosphere, accurately reflecting the original living room scene definition." + }, + { + "question": "Does the image on Page 7 accurately depict a nighttime scene with soft lighting, stars outside the window, and a cozy atmosphere for the characters to gaze at the night sky?", + "0_point_standard": "The nighttime scene lacks key elements like soft lighting, stars outside the window, or a cozy atmosphere suitable for the end of the day.", + "1_point_standard": "The nighttime scene includes soft lighting, stars outside the window, and a cozy atmosphere, accurately capturing the described nighttime setting." + } + ] +} \ No newline at end of file diff --git a/dataset/childrens_book_generation_scenario_definition_0002/images.txt b/dataset/childrens_book_generation_scenario_definition_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..5853f329914d0522acf3e3a67d0e92425eefd27b --- /dev/null +++ b/dataset/childrens_book_generation_scenario_definition_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01An95h71sXv61qyU2H_!!6000000005777-0-tps-1280-720.jpg +https://img.alicdn.com/imgextra/i1/O1CN01ejuS0z20i0kTe9meu_!!6000000006882-0-tps-1280-720.jpg diff --git a/dataset/childrens_book_generation_scenario_definition_0002/instruction.txt b/dataset/childrens_book_generation_scenario_definition_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..dd9ead52bbe47dc8cac4b9a33e1c82bdac600194 --- /dev/null +++ b/dataset/childrens_book_generation_scenario_definition_0002/instruction.txt @@ -0,0 +1 @@ +Please 
generate images according to the following page descriptions, ensuring that all images maintain consistency with the given scenes. New scenes in the same style or variations of the given scenes are allowed. Page 1: The story begins in the living room of the house (using the second scene), with sunlight streaming through the window and a gentle breeze moving the curtains. Inside, two cute animal friends—a rabbit and a squirrel—sit by the sofa. On the table are a few books, a cup of tea, and a large map showing their exploration route for the day. The rabbit excitedly points at the map and says, “We are going to explore the mysterious cabin deep in the forest!” Page 2: The rabbit and squirrel begin packing for their adventure (using the first scene). The shoe rack holds several pairs of boots, and colorful umbrellas are placed neatly in the stand. The squirrel grabs its little backpack, filled with snacks and exploration tools. Outside, the sun still shines brightly, and flowers sway gently in the breeze, filling the house with anticipation before their journey. Page 3: After entering the forest, it suddenly starts raining heavily. The rabbit and squirrel rush into an old, worn-down cabin deep in the forest to take shelter from the rain (using the third scene). The cabin looks abandoned, with wooden boards falling off the walls and rainwater dripping onto the floor from the windows. On the table, there are some acorns and apples. The animals dry themselves off and sit by the table, enjoying the snacks they brought while listening to the rain outside. Page 4: The rain starts to ease, and sunlight filters through the broken window into the room (using the third scene). The squirrel finds an old book in the corner and begins telling a story to the rabbit. The rabbit sits by the window, holding a warm drink, quietly watching the sky clearing up outside. Sunlight warms them as the cabin fills with a cozy atmosphere. 
Page 5: The rain finally stops, and the rabbit and squirrel step out of the cabin, breathing in the fresh post-rain air as the forest becomes incredibly peaceful. They stand in the clearing outside the cabin (using a variant of the third scene), joyfully hopping over puddles while discussing their next adventure. Their laughter echoes throughout the forest. Page 6: After their adventure, dusk begins to fall, and the rabbit and squirrel return to their warm little house (using the second scene). The tea on the table has gone cold, but the house still exudes a cozy atmosphere. Exhausted, they collapse onto the sofa, reminiscing about their exciting day, and soon drift off to sleep with smiles on their faces. Page 7: Night falls, and the house is softly lit by warm lamps, with stars twinkling outside the window (using the second scene). A gentle night breeze moves the curtains as the squirrel and rabbit snuggle together, gazing at the starry sky through the window, whispering about tomorrow's adventure. The night is peaceful and beautiful. 
\ No newline at end of file diff --git a/dataset/childrens_book_generation_scenario_definition_0002/meta.json b/dataset/childrens_book_generation_scenario_definition_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..2d126ef2dacd24f87f649d53387b730e003b85fc --- /dev/null +++ b/dataset/childrens_book_generation_scenario_definition_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "childrens book generation with scenario definition", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0045", + "output_image_count": 7, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/couple_icon_generation_with_reference_0001/eval.json b/dataset/couple_icon_generation_with_reference_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b67f3b8753be143ddeac73fe22d53f0965f0c7a6 --- /dev/null +++ b/dataset/couple_icon_generation_with_reference_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated duo icon retain the basic style and artistic elements of the reference avatar?", + "0_point_standard": "The style of the duo icon is noticeably different from the reference avatar, with significant differences in artistic elements.", + "1_point_standard": "The duo icon accurately retains the style and artistic elements of the reference avatar, ensuring visual consistency." + }, + { + "question": "Does the generated duo icon accurately reflect the expected gender pairing, complementing the reference avatar or following specific gender indications in the text?", + "0_point_standard": "The gender presentation does not match the expected gender pairing, failing to create a complementary appearance or ignoring specified gender indications.", + "1_point_standard": "The gender presentation appropriately complements the reference avatar or aligns with specific gender specifications provided in the text input." 
+ }, + { + "question": "Does the generated duo icon accurately execute any specific instructions from the text input, such as adding specific accessories or expressions?", + "0_point_standard": "The duo icon does not include the specified changes or executes them inaccurately, failing to meet the text input requirements.", + "1_point_standard": "The duo icon correctly and accurately incorporates the specified changes based on the text input." + }, + { + "question": "Are the expressions and character dynamics in the generated duo icon consistent with the reference avatar, creating a cohesive and complementary pairing?", + "0_point_standard": "Expressions, poses, or visual cues do not match the emotion or dynamics of the reference avatar, leading to a disconnect between the two icons.", + "1_point_standard": "Expressions, poses, and visual cues match well with the reference avatar, creating a harmonious and consistent dynamic between the two icons." + }, + { + "question": "Does the generated duo icon exhibit high-quality rendering in terms of detail, clarity, and resolution?", + "0_point_standard": "The rendering quality of the duo icon is poor, with noticeable issues in detail, clarity, or resolution.", + "1_point_standard": "The duo icon is rendered with high detail, clarity, and resolution, reflecting professional quality." + }, + { + "question": "Does the overall aesthetic of the generated duo icon meet professional standards, providing a visually appealing and cohesive image?", + "0_point_standard": "The duo icon lacks aesthetic appeal, with poorly matched elements leading to a discordant or unattractive image.", + "1_point_standard": "The duo icon is aesthetically pleasing, with a cohesive and visually appealing appearance that meets professional standards." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/couple_icon_generation_with_reference_0001/images.txt b/dataset/couple_icon_generation_with_reference_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7fcb9521b1792ab079970b7fe180c8f1601a269 --- /dev/null +++ b/dataset/couple_icon_generation_with_reference_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01xTagy622fq6N2114C_!!6000000007148-0-tps-564-846.jpg diff --git a/dataset/couple_icon_generation_with_reference_0001/instruction.txt b/dataset/couple_icon_generation_with_reference_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..40c4d45fba9a0b2605afa34104e2222f13454eb4 --- /dev/null +++ b/dataset/couple_icon_generation_with_reference_0001/instruction.txt @@ -0,0 +1 @@ +Generate a male avatar that matches the style of the given female avatar. The male should have distinctive features, such as a different hairstyle and facial expression, but should maintain the same color brightness and visual elements like stars. The background remains consistent, emphasizing the harmonious pairing of couple avatars, with other details corresponding to the female avatar. 
\ No newline at end of file diff --git a/dataset/couple_icon_generation_with_reference_0001/meta.json b/dataset/couple_icon_generation_with_reference_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3f970f62159934f71d7af49d8ae136e1e8537796 --- /dev/null +++ b/dataset/couple_icon_generation_with_reference_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "couple icon generation with single reference", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0071", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/creativity_transfer_0002/eval.json b/dataset/creativity_transfer_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..38641673707c743d5a0ae135314798ed0b37b021 --- /dev/null +++ b/dataset/creativity_transfer_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does each generated image clearly reflect the core creative concept of the reference images (e.g., merging instruments with letters, or combining food with animals)?", + "0_point_standard": "The generated image fails to clearly convey the specified creative concept and deviates from the theme shown in the reference image.", + "1_point_standard": "Each generated image clearly reflects the core creative concept of the reference image, staying true to the specified theme." + }, + { + "question": "Is the creative concept consistently applied across all generated images, with each image showing similar creativity and thematic focus?", + "0_point_standard": "The creative concept is inconsistently applied, with some images not fully capturing the intended theme.", + "1_point_standard": "The creative concept is consistently applied across all images, with each image clearly and focusedly reflecting the theme." 
+ }, + { + "question": "Do the generated images show variation in design while remaining within the scope of the original creative concept, offering diverse interpretations of the theme?", + "0_point_standard": "The generated images lack design diversity or fail to explore different interpretations within the creative theme.", + "1_point_standard": "The generated images demonstrate diverse interpretations of the theme, with each image offering a unique and cohesive understanding of the original concept." + }, + { + "question": "Do the details and level of refinement in each generated image make the concept clear and engaging, enhancing the overall quality of the image set?", + "0_point_standard": "The generated images lack detail or refinement, making the creative concept unclear or less engaging.", + "1_point_standard": "Each generated image has a high level of detail and refinement, clearly conveying the concept and making the entire set visually appealing." + }, + { + "question": "Do the elements within each image (e.g., color, texture, and lighting) harmoniously blend together, contributing to a cohesive and polished appearance?", + "0_point_standard": "The elements within the images do not blend well, resulting in a disjointed or unpolished appearance.", + "1_point_standard": "The elements within each image are well-blended, with harmonious color, texture, and lighting, contributing to a cohesive and polished look." 
+ }, + { + "question": "Does the final image collection exhibit a high level of aesthetic quality, with each image enhancing the creative concept and contributing to an appealing and professional-looking collection?", + "0_point_standard": "The final image collection lacks aesthetic appeal or consistency, detracting from the overall visual impact of the collection.", + "1_point_standard": "The final image collection exhibits high aesthetic quality, with each image enhancing the creative concept and creating an appealing, professional-looking collection." + } + ] +} \ No newline at end of file diff --git a/dataset/creativity_transfer_0002/images.txt b/dataset/creativity_transfer_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..fcabda8a68410618fee91b672c2ececdecff004b --- /dev/null +++ b/dataset/creativity_transfer_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN015AAbEt1owsDCOaf0v_!!6000000005290-0-tps-1280-1710.jpg diff --git a/dataset/creativity_transfer_0002/instruction.txt b/dataset/creativity_transfer_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3248448251f73116ab2a4a2152aae32cec9620ca --- /dev/null +++ b/dataset/creativity_transfer_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a series of images that follow the same creative style as the provided image. The image I provided shows a lightbulb with a tree inside, symbolizing the concepts of environmentalism and sustainability. You are required to generate four additional creative images, each combining different elements within an unusual container, while maintaining the same creative style. 
The first image should feature an umbrella with an ocean and dolphins inside, with the arc of the umbrella filled with the movement of water; the second image should show a book with a forest growing out from between its pages, with trees and vines sprouting from the book; the third image should depict a coffee cup with a waterfall flowing inside, connecting to a valley and stream; the fourth image should show an hourglass where the sand is transformed into a cosmic nebula, changing as time flows. All images must maintain a consistent creative style, reflecting the same level of imagination and innovative visual design. \ No newline at end of file diff --git a/dataset/creativity_transfer_0002/meta.json b/dataset/creativity_transfer_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..f4aca7cc432bd40243cbbc877cb922bd3dfb716b --- /dev/null +++ b/dataset/creativity_transfer_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "creativity transfer", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0042", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/drawing_process_generation_0001/eval.json b/dataset/drawing_process_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..3ad8cc12ca5876aff251287e26b3e1e987afed53 --- /dev/null +++ b/dataset/drawing_process_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the sequence of images logically depict the painting process from sketch to final artwork?", + "0_point_standard": "The sequence lacks clear progression or logical order, failing to illustrate the painting process.", + "1_point_standard": "The sequence clearly and logically depicts the painting process from initial sketch to final artwork." 
+ }, + { + "question": "Does the final artwork conform to the description provided in the text input?", + "0_point_standard": "The final artwork significantly deviates from the description in the text input.", + "1_point_standard": "The final artwork accurately matches the description provided in the text input." + }, + { + "question": "Is the style of the intermediate process images consistent throughout the sequence?", + "0_point_standard": "The image styles vary greatly, leading to a visually incoherent result.", + "1_point_standard": "All images maintain a consistent style, creating a cohesive visual effect throughout the process." + }, + { + "question": "Is the depiction of the main objects or characters consistent throughout the image sequence?", + "0_point_standard": "The main objects or characters differ greatly between images, making it difficult to recognize them as the same entity.", + "1_point_standard": "The main objects or characters are consistent and easily recognizable as the same entity across all images." + }, + { + "question": "Is the sequence logically accurate in representing the expected steps of the painting process (e.g., sketch, inking, coloring)?", + "0_point_standard": "The representation of the painting process is illogical or unrealistic, with noticeable errors in step sequence.", + "1_point_standard": "The sequence accurately and logically represents the expected steps of the painting process." + }, + { + "question": "Does the final artwork's detail and aesthetics meet professional standards and exhibit visual appeal?", + "0_point_standard": "The final artwork lacks detail, has poor aesthetics, and does not meet professional standards.", + "1_point_standard": "The final artwork is richly detailed, aesthetically excellent, meets professional standards, and is visually appealing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/drawing_process_generation_0001/images.txt b/dataset/drawing_process_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/drawing_process_generation_0001/instruction.txt b/dataset/drawing_process_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..936c7f1609742f26e38c3b8874037d14f2b1a8e8 --- /dev/null +++ b/dataset/drawing_process_generation_0001/instruction.txt @@ -0,0 +1 @@ +The final artwork depicts a serene lakeside sunset, with the lake reflecting the golden sunset, surrounded by trees bathed in an orange glow, and a small boat on the shore. The main steps include: first, sketching the basic outline of the lake and trees; then adding initial colors to the sky and water, with the sunlight gradually emerging; next, enriching the details of the trees, the sky turning orange, and reflections appearing on the lake; finally, refining the light and shadow effects, adding the boat and ripples on the water, making the scene more vivid and three-dimensional. The model's goal is to generate these intermediate steps in sequence, with each step showing a gradual increase in detail. 
\ No newline at end of file diff --git a/dataset/drawing_process_generation_0001/meta.json b/dataset/drawing_process_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..42fa8772b818e00f7304a99b24eea9a3696245c4 --- /dev/null +++ b/dataset/drawing_process_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "drawing process generation without reference", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0011", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0003/auto_eval.jsonl b/dataset/dynamic_character_design_expression_design_0003/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..817f77ad0ba8f222778252b4d27929e0146649b7 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0003/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"], "question": "Is the number in the image the digit 4? 0 points: The number in the image is not the digit 4; 1 point: The number in the image is the digit 4. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and fourth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. 
Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDo the first and fourth images maintain a consistent character identity, with the same facial features, mask, and combat suit design? 0 points: The character identity appears inconsistent, with noticeable differences in facial features, mask, or suit design that make it hard to recognize the same superhero. 1 point: The character identity is consistent, with matching facial features, mask, and suit design clearly indicating the same character.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and third images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. 
Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDo the first and third images maintain a consistent minimal line-art style, with similar line thickness, simplicity, and overall design aesthetic? 0 points: The line-art style varies noticeably between the two images, reducing the cohesion of the series. 1 point: The line-art style is consistent, with matching line thickness and simplicity, ensuring visual coherence across the series.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and third images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDo the second and third images display clearly distinct expressions (confidence vs. anger) that effectively convey different emotional states? 0 points: The expressions are too similar or lack distinctness, making it difficult to interpret varied emotional states. 
1 point: The expressions are clearly distinct, effectively conveying the intended emotions and adding expressive diversity to the series.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and second images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDo the first and second images display consistent details in the superhero’s mask and costume, without any unnecessary variations? 0 points: The mask or costume details differ noticeably between images, reducing continuity in the character’s visual presentation. 1 point: The mask and costume details are consistent across both images, reinforcing the character’s identity and continuity.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. 
This is the third and fourth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDo the third and fourth images demonstrate high aesthetic quality in line-art style, with clean, intentional lines that enhance the superhero’s expression and visual impact? 0 points: The line quality is inconsistent or unclear, with weak visual impact that detracts from the character’s expression. 
1 point: The line quality is strong, with clean and intentional lines that enhance the expression and create a visually impactful image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/dynamic_character_design_expression_design_0003/eval.json b/dataset/dynamic_character_design_expression_design_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..fe1535606e536906dd28bbda79f5e6e262a59317 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements described in the text?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "Do the first and fourth images maintain consistent character identity, with matching facial features, mask, and battle suit design?", + "0_point_standard": "The character identity is inconsistent, with noticeable differences in facial features, mask, or battle suit design, making it difficult to recognize the same superhero.", + "1_point_standard": "The character identity is consistent, with matching facial features, mask, and battle suit design, clearly indicating the same character." + }, + { + "question": "Do the first and third images maintain a consistent minimalist line art style, with similar line thickness, simplicity, and overall design aesthetics?", + "0_point_standard": "The line art style differs significantly between the two images, reducing the series' coherence.", + "1_point_standard": "The line art style is consistent, with matching line thickness and simplicity, ensuring visual coherence in the series." 
+ }, + { + "question": "Do the second and third images display clearly distinct expressions (confidence and anger), effectively conveying different emotional states?", + "0_point_standard": "The expressions are too similar or lack obvious differences, making it difficult to interpret them as different emotional states.", + "1_point_standard": "The expressions are distinctly different, effectively conveying the intended emotions, adding expressive diversity to the series." + }, + { + "question": "Do the first and second images maintain consistency in the superhero's mask and costume details, with no unnecessary changes?", + "0_point_standard": "The mask or costume details differ noticeably between the images, reducing the coherence of the character's visual presentation.", + "1_point_standard": "The mask and costume details are consistent across the two images, reinforcing character identity and visual continuity." + }, + { + "question": "Do the third and fourth images exhibit a high level of aesthetic quality in the line art style, with clean, intentional lines that enhance the superhero's expression and visual impact?", + "0_point_standard": "The line quality is inconsistent or unclear, resulting in weaker visual impact and diminished character expression.", + "1_point_standard": "The line quality is high, with clean and intentional lines, enhancing expressions and creating visually impactful images." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0003/images.txt b/dataset/dynamic_character_design_expression_design_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/dynamic_character_design_expression_design_0003/instruction.txt b/dataset/dynamic_character_design_expression_design_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..ad891a745bd4344bb6cbc145329fc953b3b04cc4 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a superhero character in a minimal line-art style, showing only the head or upper body. He is wearing a black mask and a combat suit. Generate a set of 4 images with different expressions: the first image shows him with a serious expression, conveying a strong sense of responsibility; the second image shows him smiling confidently, displaying power and strength; the third image shows him angry, ready for battle; the fourth image shows him with a tense expression, eyebrows slightly furrowed, eyes alert. Ensure all facial expressions are diverse, while the mask and upper body remain consistent, with the same character ID in every image. 
\ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0003/meta.json b/dataset/dynamic_character_design_expression_design_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b4097245f4e44c07ba7dd5a528c3eea25dfa7931 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "dynamic character expression design", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0016", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0004/auto_eval.jsonl b/dataset/dynamic_character_design_expression_design_0004/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..db656583b9b5070469a81f66404545b5a5414f22 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0004/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg", "0005.jpg"], "question": "Is the number in the image the digit 5? 0 points: The number in the image is not the digit 5; 1 point: The number in the image is the digit 5. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and third images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. 
Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nIs the sketch style, including line quality, shading, and detailing, consistent between the first image (neutral expression) and the third image (surprised expression)? 0 Points: The style varies significantly between the two images, with noticeable differences in line quality, shading, or detailing that disrupt visual consistency. 1 Point: The style is consistent in both images, with cohesive line quality, shading, and detailing.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. 
Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nDoes the character ID, including facial structure, helmet design, and general appearance, remain consistent between the second image (angry expression) and the fourth image (focused expression)? 0 Points: The character appears different between the two images, with notable changes in facial structure or helmet design that make him look like a different person. 1 Point: The character ID is consistent across both images, maintaining a recognizable appearance with no significant changes.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0005.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and fifth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. 
Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nIs the high-tech helmet design, including its shape, features, and details, consistent between the first image (neutral expression) and the fifth image (satisfied smile)? 0 Points: The helmet design varies between the two images, with noticeable differences in shape, features, or detailing. 1 Point: The helmet design is consistent in both images, with no significant changes in shape, features, or details.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0005.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third and fifth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. 
Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nAre the specific expressions accurately depicted, particularly comparing the third image (surprised expression) and the fifth image (satisfied smile)? 0 Points: The expressions are not clearly conveyed, making it difficult to distinguish between the surprised and satisfied expressions. 1 Point: The expressions are accurately depicted, with the character’s facial cues clearly matching the described emotions.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth images of the response provided by a student. The task objective is to generate different facial expression designs for the same character.\nThe text requirement is:\nPlease generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image.\nYour review question is:\nIs there a coherent portrayal of the bounty hunter’s character across the set, specifically between the second image (angry expression) and the fourth image (focused expression), reflecting a consistent sense of personality and tone? 
0 Points: The character’s personality or tone appears inconsistent between the two images, lacking a unified portrayal. 1 Point: The character’s personality is coherent across both images, presenting a unified and consistent tone.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/dynamic_character_design_expression_design_0004/eval.json b/dataset/dynamic_character_design_expression_design_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..2ca6d04a4ad864d9bd843636abc147fa4ddfedb9 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements described in the text?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "In the first image (neutral expression) and the third image (surprised expression), is the sketch style consistent, including line quality, shading, and details?", + "0_point_standard": "There is a significant difference in style between the two images, with noticeable differences in line quality, shading, or details, disrupting visual consistency.", + "1_point_standard": "The style of the two images is consistent, with line quality, shading, and details in harmony." 
+ }, + { + "question": "In the second image (angry expression) and the fourth image (focused expression), does the character identity remain consistent, including facial structure, helmet design, and overall appearance?", + "0_point_standard": "The characters in the two images look different, with significant changes in facial structure or helmet design, making them appear as different individuals.", + "1_point_standard": "The character identity remains consistent across the two images, with no significant changes in appearance, maintaining recognizability." + }, + { + "question": "In the first image (neutral expression) and the fifth image (satisfied smile), does the design of the high-tech helmet remain consistent, including its shape, features, and details?", + "0_point_standard": "There are differences in the helmet design between the two images, with noticeable changes in shape, features, or details.", + "1_point_standard": "The helmet design remains consistent across the two images, with no significant changes in shape, features, and details." + }, + { + "question": "Is the specific expression accurately portrayed, especially when comparing the third image (surprised expression) and the fifth image (satisfied smile)?", + "0_point_standard": "The expressions are not clearly portrayed, making it difficult to distinguish between surprised and satisfied expressions.", + "1_point_standard": "The expressions are accurately portrayed, with the character's facial features clearly corresponding to the described emotions." 
+ }, + { + "question": "Throughout the image set, is the image of the bounty hunter character consistent, especially between the second image (angry expression) and the fourth image (focused expression), reflecting a unified character and tone?", + "0_point_standard": "The character's personality or tone is inconsistent between the two images, lacking a unified portrayal.", + "1_point_standard": "The character's personality is coherent and consistent across the two images, presenting a unified and consistent tone." + } + ] +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0004/images.txt b/dataset/dynamic_character_design_expression_design_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/dynamic_character_design_expression_design_0004/instruction.txt b/dataset/dynamic_character_design_expression_design_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..45f60c8e1220544314979c565da44293c50b7051 --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a futuristic bounty hunter character in a sketch style, showing only the head or upper body. He is wearing a high-tech helmet. Generate a set of 5 images with different expressions: the first image shows him with a neutral, expressionless face; the second image shows him with an angry expression, cold eyes; the third image shows him surprised, with wide-open eyes; the fourth image shows him focused, as if analyzing a situation; the fifth image shows him smiling with satisfaction, confident after a successful mission. Ensure all facial expressions are varied, while the helmet and upper body remain consistent, with the same character ID in every image. 
\ No newline at end of file diff --git a/dataset/dynamic_character_design_expression_design_0004/meta.json b/dataset/dynamic_character_design_expression_design_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..411b0930e014c5ad7bf62ceebb03ce203fd8d9ab --- /dev/null +++ b/dataset/dynamic_character_design_expression_design_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "dynamic character expression design", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0016", + "output_image_count": 5, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_pose_design_0001/auto_eval.jsonl b/dataset/dynamic_character_design_pose_design_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..bccb167e987aa624cb8a730de0fb65cb3385a7a3 --- /dev/null +++ b/dataset/dynamic_character_design_pose_design_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"], "question": "Is the number in the image the digit 4? 0 points: The number in the image is not the digit 4; 1 point: The number in the image is the digit 4. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and third images of the response provided by a student. The task objective is to generate different pose designs for the same character.\nThe text requirement is:\nPlease generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. 
The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. Ensure the character ID remains consistent across all images, representing the same knight in each.\nYour review question is:\nAre the knight’s armor details, including material and design, consistent between the first image (standing with sword across chest) and the third image (raising sword in an attack-ready position)? 0 Points: The armor details, including material (e.g., shininess of the silver) and design elements, appear different between the two images. 1 Point: The armor details, material, and design elements are consistent between the first and third images, maintaining a cohesive look.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth images of the response provided by a student. The task objective is to generate different pose designs for the same character.\nThe text requirement is:\nPlease generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. 
The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. Ensure the character ID remains consistent across all images, representing the same knight in each.\nYour review question is:\nDoes the character’s facial expression and ID consistency match between the second image (kneeling with one hand on the sword) and the fourth image (swinging the sword to the side)? 0 Points: The character’s facial expression or identifiable features (such as face structure, hair, or general appearance) differ noticeably, suggesting a change in character ID. 1 Point: The character’s facial expression and identifiable features remain consistent between the second and fourth images, confirming it is the same knight.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and fourth images of the response provided by a student. The task objective is to generate different pose designs for the same character.\nThe text requirement is:\nPlease generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. 
The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. Ensure the character ID remains consistent across all images, representing the same knight in each.\nYour review question is:\nIs the weapon, specifically the greatsword’s design, consistent between the first image (standing) and the fourth image (swinging the sword)? 0 Points: The greatsword’s design, including size, shape, and details, differs noticeably between the two images. 1 Point: The greatsword’s design remains consistent between the first and fourth images, with no noticeable changes in size, shape, or other details.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and third images of the response provided by a student. The task objective is to generate different pose designs for the same character.\nThe text requirement is:\nPlease generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. 
These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. Ensure the character ID remains consistent across all images, representing the same knight in each.\nYour review question is:\nIs the knight’s stance and posture executed accurately for the specified poses in the second image (kneeling with sword planted in the ground) and the third image (raising sword in attack-ready position)? 0 Points: The stances or postures do not clearly reflect the described actions (kneeling and attack-ready), making the poses ambiguous. 1 Point: The stances and postures accurately depict the specified actions, showing a clear kneeling and attack-ready pose.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and second images of the response provided by a student. The task objective is to generate different pose designs for the same character.\nThe text requirement is:\nPlease generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. 
Ensure the character ID remains consistent across all images, representing the same knight in each.\nYour review question is:\nAre the lighting and shading effects applied consistently between the first image (standing with sword across chest) and the second image (kneeling with sword planted in the ground), maintaining a realistic appearance? 0 Points: The lighting and shading effects differ noticeably, disrupting the realistic style and consistency between images. 1 Point: The lighting and shading effects are consistent between the two images, preserving the realistic style across both stances.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/dynamic_character_design_pose_design_0001/eval.json b/dataset/dynamic_character_design_pose_design_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5f02dc181a37c4d127a980a62c04f0c3627bc2c2 --- /dev/null +++ b/dataset/dynamic_character_design_pose_design_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements described in the text?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "In the first image (standing with sword at chest) and the third image (raising sword to attack), do the knight's armor details (including material and design) remain consistent?", + "0_point_standard": "The armor details, including material (such as the silver sheen) and design elements, differ between the two images.", + "1_point_standard": "The armor details, material, and design elements are consistent in the first and third images, presenting a cohesive appearance." 
+ }, + { + "question": "In the second image (kneeling with sword in one hand) and the fourth image (sideward sword swing), are the character's facial expressions and identity consistent?", + "0_point_standard": "The character's facial expressions or recognizable features (such as facial structure, hairstyle, or overall appearance) differ significantly, suggesting a change in character identity.", + "1_point_standard": "The character's facial expressions and recognizable features are consistent in the second and fourth images, confirming it is the same knight." + }, + { + "question": "In the first image (standing) and the fourth image (swinging sword), does the weapon design, especially the greatsword, remain consistent?", + "0_point_standard": "The design of the greatsword (including size, shape, and details) differs significantly between the two images.", + "1_point_standard": "The design of the greatsword remains consistent in the first and fourth images, with no noticeable changes in size, shape, or other details." + }, + { + "question": "In the second image (kneeling with sword in one hand) and the third image (raising sword to attack), does the knight's posture accurately reflect the specified stance?", + "0_point_standard": "The posture does not clearly express the described actions (kneeling and preparing to attack), making the stance unclear.", + "1_point_standard": "The posture accurately reflects the specified actions, clearly demonstrating the stances of kneeling and preparing to attack." 
+ }, + { + "question": "In the first image (standing with sword at chest) and the second image (kneeling with sword in one hand), do the lighting and shadow effects remain consistent, maintaining a realistic appearance?", + "0_point_standard": "The lighting and shadow effects differ significantly, disrupting the realistic style and consistency between the images.", + "1_point_standard": "The lighting and shadow effects remain consistent across the two images, maintaining a realistic style across different stances." + } + ] +} \ No newline at end of file diff --git a/dataset/dynamic_character_design_pose_design_0001/images.txt b/dataset/dynamic_character_design_pose_design_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/dynamic_character_design_pose_design_0001/instruction.txt b/dataset/dynamic_character_design_pose_design_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..58fbf991ec25915b1c1ba2b3ad6e39cf7c7e8c55 --- /dev/null +++ b/dataset/dynamic_character_design_pose_design_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a brave knight character in a realistic style. He is wearing shiny silver armor, with a determined face, holding a greatsword. The first image shows him standing with both hands holding the sword across his chest, standing firmly; the second image shows him kneeling with one hand gripping the sword, which is planted into the ground; the third image shows him raising his sword high in an attack-ready position; the fourth image shows him swinging his sword to the side in a swift motion. These images should not include backgrounds, focusing on diverse battle stances, keeping armor and weapon consistent. Ensure the character ID remains consistent across all images, representing the same knight in each. 
\ No newline at end of file diff --git a/dataset/dynamic_character_design_pose_design_0001/meta.json b/dataset/dynamic_character_design_pose_design_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a2570e220c20ae33dede5a629dc3f72c629730af --- /dev/null +++ b/dataset/dynamic_character_design_pose_design_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "dynamic character pose design", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0015", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/historical_narrative_generation_0003/eval.json b/dataset/historical_narrative_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..403ee42dfc384754da85af207cd9c8dbcb05ca0e --- /dev/null +++ b/dataset/historical_narrative_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Temporal logic of the event: Does the image sequence logically present the historical events in chronological order?", + "0_point_standard": "The image sequence is not arranged in chronological order or lacks a logical flow, failing to illustrate the progression of historical events.", + "1_point_standard": "The image sequence clearly presents historical events in a logical chronological order." + }, + { + "question": "Consistency with text description: Does the image content accurately match the historical events specified in the text description?", + "0_point_standard": "The image content fails to accurately reflect the historical events in the text description, with noticeable discrepancies or omissions.", + "1_point_standard": "The image content fully matches the text description, accurately depicting the specified historical facts." 
+ }, + { + "question": "Consistency of image style: Is the style and overall visual effect of the image sequence consistent?", + "0_point_standard": "The image style is inconsistent, leading to a disjointed visual effect.", + "1_point_standard": "All images maintain a consistent style, creating a coherent visual effect." + }, + { + "question": "Consistency of object/person ID: Does the generated image sequence maintain consistency in the ID of the same object or person (e.g., the same person or object)?", + "0_point_standard": "The main subject or person is inconsistent across different images, making it difficult to identify them as the same person or object.", + "1_point_standard": "The main subject or person remains consistent and can be clearly identified as the same person or object throughout the sequence." + }, + { + "question": "Logical accuracy of historical reproduction: Based on known historical facts, is the reproduction of historical events reasonable and logically sound?", + "0_point_standard": "The reproduction of historical events is illogical or inconsistent with facts, with significant errors or unrealistic descriptions.", + "1_point_standard": "The reproduction of historical events is reasonable, logical, and accurately reflects known historical facts." + }, + { + "question": "Detail and aesthetic of images: Do the details and aesthetics of the images meet professional standards and possess visual appeal?", + "0_point_standard": "The images lack detail, have poor aesthetics, and do not meet visual standards.", + "1_point_standard": "The images are richly detailed, aesthetically excellent, meet professional standards, and have visual appeal." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/historical_narrative_generation_0003/images.txt b/dataset/historical_narrative_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/historical_narrative_generation_0003/instruction.txt b/dataset/historical_narrative_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..c7b24892573b2ee6bdd2b7b1cebf02e685d423b0 --- /dev/null +++ b/dataset/historical_narrative_generation_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a set of images depicting the Viking invasions of Europe from the 8th to 9th centuries. The first image shows Viking warriors landing on the shores of Britain, preparing to raid a monastery; the second image shows Vikings attacking Paris, with their ships floating on the Seine River and the city visible in the distance; the third image shows Vikings establishing settlements in England, with simple houses and farmland around; the fourth image shows a Viking chieftain signing a peace treaty with local lords, with coastal fortifications in the background. All images must maintain a consistent style, showcasing the visual characteristics of the Viking era. 
\ No newline at end of file diff --git a/dataset/historical_narrative_generation_0003/meta.json b/dataset/historical_narrative_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..ee8b6efef25da565df6b9bf0f9f6d9c33e2be824 --- /dev/null +++ b/dataset/historical_narrative_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "historical narrative generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0012", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/historical_narrative_generation_0004/eval.json b/dataset/historical_narrative_generation_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..0f854b7c37712e6ef7db09222adec46fe34db337 --- /dev/null +++ b/dataset/historical_narrative_generation_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Temporal logic of events: Does the sequence of images logically present historical events in chronological order?", + "0_point_standard": "The sequence of images is not arranged in chronological order, or lacks logical flow, failing to illustrate the progression of historical events.", + "1_point_standard": "The sequence of images clearly presents historical events in logical chronological order." + }, + { + "question": "Consistency with text description: Does the image content accurately match the historical events specified in the text description?", + "0_point_standard": "The image content fails to accurately reflect the historical events in the text description, with obvious deviations or omissions.", + "1_point_standard": "The image content completely matches the text description, accurately depicting the specified historical facts." 
+ }, + { + "question": "Consistency of image style: Is the style and overall visual effect of the image sequence consistent?", + "0_point_standard": "The image style is inconsistent, leading to a disjointed visual effect.", + "1_point_standard": "All images maintain a consistent style, creating a cohesive visual effect." + }, + { + "question": "Consistency of object/person ID: Does the generated image sequence maintain consistency in the identification of the same object or person (e.g., the same person or object)?", + "0_point_standard": "The main subject or person is inconsistent in different images, making it difficult to recognize them as the same person or object.", + "1_point_standard": "The main subject or person remains consistent and can be clearly recognized as the same across the sequence." + }, + { + "question": "Logical accuracy of historical reproduction: Is the reproduction of historical events reasonable and logically consistent with known historical facts?", + "0_point_standard": "The reproduction of historical events is illogical or inconsistent with facts, with obvious errors or unrealistic descriptions.", + "1_point_standard": "The reproduction of historical events is reasonable, logical, and accurately reflects known historical facts." + }, + { + "question": "Detail and aesthetic of images: Do the details and aesthetics of the images meet professional standards and possess visual appeal?", + "0_point_standard": "The images lack detail, have poor aesthetics, and do not meet visual standards.", + "1_point_standard": "The images are rich in detail, have excellent aesthetics, meet professional standards, and possess visual appeal." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/historical_narrative_generation_0004/images.txt b/dataset/historical_narrative_generation_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/historical_narrative_generation_0004/instruction.txt b/dataset/historical_narrative_generation_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..cf5fcbf2a03c50bca371e5df071c3e51fb3b0baa --- /dev/null +++ b/dataset/historical_narrative_generation_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a set of images depicting the expansion of the Mongol Empire from the 13th to 14th centuries. The first image shows Mongol horsemen galloping across the steppe, preparing to attack a neighboring tribe; the second image shows Genghis Khan strategizing with his commanders inside a military tent, with yurts dotting the grassland; the third image shows the Mongol army conquering Baghdad, with their forces marching through the city's gates; the fourth image shows Kublai Khan's court, where foreign envoys are paying tribute, set against the backdrop of a grand palace. All images must maintain a consistent style, reflecting the characteristics of the Mongol Empire. 
\ No newline at end of file diff --git a/dataset/historical_narrative_generation_0004/meta.json b/dataset/historical_narrative_generation_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..34724a6a25c3b0ad1c8e0430658ed09722754f08 --- /dev/null +++ b/dataset/historical_narrative_generation_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "historical narrative generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0012", + "output_image_count": 4, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_age_transformation_0001/eval.json b/dataset/human_attribute_editing_age_transformation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e412e3dfde5ca27bfbf698e1d10804a018e6d0f7 --- /dev/null +++ b/dataset/human_attribute_editing_age_transformation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the age-transformed image retain the fundamental facial features of the original portrait subject?", + "0_point_standard": "The transformed image shows significant changes in facial features, making it unrecognizable.", + "1_point_standard": "The transformed image retains the subject's facial features, making it easily recognizable." + }, + { + "question": "Does the age transformation modify the subject's age according to the specifications in the task description (e.g., making the person look older or younger)?", + "0_point_standard": "The age transformation does not reflect the age modifications specified in the task description.", + "1_point_standard": "The age transformation accurately reflects the specified age change, making the person look older or younger as instructed." 
+ }, + { + "question": "If the task involves only partial modifications, do features unrelated to age (such as background and clothing) remain unchanged in the transformed image?", + "0_point_standard": "There are noticeable changes in features unrelated to age, such as alterations in background or clothing.", + "1_point_standard": "Features unrelated to age remain unchanged, ensuring the focus is solely on the age transformation." + }, + { + "question": "Does the transformed image maintain a consistent style and quality with the original image, ensuring there is no decline in image quality?", + "0_point_standard": "The style and quality of the transformed image are inconsistent with the original, with a noticeable decline in image quality.", + "1_point_standard": "The style and quality of the transformed image are consistent with the original, maintaining high image quality." + }, + { + "question": "Is the depiction of age transformation realistic, with appropriate attention given to details such as skin texture, hair changes, and facial features?", + "0_point_standard": "The age transformation looks unrealistic, with poor detail handling in skin texture, hair, or facial features.", + "1_point_standard": "The age transformation looks realistic, with well-handled details in skin texture, hair, and facial features." + }, + { + "question": "Does the transformed image possess overall aesthetic appeal, being visually attractive and meeting professional presentation standards, satisfying aesthetic expectations?", + "0_point_standard": "The transformed image lacks aesthetic appeal and does not meet professional presentation standards.", + "1_point_standard": "The transformed image exhibits strong aesthetic appeal, demonstrating professionalism and visual attractiveness." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_age_transformation_0001/images.txt b/dataset/human_attribute_editing_age_transformation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..a3ec5cb23d016a48dc8027ba41cc3eadd4e6c514 --- /dev/null +++ b/dataset/human_attribute_editing_age_transformation_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01PrY4Ci1EVYQ2l6DwK_!!6000000000357-0-tps-2048-1536.jpg diff --git a/dataset/human_attribute_editing_age_transformation_0001/instruction.txt b/dataset/human_attribute_editing_age_transformation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..2ebd70e80f3f13df6f2adff011acefc995cd07b8 --- /dev/null +++ b/dataset/human_attribute_editing_age_transformation_0001/instruction.txt @@ -0,0 +1 @@ +Transform this image of David Beckham into a younger version of himself, keeping his main facial features and hairstyle intact, but with smoother skin and tighter facial contours to reflect a younger age. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_age_transformation_0001/meta.json b/dataset/human_attribute_editing_age_transformation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..112b507253c681b97a21e7441802573a65a3e6ce --- /dev/null +++ b/dataset/human_attribute_editing_age_transformation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "human age transformation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0077", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_body_painting_0002/eval.json b/dataset/human_attribute_editing_body_painting_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..22dcab551098e2c495e4935325b24d801e9d5d88 --- /dev/null +++ b/dataset/human_attribute_editing_body_painting_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the body painting pattern in the generated image accurately applied to the same body part as described in the text?", + "0_point_standard": "The body painting is applied incorrectly or deviates significantly from the specified area, such as applying the pattern to an unintended body part.", + "1_point_standard": "The body painting is applied to the specified body part as described, such as the face, shoulders, or torso, without noticeable deviation." + }, + { + "question": "Does the generated image retain the identity and key visual features of the person based on the input reference image?", + "0_point_standard": "The output image does not resemble the person in the reference image, with significant differences in facial features, posture, or body structure.", + "1_point_standard": "The output image closely resembles the person in the reference image, accurately reflecting facial features, posture, and body structure." 
+ }, + { + "question": "Does the generated body painting match the design, style, and level of detail specified in the text description?", + "0_point_standard": "The style, level of detail, or design of the body painting significantly deviates from the specified description, lacking the expected visual effect.", + "1_point_standard": "The body painting closely follows the specified design, style, and level of detail, accurately reflecting the intended visual effect." + }, + { + "question": "In the generated image, do other elements remain unchanged apart from the specified body painting?", + "0_point_standard": "Elements unrelated to the body painting (such as background, clothing, or other visible parts of the body) are altered or modified, affecting image consistency.", + "1_point_standard": "All other elements apart from the body painting (such as background, clothing, or other visible body parts) remain unchanged, maintaining logical consistency." + }, + { + "question": "Does the body painting seamlessly blend with the person's skin, with natural contours and shadows, matching the body morphology?", + "0_point_standard": "The body painting appears artificially applied, lacking natural blending with skin contours, or has issues with shadows and perspective.", + "1_point_standard": "The body painting blends seamlessly with the skin, following natural contours and shadows, presenting a realistic and harmonious appearance." + }, + { + "question": "Does the overall image possess high aesthetic quality, with clear details, smooth lines, and balanced composition?", + "0_point_standard": "The image has noticeable defects, such as rough lines, blurriness, or unbalanced composition, affecting visual appeal.", + "1_point_standard": "The image is aesthetically pleasing, with clear details, smooth lines, and balanced composition, creating a high-quality visual output." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_body_painting_0002/images.txt b/dataset/human_attribute_editing_body_painting_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..aec3fe9fbf5acd11a5c97fa0586787fe859080c4 --- /dev/null +++ b/dataset/human_attribute_editing_body_painting_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i4/O1CN01xoBiUv1Rh5OeQXPBm_!!6000000002142-0-tps-2592-3872.jpg +https://img.alicdn.com/imgextra/i2/O1CN01WKmTP91qrxBlHQaxp_!!6000000005550-0-tps-1065-1241.jpg diff --git a/dataset/human_attribute_editing_body_painting_0002/instruction.txt b/dataset/human_attribute_editing_body_painting_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..13d4894a19ffcdf35e85ab88f2fa412548fc74be --- /dev/null +++ b/dataset/human_attribute_editing_body_painting_0002/instruction.txt @@ -0,0 +1 @@ +Apply the delicate face paint design from the reference image onto the child’s face in this photo. The design should cover the same area around the eye, with the same colors, floral shapes, and sparkling details as in the reference. Ensure that all other elements of the original photo—such as the child’s expression, hairstyle, dress, and natural background—remain unchanged. The final image should integrate the face paint seamlessly, making it look natural on the child’s face while preserving the authenticity and warmth of the original photo. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_body_painting_0002/meta.json b/dataset/human_attribute_editing_body_painting_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..69364d2d03f00dbf1fdb151a7f084bd4f173c512 --- /dev/null +++ b/dataset/human_attribute_editing_body_painting_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "body_painting", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0078", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0002/eval.json b/dataset/human_attribute_editing_hairstyle_transformation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..04f1e1556228a5cb5764f533a34ac9f7e7661f3c --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image retain the original facial features and expressions of the character, ensuring no unexpected changes to the face?", + "0_point_standard": "The modified image has altered or distorted facial features or expressions.", + "1_point_standard": "The facial features and expressions are consistent with the original image, with no unexpected changes." + }, + { + "question": "Aside from the hairstyle, does the rest of the image remain unchanged, consistent with the original portrait?", + "0_point_standard": "There are noticeable changes or distortions in parts of the image other than the hairstyle.", + "1_point_standard": "The rest of the image remains unchanged, with only the hairstyle modified." 
+ }, + { + "question": "Does the new hairstyle accurately reflect the specifications and style described in the text input?", + "0_point_standard": "The hairstyle does not match the style or specifications provided in the text description.", + "1_point_standard": "The hairstyle accurately matches the style and specifications outlined in the text input." + }, + { + "question": "Is the modified hairstyle correctly aligned and positioned on the character's head, with no misalignment or awkward placement?", + "0_point_standard": "The modified hairstyle is misaligned or awkwardly placed on the character's head, looking unnatural or out of place.", + "1_point_standard": "The modified hairstyle is accurately aligned and positioned, fitting naturally on the character's head with no noticeable misalignment." + }, + { + "question": "Does the modified hairstyle maintain a natural appearance with appropriate texture and volume matching the original hair quality?", + "0_point_standard": "The modified hairstyle looks artificial, with poor texture or volume inconsistent with the original hair quality.", + "1_point_standard": "The modified hairstyle looks natural, with texture and volume matching the original hair." + }, + { + "question": "Does the overall image have a high aesthetic appeal, and does the modified hairstyle enhance the character's appearance in a visually pleasing way?", + "0_point_standard": "The modified hairstyle detracts from the overall aesthetic appeal of the image.", + "1_point_standard": "The modified hairstyle enhances the character's appearance, making the image more visually appealing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0002/images.txt b/dataset/human_attribute_editing_hairstyle_transformation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..1890b2242f94fdb5468e100e43c6c871246b4634 --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01F0FAcL26ehZzpur5B_!!6000000007687-0-tps-3804-5705.jpg diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0002/instruction.txt b/dataset/human_attribute_editing_hairstyle_transformation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..85f05328d02fc720ab6472dbc694a0b6ab10ec36 --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0002/instruction.txt @@ -0,0 +1 @@ +Please modify the hairstyle in this image by changing the man's short hair into a voluminous curly style, adding more texture and giving it a natural, loose appearance with slightly curled ends. Keep all other details, such as clothing and background, unchanged. The goal is to generate a new image that reflects the updated hairstyle, which should appear youthful and energetic. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0002/meta.json b/dataset/human_attribute_editing_hairstyle_transformation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..91b3e9914ef24d12b54aa7f6b424b562fed4ef4f --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "human hairstyle transformation", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0080", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0003/eval.json b/dataset/human_attribute_editing_hairstyle_transformation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a469e47390081451779226c9ae640ca0d6fb52ee --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image retain the original facial features and expressions of the character, ensuring no unexpected changes to the face?", + "0_point_standard": "The modified image shows changes or distortions in facial features or expressions.", + "1_point_standard": "Facial features and expressions are consistent with the original image, with no unexpected changes." + }, + { + "question": "Aside from the hairstyle, does the rest of the image remain unchanged and consistent with the original portrait?", + "0_point_standard": "There are noticeable changes or distortions in parts of the image other than the hairstyle.", + "1_point_standard": "The rest of the image remains unchanged, with only the hairstyle modified." 
+ }, + { + "question": "Does the new hairstyle accurately reflect the specifications and style described in the text input?", + "0_point_standard": "The hairstyle does not match the style or specifications provided in the text description.", + "1_point_standard": "The hairstyle accurately matches the style and specifications outlined in the text input." + }, + { + "question": "Is the modified hairstyle correctly aligned and positioned on the character's head, without any misalignment or awkward placement?", + "0_point_standard": "The modified hairstyle is misaligned or awkwardly positioned on the character's head, appearing unnatural or out of place.", + "1_point_standard": "The modified hairstyle is accurately aligned and positioned, naturally fitting on the character's head without noticeable misalignment." + }, + { + "question": "Does the modified hairstyle maintain a natural appearance with appropriate texture and volume that matches the quality of the original hair?", + "0_point_standard": "The modified hairstyle looks fake, with poor texture or volume inconsistent with the quality of the original hair.", + "1_point_standard": "The modified hairstyle looks natural, with texture and volume matching the original hair." + }, + { + "question": "Does the overall image have high aesthetic appeal, with the modified hairstyle enhancing the character's appearance in a visually pleasing way?", + "0_point_standard": "The modified hairstyle reduces the overall aesthetic appeal of the image.", + "1_point_standard": "The modified hairstyle enhances the character's appearance, making the image more visually appealing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0003/images.txt b/dataset/human_attribute_editing_hairstyle_transformation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..c848fb6b07b2226e35e01767fb2ab41b1232897d --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0003/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01l5YVU41dXGI2oUkZ8_!!6000000003745-0-tps-800-1199.jpg diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0003/instruction.txt b/dataset/human_attribute_editing_hairstyle_transformation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..503bacfaf6e913d9c20c16409b02e6d2897fdfae --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0003/instruction.txt @@ -0,0 +1 @@ +Please modify the hairstyle in this image by changing the girl's long straight hair into playful twin ponytails, with slightly curled ends. The ponytails should fall over her shoulders from both sides, adding a lively vibe. Keep all other details, such as clothing and background, unchanged. The goal is to generate a new image that reflects the updated hairstyle, which should appear cute and playful. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_hairstyle_transformation_0003/meta.json b/dataset/human_attribute_editing_hairstyle_transformation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1d91055d9bd2a33192931026c9a97dd5587f86eb --- /dev/null +++ b/dataset/human_attribute_editing_hairstyle_transformation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "human hairstyle transformation", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0080", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_pose_transformation_0002/eval.json b/dataset/human_attribute_editing_pose_transformation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b64c95c5777f3f425e6f5a6e09424c3457d16754 --- /dev/null +++ b/dataset/human_attribute_editing_pose_transformation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the edited portrait accurately reflect the specified pose changes while preserving the subject's natural anatomy and proportions?", + "0_point_standard": "The pose changes appear unnatural or distort the subject's anatomy and proportions, reducing realism.", + "1_point_standard": "The pose changes are executed naturally, preserving the subject's anatomy and proportions, achieving realistic adjustments." + }, + { + "question": "Do the areas of the image unrelated to the pose change remain unchanged, retaining their original appearance and details?", + "0_point_standard": "Areas of the image unrelated to the pose change have been altered or show noticeable changes.", + "1_point_standard": "Areas of the image unrelated to the pose change remain unchanged, retaining their original appearance and details." 
+ }, + { + "question": "Does the edited image maintain the original content and style of the input image, ensuring consistency of identity and environment?", + "0_point_standard": "The edited image shows significant deviation in content and style from the input image, affecting identity recognition or environmental consistency.", + "1_point_standard": "The edited image's content and style are consistent with the input image, preserving identity recognition and environmental background." + }, + { + "question": "Are the limbs and body parts in the edited pose placed in a natural and realistic manner consistent with human anatomy?", + "0_point_standard": "Limbs or body parts in the new pose look awkward, misplaced, or inconsistent with human anatomy, affecting overall realism.", + "1_point_standard": "The placement of limbs and body parts is natural and realistic, consistent with human anatomy, enhancing the credibility of the pose." + }, + { + "question": "Does the edited pose seamlessly blend with the subject's environment, including consistent lighting and shadows?", + "0_point_standard": "The pose does not blend well with the environment, showing noticeable inconsistencies in lighting, shadows, or spatial orientation.", + "1_point_standard": "The edited pose seamlessly blends with the environment, with consistent lighting, shadows, and spatial orientation, enhancing overall realism." + }, + { + "question": "Does the edited image exhibit a high level of professional aesthetics, paying attention to details such as lighting, color balance, and composition?", + "0_point_standard": "The edited image lacks aesthetic appeal, with insufficient attention to details like lighting, color balance, or composition.", + "1_point_standard": "The edited image exhibits professional aesthetics, with meticulous attention to details like lighting, color balance, and composition." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_pose_transformation_0002/images.txt b/dataset/human_attribute_editing_pose_transformation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..2de5fa9669083af5710a2d90e662d4cba6d8c22a --- /dev/null +++ b/dataset/human_attribute_editing_pose_transformation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01OzNG8w1u3oHzRJQly_!!6000000005982-0-tps-563-848.jpg diff --git a/dataset/human_attribute_editing_pose_transformation_0002/instruction.txt b/dataset/human_attribute_editing_pose_transformation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..c51cddff63cf8603a9257650f5de0a2bf2960c73 --- /dev/null +++ b/dataset/human_attribute_editing_pose_transformation_0002/instruction.txt @@ -0,0 +1 @@ +I am giving you an image of a girl. Please keep the background, facial details, clothing, and all other elements unchanged, but change her action to an arms-open pose. The background and all other details must remain the same. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_pose_transformation_0002/meta.json b/dataset/human_attribute_editing_pose_transformation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..c798bd5685fad7cdd4e8a41103a966061792f9d5 --- /dev/null +++ b/dataset/human_attribute_editing_pose_transformation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "human pose transformation", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0079", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_sex_transformation_0001/eval.json b/dataset/human_attribute_editing_sex_transformation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..20f583ee53c9e6894e11f868d71cd8eb358b56ac --- /dev/null +++ b/dataset/human_attribute_editing_sex_transformation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image effectively change the gender of the character while retaining the core facial features and identity of the original portrait?", + "0_point_standard": "The gender transformation is ineffective, or the core facial features and identity of the character have significantly changed.", + "1_point_standard": "The gender transformation is effective and retains the character's core facial features and identity." + }, + { + "question": "Does the generated image maintain the composition and style of the original portrait, ensuring that only the intended gender transformation has been applied?", + "0_point_standard": "There is a noticeable change in the overall composition or style of the original portrait beyond the gender transformation.", + "1_point_standard": "The composition and style of the original portrait are preserved, with changes limited to the intended gender transformation." 
+ }, + { + "question": "Does the generated image accurately reflect the specific gender transformation instructions provided in the text description, such as hairstyle or clothing adjustments?", + "0_point_standard": "The gender transformation does not fully comply with the specific instructions given in the text description.", + "1_point_standard": "The gender transformation accurately reflects all the specific instructions provided in the text description." + }, + { + "question": "Is the quality of the gender transformation consistent throughout the image, avoiding any artifacts or unnatural elements in the edited areas?", + "0_point_standard": "There are noticeable artifacts or unnatural elements in the gender-transformed areas of the image.", + "1_point_standard": "The gender transformation is seamless, with the edited areas appearing natural and free of artifacts." + }, + { + "question": "Does the generated image display a high level of detail and realism in the transformed features, such as skin texture, hair, and facial expressions?", + "0_point_standard": "The transformed features lack detail and realism, appearing artificial or poorly executed.", + "1_point_standard": "The transformed features exhibit a high level of detail and realism, enhancing the overall quality of the image." + }, + { + "question": "Does the generated image possess overall aesthetic appeal, being visually coherent and pleasing, meeting professional standards for portrait editing?", + "0_point_standard": "The image lacks aesthetic appeal and does not meet professional standards for portrait editing.", + "1_point_standard": "The image exhibits strong aesthetic appeal, being visually coherent and pleasing, meeting professional standards." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/human_attribute_editing_sex_transformation_0001/images.txt b/dataset/human_attribute_editing_sex_transformation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..bad21522deb13578dedaeb639c56f0b0b4c0c515 --- /dev/null +++ b/dataset/human_attribute_editing_sex_transformation_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01wIezyD1QVLd9jw2gQ_!!6000000001981-0-tps-2352-3543.jpg diff --git a/dataset/human_attribute_editing_sex_transformation_0001/instruction.txt b/dataset/human_attribute_editing_sex_transformation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0dff262b94577f1d9b1568f01c9e5ca76f9e0479 --- /dev/null +++ b/dataset/human_attribute_editing_sex_transformation_0001/instruction.txt @@ -0,0 +1 @@ +Change the female character in the image to a male character while maintaining the overall appearance. Modify the hairstyle to a shorter masculine cut, keep the elegant style of the shirt or suit jacket, but adjust it to suit a masculine vibe. The body should appear taller and more muscular. Retain the delicate and natural facial features. 
\ No newline at end of file diff --git a/dataset/human_attribute_editing_sex_transformation_0001/meta.json b/dataset/human_attribute_editing_sex_transformation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1010766be1afe03db736dbcbb679d6c2b9afbd00 --- /dev/null +++ b/dataset/human_attribute_editing_sex_transformation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "human sex transformation", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0074", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0002/eval.json b/dataset/id_photo_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a2079944f667ef81a65b1d3bd53c0132b4fedf8c --- /dev/null +++ b/dataset/id_photo_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated ID photo retain the facial features and identity of the person from the informal portrait?", + "0_point_standard": "The facial features and identity of the person in the ID photo have significantly changed or are unrecognizable compared to the informal portrait.", + "1_point_standard": "The ID photo accurately retains the facial features and identity of the person from the informal portrait." + }, + { + "question": "Has the background of the generated ID photo been appropriately adjusted according to standard ID photo requirements while keeping other parts of the photo unchanged?", + "0_point_standard": "The background has not been appropriately adjusted according to standard ID photo requirements, or other parts of the photo have undergone unnecessary changes.", + "1_point_standard": "The background has been appropriately adjusted according to standard ID photo requirements, with no unnecessary changes to other parts of the photo." 
+ }, + { + "question": "Does the generated ID photo follow the specific instructions in the text description, such as clothing adjustments or expression changes?", + "0_point_standard": "The ID photo does not follow the specific instructions listed in the text description.", + "1_point_standard": "The ID photo accurately follows the specific instructions provided in the text description." + }, + { + "question": "Has the lighting in the generated ID photo been adjusted according to the text description to ensure proper exposure and shadow elimination, presenting a professional appearance?", + "0_point_standard": "The lighting does not meet the requirements of the text description, resulting in improper exposure or visible shadows.", + "1_point_standard": "The lighting has been correctly adjusted to meet the requirements of the text description, ensuring proper exposure and shadow elimination." + }, + { + "question": "Does the generated ID photo exhibit high-quality image details, such as clarity and sharpness, suitable for professional use?", + "0_point_standard": "The ID photo lacks clarity and sharpness, resulting in low image quality, not suitable for professional use.", + "1_point_standard": "The ID photo exhibits high-quality image details, with clarity and sharpness suitable for professional use." + }, + { + "question": "Does the generated ID photo have overall aesthetic consistency, balanced composition, and a professional effect?", + "0_point_standard": "The ID photo lacks aesthetic consistency, has poor composition, or appears unprofessional.", + "1_point_standard": "The ID photo exhibits aesthetic consistency, balanced composition, and a professional effect." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0002/images.txt b/dataset/id_photo_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..c670fc59f260422d8a66c687e708dae1d9224f0a --- /dev/null +++ b/dataset/id_photo_generation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01WvJPDR23wcE1seGZe_!!6000000007320-0-tps-564-664.jpg diff --git a/dataset/id_photo_generation_0002/instruction.txt b/dataset/id_photo_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0b684db558dc9538b8543ad972c16b6eb502c62a --- /dev/null +++ b/dataset/id_photo_generation_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a corresponding ID photo based on the provided casual photo, keeping the facial details, hairstyle, and other key identity features unchanged. The background should be changed to a solid color, typically light blue or white, in line with standard ID photo requirements. The facial expression should remain natural without significant adjustments to ensure consistency with the original casual photo. 
\ No newline at end of file diff --git a/dataset/id_photo_generation_0002/meta.json b/dataset/id_photo_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..7232b4c4cde97723252a3f69ec6781cf12090443 --- /dev/null +++ b/dataset/id_photo_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ID photo generation", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0097", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0003/eval.json b/dataset/id_photo_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4163fd0f66763f116a7b224fbf468dc889ea873a --- /dev/null +++ b/dataset/id_photo_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated ID photo retain the facial features and identity of the person from the informal portrait?", + "0_point_standard": "The facial features and identity of the person in the ID photo have significantly changed or are unrecognizable compared to the informal portrait.", + "1_point_standard": "The ID photo accurately retains the facial features and identity of the person from the informal portrait." + }, + { + "question": "Has the background of the generated ID photo been appropriately adjusted according to standard ID photo requirements, while other parts of the photo remain unchanged?", + "0_point_standard": "The background has not been appropriately adjusted according to standard ID photo requirements, or other parts of the photo have undergone unnecessary changes.", + "1_point_standard": "The background has been appropriately adjusted according to standard ID photo requirements, and other parts of the photo have not undergone unnecessary changes." 
+ }, + { + "question": "Does the generated ID photo follow the specific instructions given in the text description, such as clothing adjustments or expression changes?", + "0_point_standard": "The ID photo does not follow the specific instructions listed in the text description.", + "1_point_standard": "The ID photo accurately follows the specific instructions provided in the text description." + }, + { + "question": "Has the lighting of the generated ID photo been adjusted according to the text description to ensure proper exposure and shadow elimination, presenting a professional appearance?", + "0_point_standard": "The lighting does not meet the requirements of the text description, resulting in improper exposure or visible shadows.", + "1_point_standard": "The lighting has been correctly adjusted to meet the requirements of the text description, ensuring proper exposure and shadow elimination." + }, + { + "question": "Does the generated ID photo exhibit high-quality image details, such as clarity and sharpness, suitable for professional use?", + "0_point_standard": "The ID photo lacks clarity and sharpness, resulting in low image quality, unsuitable for professional use.", + "1_point_standard": "The ID photo exhibits high-quality image details, with clarity and sharpness suitable for professional use." + }, + { + "question": "Does the generated ID photo have overall aesthetic consistency, balanced composition, and a professional effect?", + "0_point_standard": "The ID photo lacks aesthetic consistency, has poor composition, or appears unprofessional.", + "1_point_standard": "The ID photo exhibits aesthetic consistency, balanced composition, and a professional effect." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0003/images.txt b/dataset/id_photo_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..3ae9578bf33658b4de5c0b8ad29cf478376d80b0 --- /dev/null +++ b/dataset/id_photo_generation_0003/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01MQZRFa1ZFAAxgKcJ5_!!6000000003164-0-tps-564-1002.jpg diff --git a/dataset/id_photo_generation_0003/instruction.txt b/dataset/id_photo_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0b684db558dc9538b8543ad972c16b6eb502c62a --- /dev/null +++ b/dataset/id_photo_generation_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a corresponding ID photo based on the provided casual photo, keeping the facial details, hairstyle, and other key identity features unchanged. The background should be changed to a solid color, typically light blue or white, in line with standard ID photo requirements. The facial expression should remain natural without significant adjustments to ensure consistency with the original casual photo. 
\ No newline at end of file diff --git a/dataset/id_photo_generation_0003/meta.json b/dataset/id_photo_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..268b8553a8987f803ecc05fb28b8c0a6bb33ae00 --- /dev/null +++ b/dataset/id_photo_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ID photo generation", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0097", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0004/eval.json b/dataset/id_photo_generation_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..fb93eb01bebe85f76d1cd12d35f2f2ccf178e4f5 --- /dev/null +++ b/dataset/id_photo_generation_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated ID photo retain the facial features and identity of the person from the informal portrait?", + "0_point_standard": "The facial features and identity of the person in the ID photo have significantly changed or are unrecognizable compared to the informal portrait.", + "1_point_standard": "The ID photo accurately retains the facial features and identity of the person from the informal portrait." + }, + { + "question": "Has the background of the generated ID photo been appropriately adjusted according to standard ID photo requirements while keeping other parts of the photo unchanged?", + "0_point_standard": "The background has not been appropriately adjusted according to standard ID photo requirements, or other parts of the photo have been unnecessarily altered.", + "1_point_standard": "The background has been appropriately adjusted according to standard ID photo requirements, and other parts of the photo have not been unnecessarily altered." 
+ }, + { + "question": "Does the generated ID photo follow the specific instructions mentioned in the text description, such as clothing adjustments or expression changes?", + "0_point_standard": "The ID photo does not follow the specific instructions listed in the text description.", + "1_point_standard": "The ID photo accurately follows the specific instructions provided in the text description." + }, + { + "question": "Has the lighting in the generated ID photo been adjusted according to the text description to ensure proper exposure and shadow elimination, presenting a professional appearance?", + "0_point_standard": "The lighting does not meet the requirements of the text description, resulting in improper exposure or visible shadows.", + "1_point_standard": "The lighting has been correctly adjusted to meet the requirements of the text description, ensuring proper exposure and shadow elimination." + }, + { + "question": "Does the generated ID photo exhibit high-quality image details, such as clarity and sharpness, suitable for professional use?", + "0_point_standard": "The ID photo lacks clarity and sharpness, resulting in low image quality unsuitable for professional use.", + "1_point_standard": "The ID photo exhibits high-quality image details, with clarity and sharpness suitable for professional use." + }, + { + "question": "Does the generated ID photo have overall aesthetic consistency, with balanced composition and a professional effect?", + "0_point_standard": "The ID photo lacks aesthetic consistency, has poor composition, or appears unprofessional.", + "1_point_standard": "The ID photo exhibits aesthetic consistency, with balanced composition and a professional effect." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/id_photo_generation_0004/images.txt b/dataset/id_photo_generation_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..2cef95d452da84b2e7dc2384fd3edfb9bf77bfc0 --- /dev/null +++ b/dataset/id_photo_generation_0004/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i2/O1CN011rmIQY1b72NgUaOAh_!!6000000003417-0-tps-564-846.jpg diff --git a/dataset/id_photo_generation_0004/instruction.txt b/dataset/id_photo_generation_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0b684db558dc9538b8543ad972c16b6eb502c62a --- /dev/null +++ b/dataset/id_photo_generation_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a corresponding ID photo based on the provided casual photo, keeping the facial details, hairstyle, and other key identity features unchanged. The background should be changed to a solid color, typically light blue or white, in line with standard ID photo requirements. The facial expression should remain natural without significant adjustments to ensure consistency with the original casual photo. 
\ No newline at end of file diff --git a/dataset/id_photo_generation_0004/meta.json b/dataset/id_photo_generation_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..8a4fa910d6fb53e38b972b0cbb2104de45094d23 --- /dev/null +++ b/dataset/id_photo_generation_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ID photo generation", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0097", + "output_image_count": 1, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/image_blending_double_exposure_0001/auto_eval.jsonl b/dataset/image_blending_double_exposure_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..b2e0fd29529a9d69fdf6cb261f43675cda1b0a77 --- /dev/null +++ b/dataset/image_blending_double_exposure_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first input image and output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. 
The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nDoes the output image retain clear visibility of the person from the first image, particularly the facial contours and distinct features? 0 points: The person’s facial features are unclear or heavily obscured by textures. 1 point: The person’s facial features, especially the face outline, remain clear and distinguishable in the final image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nIs the bubble texture from the second image visible in the output, softly integrated into the background and around the person? 
0 points: The bubble texture is either missing or overly prominent, disrupting the overall balance. 1 point: The bubble texture is visible, subtly integrated around the person, complementing the image without overpowering it.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nDoes the output image retain clarity in the facial features despite the presence of the bubble texture overlay? 0 points: The facial features, especially eyes, nose, and mouth, are obscured or difficult to discern. 
1 point: The facial features are clear and discernible, with the bubble texture softly overlaying but not obstructing the key facial details.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nIs the bubble texture primarily distributed around the hair and background areas without significantly covering the central facial area? 0 points: The bubble texture intrudes heavily into the central facial area, disrupting the main subject. 
1 point: The bubble texture is primarily positioned around the head and background, enhancing the visual flow without interfering with the central facial area.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nIs there a smooth gradation in the intensity of the bubble overlay, transitioning naturally from background areas to the person’s features? 0 points: The overlay is uniform or abrupt, without a gradual transition, leading to an unnatural appearance. 
1 point: The overlay intensity is smoothly graduated, with a subtle increase in texture around the outer contours and background, creating a natural blend.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the output image of the response provided by a student. The task objective is to use dual exposure effect to merge two images into one image. \nThe text requirement is:\nPlease generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style.\nYour review question is:\nDoes the image avoid any harsh edges between the portrait and bubble texture, creating a unified and cohesive look? 0 points: There are visible, harsh edges or outlines separating the portrait from the bubble texture, disrupting the double exposure effect. 
1 point: There are no harsh edges; the portrait and texture layers blend cohesively, giving a natural and integrated look.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/image_blending_double_exposure_0001/eval.json b/dataset/image_blending_double_exposure_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5a470aae87451d9918fd0abe00a5137e963112cc --- /dev/null +++ b/dataset/image_blending_double_exposure_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image clearly retain the person from the first image, especially the facial contours and unique features?", + "0_point_standard": "The facial features of the person are unclear or severely obscured by textures.", + "1_point_standard": "The facial features of the person, especially the facial contours, are clear and easily recognizable in the final image." + }, + { + "question": "Are the bubble textures from the second image visible in the output image, softly blending into the background and surrounding the person?", + "0_point_standard": "The bubble textures are missing or too prominent, disrupting the overall balance.", + "1_point_standard": "The bubble textures are visible, softly surrounding the person, complementing the image without being overly distracting." + }, + { + "question": "Despite the bubble texture overlay, do the facial features in the output image remain clear?", + "0_point_standard": "Facial features, especially the eyes, nose, and mouth, are obscured or difficult to discern.", + "1_point_standard": "Facial features are clearly discernible, with bubble textures softly overlaying but not obstructing the main facial details." 
+ }, + { + "question": "Are the bubble textures mainly distributed in the hair and background areas, without noticeably covering the central facial area?", + "0_point_standard": "Bubble textures heavily intrude into the central facial area, affecting the subject.", + "1_point_standard": "Bubble textures are mainly located in the hair and background areas, enhancing visual flow without interfering with the central facial area." + }, + { + "question": "Is the intensity of the bubble overlay smoothly graduated, naturally transitioning from the background areas to the person's features?", + "0_point_standard": "The overlay appears uniform or abrupt, lacking gradient transitions, and appears unnatural.", + "1_point_standard": "The overlay intensity is smoothly graduated, with textures subtly enhancing the outer contours and background areas, naturally blending in." + }, + { + "question": "Does the image avoid harsh edges between the person and bubble textures, presenting a unified and harmonious effect?", + "0_point_standard": "There are noticeable harsh edges or outlines between the person and bubble textures, disrupting the double exposure effect.", + "1_point_standard": "There are no harsh edges; the person and texture layers naturally blend, presenting a unified and integrated appearance." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_blending_double_exposure_0001/images.txt b/dataset/image_blending_double_exposure_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..ba0ab91ef18a28ee5c55ad69420148d295f00f23 --- /dev/null +++ b/dataset/image_blending_double_exposure_0001/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01GVNIJN1MWwXyfZPm2_!!6000000001443-0-tps-612-370.jpg +https://img.alicdn.com/imgextra/i2/O1CN01jLhcva1UeHp55DXg5_!!6000000002542-0-tps-790-904.jpg diff --git a/dataset/image_blending_double_exposure_0001/instruction.txt b/dataset/image_blending_double_exposure_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..a2be56b176830fa7e2bc73466499b353f71855ed --- /dev/null +++ b/dataset/image_blending_double_exposure_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a blended image using the given two images, with the blending effect aiming to achieve a visual style similar to double exposure or gradient overlay. Specifically, merge the person from the first image with the bubble and liquid textures from the second image, ensuring a seamless integration. The facial features of the person should remain clear, while the bubble and liquid textures should be softly incorporated into the contours of the person, especially around the hair and background, creating a flowing and ethereal visual effect. The final result should give the impression of coexistence between the person and the abstract textures, with an artistic, dynamic, and layered feel, maintaining a delicate and soft overall style. 
\ No newline at end of file diff --git a/dataset/image_blending_double_exposure_0001/meta.json b/dataset/image_blending_double_exposure_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a7d523dc14bfe66946c95cdf04f9aa088e5e5e27 --- /dev/null +++ b/dataset/image_blending_double_exposure_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "double exposure", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0064", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/image_blur_filed_blur_0002/eval.json b/dataset/image_blur_filed_blur_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c7523d1014b4ce1f33877680bf0678901c2255e6 --- /dev/null +++ b/dataset/image_blur_filed_blur_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the blurred background in the generated image accurately retain the unblurred parts of the original image?", + "0_point_standard": "The unblurred parts of the image show significant changes or distortion compared to the original image.", + "1_point_standard": "The unblurred parts of the image are accurately retained without any changes or distortion." + }, + { + "question": "Does the generated image retain the key elements and features of the original image, ensuring consistency between the input and output images?", + "0_point_standard": "The key elements or features of the image have been changed or are inconsistent with the original image.", + "1_point_standard": "The key elements and features of the image are consistent with the original image, retaining its essential characteristics." 
+ }, + { + "question": "Does the blurred background effectively meet the requirement of separating or emphasizing the main subject as specified in the text description?", + "0_point_standard": "The blurred background fails to effectively separate or emphasize the main subject as described in the text.", + "1_point_standard": "The blurred background successfully separates or emphasizes the main subject as described in the text." + }, + { + "question": "Does the generated image accurately follow the specific instructions in the text description regarding the degree or style of blur (e.g., soft, intense)?", + "0_point_standard": "The degree or style of blur in the generated image does not match the specific instructions given in the text.", + "1_point_standard": "The degree or style of blur in the generated image accurately follows the specific instructions in the text." + }, + { + "question": "Does the quality of the blur effect enhance the overall image, with smooth transitions and no noticeable flaws?", + "0_point_standard": "The blur effect has noticeable flaws or rough transitions, reducing the overall image quality.", + "1_point_standard": "The blur effect is applied smoothly without noticeable flaws, enhancing the overall quality of the image." + }, + { + "question": "Does the generated image have high aesthetic appeal, with effective composition and visual balance between blurred and unblurred areas?", + "0_point_standard": "The image lacks aesthetic appeal, with poor composition or visual balance between blurred and unblurred areas.", + "1_point_standard": "The image displays strong aesthetic appeal, with good composition and visual balance between blurred and unblurred areas." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_blur_filed_blur_0002/images.txt b/dataset/image_blur_filed_blur_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..9b1fbbf8509d2c8b79d55a27534d213823ebf73e --- /dev/null +++ b/dataset/image_blur_filed_blur_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01eA7cpZ1Jvj8ClY6t8_!!6000000001091-0-tps-1125-1688.jpg diff --git a/dataset/image_blur_filed_blur_0002/instruction.txt b/dataset/image_blur_filed_blur_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..2dc4ff4f039dab23214d8a080040afb806377de3 --- /dev/null +++ b/dataset/image_blur_filed_blur_0002/instruction.txt @@ -0,0 +1 @@ +Apply a background blur to this image in the field, keeping the person and foreground in sharp focus, softening the flowers and background to highlight the bouquet in the subject's hands. \ No newline at end of file diff --git a/dataset/image_blur_filed_blur_0002/meta.json b/dataset/image_blur_filed_blur_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3aa6f8fff57ca968b45950623299e60343a7fe49 --- /dev/null +++ b/dataset/image_blur_filed_blur_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "field blur", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0068", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/image_blur_rotation_blur_0001/eval.json b/dataset/image_blur_rotation_blur_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b5496a55597d88f204f31860c6193a0270be7f05 --- /dev/null +++ b/dataset/image_blur_rotation_blur_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the modified portion of the image display the rotational blur effect specified in the task description?", + "0_point_standard": "The rotational part of 
the image lacks the blur effect, or the blur effect is applied incorrectly.", + "1_point_standard": "The rotational part of the image clearly displays the expected rotational blur effect." + }, + { + "question": "Does the rest of the image remain unchanged, maintaining its original quality and details?", + "0_point_standard": "There are noticeable changes or degradation in the unmodified parts of the image.", + "1_point_standard": "The unmodified parts of the image remain unchanged, retaining their original quality and details." + }, + { + "question": "Does the rotational blur effect correlate well with the original image content, maintaining consistency in style and recognizability?", + "0_point_standard": "The rotational blur effect disrupts the overall style or recognizability of the image, causing inconsistency.", + "1_point_standard": "The rotational blur effect is consistent with the original image’s style and recognizability." + }, + { + "question": "Does the rotational blur effect follow any specific instructions from the text description, such as blur intensity or direction?", + "0_point_standard": "The blur application does not follow the specific instructions provided in the text description.", + "1_point_standard": "The blur application accurately follows the specific instructions provided in the text description." + }, + { + "question": "Is the transition between the blurred and non-blurred parts smooth, without creating unnatural boundaries or artifacts?", + "0_point_standard": "The transition between the blurred and non-blurred parts is abrupt or contains noticeable artifacts.", + "1_point_standard": "The transition between the blurred and non-blurred parts is smooth and natural." 
+ }, + { + "question": "Does the image maintain overall aesthetic appeal after applying the rotational blur, ensuring it meets professional visual standards?", + "0_point_standard": "The image lacks aesthetic appeal or does not meet professional visual standards post-modification.", + "1_point_standard": "The image maintains good aesthetic appeal and meets professional visual standards post-modification." + } + ] +} \ No newline at end of file diff --git a/dataset/image_blur_rotation_blur_0001/images.txt b/dataset/image_blur_rotation_blur_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..9b84c448ef4881da85a11496f0660f397920ea6d --- /dev/null +++ b/dataset/image_blur_rotation_blur_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN0171x6tJ1h7ONkQC0oT_!!6000000004230-0-tps-3630-3630.jpg diff --git a/dataset/image_blur_rotation_blur_0001/instruction.txt b/dataset/image_blur_rotation_blur_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..33af67292ee73a166c953f48123ca445b911c1b1 --- /dev/null +++ b/dataset/image_blur_rotation_blur_0001/instruction.txt @@ -0,0 +1 @@ +Apply rotational blur to the fan blades in this image, blurring the spinning part to give the effect of high-speed rotation, while keeping the rest of the fan sharp and clear. 
\ No newline at end of file diff --git a/dataset/image_blur_rotation_blur_0001/meta.json b/dataset/image_blur_rotation_blur_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..de1700a9c4c3308087a044492f341ea937af99d1 --- /dev/null +++ b/dataset/image_blur_rotation_blur_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "rotation blur", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0069", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/image_completion_0001/eval.json b/dataset/image_completion_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e5f6cc68d35c0ea35925c01869b66f2ea843d26c --- /dev/null +++ b/dataset/image_completion_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the complete image accurately preserve the unchanged areas of the original partial image?", + "0_point_standard": "The unchanged areas of the original partial image show noticeable changes or distortion in the complete image.", + "1_point_standard": "The unchanged areas of the original partial image are accurately preserved in the complete image, without any noticeable changes." + }, + { + "question": "Does the complete image maintain consistent style and characteristics with the original partial image?", + "0_point_standard": "The style or characteristics of the complete image are noticeably different from the original partial image.", + "1_point_standard": "The complete image maintains consistent style and characteristics with the original partial image." 
+ }, + { + "question": "Does the complete image accurately reflect the specific content requirements described in the text input?", + "0_point_standard": "The complete image fails to include specific content elements described in the text input.", + "1_point_standard": "The complete image successfully includes the specific content elements from the text input." + }, + { + "question": "Does the complete image follow any style or theme guidelines specified in the text description?", + "0_point_standard": "The complete image does not adhere to the style or theme guidelines provided in the text description.", + "1_point_standard": "The complete image follows the style or theme guidelines specified in the text description." + }, + { + "question": "Does the completed area of the image exhibit a high level of detail and quality?", + "0_point_standard": "The completed area of the image lacks detail or is of lower quality compared to the original image.", + "1_point_standard": "The completed area of the image is rich in detail and high in quality, comparable to or exceeding the original image quality." + }, + { + "question": "Does the complete image exhibit overall aesthetic appeal and coherence, providing a visually pleasing and complete result?", + "0_point_standard": "The complete image lacks aesthetic appeal or appears incoherent, with the extended areas looking disjointed.", + "1_point_standard": "The complete image is aesthetically pleasing and coherent, with the extended areas seamlessly integrated." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_completion_0001/images.txt b/dataset/image_completion_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..87facf7a2d4d50c2d13bf7509cad4c6ad15d0572 --- /dev/null +++ b/dataset/image_completion_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN017m9zwO28du7goqnTK_!!6000000007956-0-tps-563-846.jpg diff --git a/dataset/image_completion_0001/instruction.txt b/dataset/image_completion_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..664642a0b69f2f5762039bc2a9e53113fcce4cc1 --- /dev/null +++ b/dataset/image_completion_0001/instruction.txt @@ -0,0 +1 @@ +Please extend the image in all directions to reveal the full body of the person, keeping the current face, hairstyle, and all details unchanged. The input image is a cropped region of the final complete image, and the input section must remain unchanged while seamlessly blending into the extended area. The background should match the style of the person, and the extended region should maintain consistency with the overall visual style. 
\ No newline at end of file diff --git a/dataset/image_completion_0001/meta.json b/dataset/image_completion_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..4f56541ceaf850a8249524731f880aa57770e353 --- /dev/null +++ b/dataset/image_completion_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "image completion", + "num_of_cases": 4, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0086", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/image_retouching_landscape_photo_retouching_0002/eval.json b/dataset/image_retouching_landscape_photo_retouching_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..ec68449bb411f54678192a187ab5286f703cdf98 --- /dev/null +++ b/dataset/image_retouching_landscape_photo_retouching_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the retouched landscape photo retain the original composition and elements in unspecified areas, ensuring only the specified parts are changed?", + "0_point_standard": "The unspecified areas in the image show changes or inconsistencies compared to the original image.", + "1_point_standard": "The unspecified areas remain unchanged, retaining the original composition and elements." + }, + { + "question": "Does the retouched landscape photo retain the overall style and characteristics of the original photo, ensuring seamless integration of modifications?", + "0_point_standard": "The style or characteristics of the original photo are noticeably altered, resulting in a disjointed appearance.", + "1_point_standard": "The original photo's style and characteristics are retained, with modifications seamlessly integrated." 
+ }, + { + "question": "Do the modifications in the retouched landscape photo accurately reflect the specific instructions provided in the text description?", + "0_point_standard": "The modifications do not conform to the instructions or fail to meet the specified requirements.", + "1_point_standard": "The modifications accurately reflect the instructions and meet the specified requirements." + }, + { + "question": "Are all the elements specified in the text description, such as changes in color, lighting, or specific details, accurately reflected in the retouched landscape photo?", + "0_point_standard": "Some specified elements are missing or inaccurately represented in the retouched photo.", + "1_point_standard": "All specified elements are accurately reflected, with no omissions or significant deviations." + }, + { + "question": "Does the retouched landscape photo exhibit high-quality editing with smooth transitions, precise adjustments, and no visible flaws?", + "0_point_standard": "The editing quality is poor, with visible flaws, harsh transitions, or imprecise adjustments.", + "1_point_standard": "The editing quality is high, with smooth transitions, precise adjustments, and no visible flaws." + }, + { + "question": "Does the retouched landscape photo possess enhanced aesthetics, providing a visually pleasing image of professional quality?", + "0_point_standard": "The retouched photo lacks aesthetics and does not meet professional quality standards.", + "1_point_standard": "The retouched photo possesses strong aesthetics, meets professional quality standards, and provides a visually pleasing image." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_retouching_landscape_photo_retouching_0002/images.txt b/dataset/image_retouching_landscape_photo_retouching_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..f2d4ce8ab5f1a93d63b969cca1f2eb3d24b0ca0b --- /dev/null +++ b/dataset/image_retouching_landscape_photo_retouching_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01bYdnhL1LNUlEHUqqQ_!!6000000001287-0-tps-5976-3992.jpg diff --git a/dataset/image_retouching_landscape_photo_retouching_0002/instruction.txt b/dataset/image_retouching_landscape_photo_retouching_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..8e13b3ce7e624abaf46f40a9775a3a0f6c8eef44 --- /dev/null +++ b/dataset/image_retouching_landscape_photo_retouching_0002/instruction.txt @@ -0,0 +1 @@ +Adjust the color tones of this photo of the camel rider in the desert to enhance the contrast between the desert and the sky, emphasizing the brightness under the sun. 
\ No newline at end of file diff --git a/dataset/image_retouching_landscape_photo_retouching_0002/meta.json b/dataset/image_retouching_landscape_photo_retouching_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..423b8c2d5a56a0feaa203489437361151d8805ae --- /dev/null +++ b/dataset/image_retouching_landscape_photo_retouching_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "landscape photo retouching", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0053", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/image_retouching_portrait_photo_retouching_0001/eval.json b/dataset/image_retouching_portrait_photo_retouching_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..8c52883d1fb0a4f49009efa9d2c805e87728677d --- /dev/null +++ b/dataset/image_retouching_portrait_photo_retouching_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the retouched portrait retain the original identity and main facial features of the person?", + "0_point_standard": "The identity or main facial features of the person have been altered to the extent that they are difficult to recognize.", + "1_point_standard": "The identity and main facial features of the person are retained, ensuring clear recognition." + }, + { + "question": "Apart from the specified retouched areas, does the rest of the image remain unchanged?", + "0_point_standard": "There are noticeable changes or modifications in parts of the image that were not specified for retouching.", + "1_point_standard": "Only the specified areas have been modified, with no unintended changes in the rest of the image." 
+ }, + { + "question": "Does the retouched image accurately meet the specific modification requirements described in the text (e.g., smoother skin, adjusted lighting)?", + "0_point_standard": "The specified modifications in the text description were not completed or executed inaccurately.", + "1_point_standard": "The retouched image accurately reflects the modification requirements detailed in the text description." + }, + { + "question": "Is the retouching style consistent with the guidelines provided in the text description (e.g., natural enhancement vs. dramatic enhancement)?", + "0_point_standard": "The retouching style is inconsistent with the provided guidelines, leading to inconsistent or unexpected results.", + "1_point_standard": "The retouching style is consistent with the provided guidelines, meeting the expected results." + }, + { + "question": "Does the retouched portrait maintain consistent and natural skin tone and texture throughout the modified areas?", + "0_point_standard": "There are noticeable inconsistencies in skin tone or texture in the retouched areas, resulting in an unnatural or uneven appearance.", + "1_point_standard": "The skin tone and texture in the retouched areas are consistent and natural, seamlessly blending with the surrounding areas." + }, + { + "question": "Does the overall retouched image exhibit high aesthetic quality and enhance the visual appeal of the portrait through balanced modifications?", + "0_point_standard": "The retouched image lacks aesthetic appeal, and the modifications have diminished the visual quality of the image.", + "1_point_standard": "The retouched image exhibits high aesthetic quality, with modifications enhancing its visual appeal." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_retouching_portrait_photo_retouching_0001/images.txt b/dataset/image_retouching_portrait_photo_retouching_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..a6323991290b7654b877b3a8b9ce9fb290c3e1e0 --- /dev/null +++ b/dataset/image_retouching_portrait_photo_retouching_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01WSwmev1hQd14dMuel_!!6000000004272-0-tps-3744-5616.jpg diff --git a/dataset/image_retouching_portrait_photo_retouching_0001/instruction.txt b/dataset/image_retouching_portrait_photo_retouching_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..c1746107d122b656b1843c5db39637b5b8597750 --- /dev/null +++ b/dataset/image_retouching_portrait_photo_retouching_0001/instruction.txt @@ -0,0 +1 @@ +Edit this portrait to enhance skin smoothness and add a glossy effect, making the overall appearance more fashionable. \ No newline at end of file diff --git a/dataset/image_retouching_portrait_photo_retouching_0001/meta.json b/dataset/image_retouching_portrait_photo_retouching_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..82ec01eb20d57c7d17a294ee37da1166c7399f6c --- /dev/null +++ b/dataset/image_retouching_portrait_photo_retouching_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "portrait photo retouching", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0052", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/image_transfer_digital_makeup_0002/eval.json b/dataset/image_transfer_digital_makeup_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..9398bc5a43ccf14dfd07aa46f817ed31f3fcd2c0 --- /dev/null +++ b/dataset/image_transfer_digital_makeup_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does 
the model ensure that the digital makeup applied to the character in image A is consistent with the makeup style of the character in image B?", + "0_point_standard": "The makeup style applied to image A significantly deviates from the makeup style in image B, with noticeable differences in color, intensity, or patterns.", + "1_point_standard": "The makeup style applied to image A is highly consistent with the makeup style in image B, with precise matching in color, intensity, and patterns." + }, + { + "question": "When applying makeup from image B, does the model maintain the original features and identity of the character in image A?", + "0_point_standard": "The makeup application alters key facial features or the identity of the character in image A, making them unrecognizable or significantly different.", + "1_point_standard": "The model retains the original facial features and identity of the character in image A, ensuring they remain recognizable after the makeup application." + }, + { + "question": "Does the model accurately interpret and apply specific makeup elements from image B (such as lipstick, eyeshadow, or blush) to image A?", + "0_point_standard": "There are noticeable errors in interpreting and applying specific makeup elements, such as mismatched color or placement.", + "1_point_standard": "The model accurately interprets and applies all specified makeup elements, with matching color and placement from image B to image A." + }, + { + "question": "Does the digital makeup transformation affect only the intended makeup areas of image A, while other parts of the image remain unchanged?", + "0_point_standard": "The model inadvertently alters other areas of image A that are not intended for makeup application, affecting the overall consistency of the image.", + "1_point_standard": "The digital makeup transformation is limited to the intended areas, with the rest of image A remaining unchanged and consistent." 
+ }, + { + "question": "Does the digital makeup application enhance the aesthetic quality of image A, adhering to professional makeup standards?", + "0_point_standard": "The makeup application lacks refinement or appears unprofessional, with issues such as uneven application or unnatural appearance.", + "1_point_standard": "The makeup application is aesthetically pleasing, applied smoothly and evenly, adhering to professional makeup standards." + }, + { + "question": "Is the digital makeup applied to image A detailed, showing high-quality rendering in aspects like edges, blending, and texture?", + "0_point_standard": "The makeup shows rough detailing with noticeable defects or blurring in edges, blending, or texture.", + "1_point_standard": "The makeup is detailed, with clear edges, seamless blending, and realistic texture, showing high-quality rendering." + } + ] +} \ No newline at end of file diff --git a/dataset/image_transfer_digital_makeup_0002/images.txt b/dataset/image_transfer_digital_makeup_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..a9269e2e1ca7eeeb7314bcb0c0abb9768e228a88 --- /dev/null +++ b/dataset/image_transfer_digital_makeup_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01IBhmMh1XXp54wSNT3_!!6000000002934-0-tps-2000-2999.jpg +https://img.alicdn.com/imgextra/i1/O1CN01MK3OVB1N5IFgOi84a_!!6000000001518-0-tps-4096-2560.jpg diff --git a/dataset/image_transfer_digital_makeup_0002/instruction.txt b/dataset/image_transfer_digital_makeup_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..7fc2cfbd150d9c9a96ec1a01e8fb21796fc49389 --- /dev/null +++ b/dataset/image_transfer_digital_makeup_0002/instruction.txt @@ -0,0 +1 @@ +The goal is to transfer the makeup from the woman in the second image onto the face of the woman in the first image, while keeping all other aspects of the first image, including facial expressions, pose, and background, unchanged. 
The makeup should include the full makeup effect, covering eye makeup, lipstick, eyebrows, and foundation. Generate an output image showing the final result after the makeup transfer \ No newline at end of file diff --git a/dataset/image_transfer_digital_makeup_0002/meta.json b/dataset/image_transfer_digital_makeup_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..443a5ba0f85faf2c2d1a4df23348acaa7e6bb790 --- /dev/null +++ b/dataset/image_transfer_digital_makeup_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "digital makeup", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0092", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/image_transfer_id_transfer_0001/eval.json b/dataset/image_transfer_id_transfer_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c80f9dfeba3a6dcadb5e695f0bc64ddb8dff9eb6 --- /dev/null +++ b/dataset/image_transfer_id_transfer_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the identity replaced in the output image match the person specified in image B?", + "0_point_standard": "The identity replaced in the output image does not resemble the person specified in image B, with noticeable differences in facial features or overall appearance.", + "1_point_standard": "The identity replaced in the output image closely resembles the person specified in image B, accurately matching facial features and overall appearance." 
+ }, + { + "question": "Apart from the specified identity transfer, does the rest of image A remain unchanged?", + "0_point_standard": "There are noticeable changes or distortions in parts of image A that were not intended to be modified, affecting the overall integrity of the image.", + "1_point_standard": "Apart from the specified identity transfer, the rest of image A remains unchanged, with the environment and other elements intact." + }, + { + "question": "Does the output image maintain logical consistency with image A in terms of lighting, shadows, and perspective?", + "0_point_standard": "The replaced identity is inconsistent with the rest of image A in terms of lighting, shadows, or perspective, resulting in an unnatural appearance.", + "1_point_standard": "The replaced identity seamlessly blends with the lighting, shadows, and perspective of image A, maintaining a naturally coherent appearance." + }, + { + "question": "Does the modification comply with the specific requirements of the text description (e.g., expression, pose)?", + "0_point_standard": "The model fails to incorporate specific details from the text description, such as facial expression or pose, into the replaced identity.", + "1_point_standard": "The model accurately incorporates specific details from the text description into the replaced identity, meeting the task requirements." + }, + { + "question": "Does the skin tone and texture of the replaced identity blend naturally with the surrounding area, ensuring seamless integration?", + "0_point_standard": "The skin tone or texture of the replaced identity is noticeably different or poorly blended, causing a clear disjunction with the surrounding area.", + "1_point_standard": "The skin tone and texture of the replaced identity blend naturally with the surrounding area, achieving seamless and realistic integration." 
+ }, + { + "question": "Is the output image clearly derived from the input image, maintaining visual and stylistic connections?", + "0_point_standard": "The output image appears unrelated, lacking clear visual or stylistic connections to the input image.", + "1_point_standard": "The output image maintains clear visual and stylistic connections with the input image, reflecting the intended identity transfer while preserving the overall style." + } + ] +} \ No newline at end of file diff --git a/dataset/image_transfer_id_transfer_0001/images.txt b/dataset/image_transfer_id_transfer_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e22334f83705b97ac2f54560499b26916a977a0f --- /dev/null +++ b/dataset/image_transfer_id_transfer_0001/images.txt @@ -0,0 +1,3 @@ +https://img.alicdn.com/imgextra/i1/O1CN01XOQqrb1clAhfhtn8e_!!6000000003640-0-tps-6283-4189.jpg +https://img.alicdn.com/imgextra/i3/O1CN01swcASO1FmKXhBl4MY_!!6000000000529-0-tps-1650-2447.jpg +https://img.alicdn.com/imgextra/i2/O1CN01UjHoLl268drDFlUKt_!!6000000007617-0-tps-2048-2048.jpg diff --git a/dataset/image_transfer_id_transfer_0001/instruction.txt b/dataset/image_transfer_id_transfer_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..297a78e072c7da74f2174814f9ce2ff22e69e54c --- /dev/null +++ b/dataset/image_transfer_id_transfer_0001/instruction.txt @@ -0,0 +1 @@ +Generate an image where the two people playing chess in the first image are replaced by Iron Man from the second image and Captain America from the third image. Keep the primary elements of the original image, such as the chessboard, background, and furniture, unchanged. Ensure that Iron Man and Captain America's poses match the context of playing chess, adjusting their positions slightly if necessary. Their facial expressions should reflect concentration on the chess game. 
\ No newline at end of file diff --git a/dataset/image_transfer_id_transfer_0001/meta.json b/dataset/image_transfer_id_transfer_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..10e461dc5ccf2570421a33d2858f14209e01c2b9 --- /dev/null +++ b/dataset/image_transfer_id_transfer_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ID transfer", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0094", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/image_transfer_light_transfer_0002/eval.json b/dataset/image_transfer_light_transfer_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..ed504c7e06b8d4a826d8a3106f3307869232c4f1 --- /dev/null +++ b/dataset/image_transfer_light_transfer_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output generated by the model maintain the overall structure and content of image A while applying the lighting effect of image B?", + "0_point_standard": "The output image significantly alters the main structure or content of image A, deviating from the original depiction.", + "1_point_standard": "The output image retains the overall structure and content of image A, only changing the lighting as described." + }, + { + "question": "Has the lighting effect from image B been accurately transferred to image A without introducing unrelated lighting artifacts?", + "0_point_standard": "The lighting effect in the output image is not similar to the lighting in image B or introduces unrelated lighting artifacts.", + "1_point_standard": "The lighting effect from image B is accurately applied to image A, without introducing unrelated effects." 
+ }, + { + "question": "Aside from the lighting modification, does the model output keep other elements of image A unchanged?", + "0_point_standard": "In the output image, elements of image A other than lighting have been altered or modified.", + "1_point_standard": "In the output image, all elements of image A other than lighting remain unchanged." + }, + { + "question": "If the text description specifies particular lighting details (e.g., direction, intensity), are these details accurately implemented in the output image?", + "0_point_standard": "The output image fails to reflect the specific lighting details mentioned in the text description.", + "1_point_standard": "The output image accurately implements the specific lighting details according to the text description." + }, + { + "question": "Do the shadows and highlights resulting from the transferred lighting effect align with the shapes and surfaces in image A, enhancing the realism of the scene?", + "0_point_standard": "Shadows and highlights appear inconsistent with the shapes or surfaces in image A, leading to an unrealistic or incoherent appearance.", + "1_point_standard": "Shadows and highlights naturally align with the shapes and surfaces in image A, creating a realistic and coherent lighting effect." + }, + { + "question": "Does the output image lack noticeable technical defects such as noise, blur, or artifacts due to the lighting transfer?", + "0_point_standard": "The output image shows noticeable technical defects such as noise, blur, or artifacts due to the lighting transfer.", + "1_point_standard": "The output image is free of technical defects, maintaining high-quality rendering throughout the lighting transfer process." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_transfer_light_transfer_0002/images.txt b/dataset/image_transfer_light_transfer_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..ce567976a72f502305082d69c240d738fd7efc10 --- /dev/null +++ b/dataset/image_transfer_light_transfer_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i4/O1CN01aSN08j1kjoSDk1ySj_!!6000000004720-0-tps-5472-3648.jpg +https://img.alicdn.com/imgextra/i2/O1CN01e6xOqW1SWNkvLBpVp_!!6000000002254-0-tps-4000-6000.jpg diff --git a/dataset/image_transfer_light_transfer_0002/instruction.txt b/dataset/image_transfer_light_transfer_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..8a909c125ebf17b9a1af2e69d2e65fae89a2edf3 --- /dev/null +++ b/dataset/image_transfer_light_transfer_0002/instruction.txt @@ -0,0 +1 @@ +Generate an image where the lighting conditions from the second image are transferred to the first image. The key elements of the first image, such as the silhouette of the people and the background, should remain unchanged, but the lighting should be adjusted to match the neon and reflective effects seen in the second image. If slight modifications are needed to achieve the lighting transfer while keeping the main elements intact, make necessary adjustments. 
\ No newline at end of file diff --git a/dataset/image_transfer_light_transfer_0002/meta.json b/dataset/image_transfer_light_transfer_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3c4aa88012e1bd6b6ec64bf775b24d0a4b6e73fb --- /dev/null +++ b/dataset/image_transfer_light_transfer_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "light transfer", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0093", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/image_transfer_posture_transfer_0002/eval.json b/dataset/image_transfer_posture_transfer_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4c2da0b84774bb854068a37656616a73bc2e2c6f --- /dev/null +++ b/dataset/image_transfer_posture_transfer_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the model output maintain the identity of the character from image A while adopting the pose from image B?", + "0_point_standard": "There is a significant deviation in the identity or recognizable features of the character in the output image compared to the character in image A.", + "1_point_standard": "The character in the output image retains the identity and recognizable features of the character in image A, while adopting the new pose." + }, + { + "question": "Does the character in the output image accurately adopt the pose from image B?", + "0_point_standard": "The character's pose in the output image fails to reflect the pose of the character in image B, with noticeable differences in limb positioning or body orientation.", + "1_point_standard": "The character in the output image accurately reflects the pose of the character in image B, with correct limb positioning and body orientation." 
+ }, + { + "question": "Does the model ensure that the background and other non-character content from image A remain unchanged in the output image?", + "0_point_standard": "There are noticeable changes or inconsistencies in the background or other non-character elements of image A in the output image.", + "1_point_standard": "The background and other non-character content from image A are preserved in the output image without noticeable changes." + }, + { + "question": "Is the transition of the pose from image B to the character in image A logical, maintaining the natural anatomical structure of the character?", + "0_point_standard": "The pose transition results in inconsistent anatomical structure or unnatural body posture, disrupting the logical flow of the character's form.", + "1_point_standard": "The pose transition is logical, maintaining the character's natural anatomical structure and coherent body posture." + }, + { + "question": "Does the output image maintain high-quality rendering, especially in areas where the pose has been modified, ensuring clarity and sharpness?", + "0_point_standard": "The areas of pose modification in the output image are blurred or exhibit rendering artifacts, reducing the overall quality.", + "1_point_standard": "The areas of pose modification are rendered clearly and sharply, maintaining high-quality visual output." + }, + { + "question": "Is the style consistency between input image A and the output image maintained, ensuring a seamless visual transition?", + "0_point_standard": "The output image exhibits style differences or inconsistencies compared to image A, causing a disjointed visual effect.", + "1_point_standard": "The output image maintains the style elements of image A, ensuring a seamless and consistent visual transition." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/image_transfer_posture_transfer_0002/images.txt b/dataset/image_transfer_posture_transfer_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..5f240c0686078abace6545b5581bf25cc1aecfcc --- /dev/null +++ b/dataset/image_transfer_posture_transfer_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i1/O1CN01Kt6CEF1ZWZE5Qcq9Y_!!6000000003202-0-tps-5464-6830.jpg +https://img.alicdn.com/imgextra/i1/O1CN01WBrEgI1MMPRBhe6yz_!!6000000001420-0-tps-3566-2825.jpg diff --git a/dataset/image_transfer_posture_transfer_0002/instruction.txt b/dataset/image_transfer_posture_transfer_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..212ef46ca83ff67f3f4490b83cce12c3eff5973f --- /dev/null +++ b/dataset/image_transfer_posture_transfer_0002/instruction.txt @@ -0,0 +1 @@ +Generate an image where the running pose of the woman from the second image is transferred onto the woman in the first image. Keep the other elements from the first image unchanged, such as the background and lighting, but make subtle adjustments if necessary to ensure a smooth and natural transition of the pose. The woman's outfit, hairstyle, and facial expression should remain consistent with the first image, but she should be depicted in a running posture with fluid and natural body movements. 
\ No newline at end of file diff --git a/dataset/image_transfer_posture_transfer_0002/meta.json b/dataset/image_transfer_posture_transfer_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b611333abba7637a86087f6804de7683de970735 --- /dev/null +++ b/dataset/image_transfer_posture_transfer_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "posture transfer", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0096", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/interior_design_generation_0001/eval.json b/dataset/interior_design_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e52c3dec19e02e20d7c16989fe782e021d00e102 --- /dev/null +++ b/dataset/interior_design_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image clearly depict an interior space and include recognizable design elements such as furniture, decor, and architectural details?", + "0_point_standard": "The image lacks recognizable interior design elements and does not clearly depict an interior space.", + "1_point_standard": "The image clearly depicts an interior space and includes recognizable design elements such as typical interior design furniture and decor." + }, + { + "question": "Does the image present a realistic and functional room layout that adheres to interior design principles (e.g., appropriate furniture placement, unobstructed pathways)?", + "0_point_standard": "The layout is chaotic or impractical, lacking the functional design principles expected in an interior space.", + "1_point_standard": "The layout is realistic and functional, with furniture and decor arranged in a manner consistent with actual interior design standards." 
+ }, + { + "question": "Does the generated image accurately reflect the specific style, color scheme, or room features described in the text prompt (e.g., minimalist style, neutral tones, kitchen features)?", + "0_point_standard": "The image does not match the specified style, colors, or room features described, deviating from the text requirements.", + "1_point_standard": "The image accurately reflects the specified style, color scheme, and room features described in the text prompt." + }, + { + "question": "Is the lighting in the image realistic, and do shadows and highlights enhance the depth and spatial perception of the interior?", + "0_point_standard": "The lighting appears fake or lacks depth, making the image look unrealistic.", + "1_point_standard": "The lighting is realistic, and shadows and highlights add depth and enhance the spatial perception of the interior." + }, + { + "question": "Are the materials and textures in the image rendered in high quality, and do realistic details reflect the described materials (e.g., wood, fabric, metal)?", + "0_point_standard": "Materials or textures lack clarity or appear fake, reducing the realism of the image.", + "1_point_standard": "Material and texture details are realistic and accurately reflect the characteristics of the described materials." + }, + { + "question": "Does the image exhibit a high level of aesthetic quality with a harmonious color scheme, balanced composition, and professional visual appeal?", + "0_point_standard": "The image lacks aesthetic appeal, with an unharmonious color scheme, poor composition, or unprofessional effect.", + "1_point_standard": "The image exhibits strong aesthetic qualities, with a harmonious color scheme, balanced composition, and a professional, pleasing effect." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/interior_design_generation_0001/images.txt b/dataset/interior_design_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/interior_design_generation_0001/instruction.txt b/dataset/interior_design_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..4d46b277070d260f5ceffa35c50b5005d7b981fd --- /dev/null +++ b/dataset/interior_design_generation_0001/instruction.txt @@ -0,0 +1 @@ +The image depicts a minimalist living room, primarily in off-white and light brown tones, creating a warm and comfortable atmosphere. A beige fabric sofa, with soft and fluffy cushions, is centrally positioned. A low, oval white coffee table sits in front, adorned with fruit, vases, and decorative objects. A light brown armchair with a curved backrest and sleek lines is placed beside the sofa. A potted green plant adds a touch of life to one side. The walls feature three circular wall sconces and a large circular artwork, complementing the vertical wooden slat decor. A uniquely shaped pendant light, resembling a blooming flower, hangs from the ceiling, adding a focal point. A light-colored rug covers the floor, harmonizing with the overall palette. The flooring is light-colored wood, maintaining a unified and harmonious scheme. Sheer off-white curtains allow ample natural light to fill the room. The overall space is simple yet elegant, with refined details, reflecting a modern minimalist design philosophy. 
\ No newline at end of file diff --git a/dataset/interior_design_generation_0001/meta.json b/dataset/interior_design_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..755637d454dac920f164003279ae65809fdf3914 --- /dev/null +++ b/dataset/interior_design_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "interior design specific effect generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0025", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/layer_decomposition_0001/eval.json b/dataset/layer_decomposition_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..de871b0736bc5172c16b973092c8e22136679142 --- /dev/null +++ b/dataset/layer_decomposition_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output layers match the specified number in the text description, and does each layer correspond to a separate image?", + "0_point_standard": "The number of layers does not match the specification, or some layers are missing.", + "1_point_standard": "The output contains the correct number of layers, and each layer is separated into a single image as specified." + }, + { + "question": "Are the specified elements accurately separated into their respective layers, with each image containing only the specified content?", + "0_point_standard": "Layers contain irrelevant elements or fail to accurately isolate the specified content.", + "1_point_standard": "Each layer accurately isolates the specified elements, with each image containing only the specified content." 
+ }, + { + "question": "Does each layer maintain the integrity of the original content, ensuring that isolated elements are not deformed and retain their visual features?", + "0_point_standard": "Isolated elements are deformed or altered, compromising the visual integrity of the original content.", + "1_point_standard": "Each layer retains the integrity of the original content, with isolated elements appearing undeformed and consistent with the original image." + }, + { + "question": "Are the edges of each isolated element clean and precise, without noticeable artifacts or rough edges?", + "0_point_standard": "Isolated elements have rough or jagged edges, or there are noticeable artifacts, reducing the quality of each layer.", + "1_point_standard": "The edges of each isolated element are clean and precise, with no noticeable artifacts, ensuring high-quality layer separation." + }, + { + "question": "Do the color, texture, and lighting of each layer accurately match the original image, maintaining a consistent visual style across all layers?", + "0_point_standard": "The color, texture, or lighting of the layers does not match the original image, disrupting visual harmony.", + "1_point_standard": "Each layer accurately retains the color, texture, and lighting of the original image, ensuring consistent appearance across all layers." + }, + { + "question": "Does the final set of layered images possess high aesthetic quality and visual clarity, with each layer contributing to a professional and polished presentation?", + "0_point_standard": "The final set of images lacks aesthetic appeal, has low visual clarity, or shows inconsistency, affecting the professional appearance.", + "1_point_standard": "The final set of layered images is visually clear, aesthetically pleasing, and presents a polished and professional effect." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/layer_decomposition_0001/images.txt b/dataset/layer_decomposition_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e29f740d0150d106184397cd6327afc3eb18445c --- /dev/null +++ b/dataset/layer_decomposition_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01UP7Ndz1peqEZ4pwoB_!!6000000005386-0-tps-1022-767.jpg diff --git a/dataset/layer_decomposition_0001/instruction.txt b/dataset/layer_decomposition_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..a7fa1ba39d76776d2de4c471d3e0e8aa04a5048e --- /dev/null +++ b/dataset/layer_decomposition_0001/instruction.txt @@ -0,0 +1 @@ +Please generate 6 images based on the provided input image, where each image represents a specific layer from the original image. The final goal is that if the layers are merged, they should recreate the original image. The 1st image should be the white background layer; the 2nd image should be the yellow cup body layer, excluding the handle and inner surface; the 3rd image should be the yellow handle layer; the 4th image should be the inner surface layer of the cup (the inside of the cup); the 5th image should be the shadows and highlights layer of the cup to represent its light and shadow effects; the 6th image should be the reflection layer, containing the reflective areas on the smooth surface of the cup. Ensure the layers are extracted from the input image, and when merged, they should match the original image. 
\ No newline at end of file diff --git a/dataset/layer_decomposition_0001/meta.json b/dataset/layer_decomposition_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..aa3bd72310d4eb290e56a4335b53219456fcb914 --- /dev/null +++ b/dataset/layer_decomposition_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "layer decomposition", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0035", + "output_image_count": 6, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/lighting_editing_0002/.DS_Store b/dataset/lighting_editing_0002/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 Binary files /dev/null and b/dataset/lighting_editing_0002/.DS_Store differ diff --git a/dataset/lighting_editing_0002/eval.json b/dataset/lighting_editing_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a57b9c440eeebeb164090e78eea580d6acb3659f --- /dev/null +++ b/dataset/lighting_editing_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the modified image retain the same unchanged areas as the original image, ensuring only the specified lighting conditions have changed?", + "0_point_standard": "Areas not specified in the image show noticeable changes or distortion beyond lighting adjustments.", + "1_point_standard": "Areas not specified in the image remain consistent with the original image, with no changes other than lighting adjustments." 
+ }, + { + "question": "Does the modified image retain the content, style, and features of the original image, maintaining consistency with the input image?", + "0_point_standard": "The modified image shows significant differences in content, style, or features compared to the original image.", + "1_point_standard": "The modified image retains the content, style, and features of the original image, maintaining consistency with the input image." + }, + { + "question": "Does the modified image accurately reflect the lighting changes described in the text input?", + "0_point_standard": "The lighting changes do not conform to the specifications in the text description, showing inaccuracies or deviations.", + "1_point_standard": "The lighting changes have been accurately implemented according to the text description, with no inaccuracies or deviations." + }, + { + "question": "Has the modified image correctly implemented any other lighting effects or modifications specified in the text description?", + "0_point_standard": "Additional lighting effects or modifications specified in the text description are missing or incorrectly applied.", + "1_point_standard": "All additional lighting effects or modifications specified in the text description have been correctly and accurately applied." + }, + { + "question": "Does the lighting edit enhance the visual quality of the image, providing a realistic or aesthetically pleasing effect?", + "0_point_standard": "The lighting edit diminishes the visual quality, making the image appear unrealistic or unattractive.", + "1_point_standard": "The lighting edit enhances the visual quality, providing a realistic and aesthetically pleasing effect." 
+ }, + { + "question": "Has the overall aesthetic of the modified image been improved or maintained, meeting professional visual standards?", + "0_point_standard": "The modified image lacks aesthetic appeal and does not meet professional visual standards.", + "1_point_standard": "The modified image exhibits strong aesthetic appeal, meeting or exceeding professional visual standards." + } + ] +} \ No newline at end of file diff --git a/dataset/lighting_editing_0002/images.txt b/dataset/lighting_editing_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..827c7e533517eba4082ebef0fdca9382e8f09f7a --- /dev/null +++ b/dataset/lighting_editing_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i2/O1CN01d5FCDq1cYo0epblQf_!!6000000003613-0-tps-4000-1500.jpg diff --git a/dataset/lighting_editing_0002/instruction.txt b/dataset/lighting_editing_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..bd0900b59ffda12c7509c64bfd094326b7313a62 --- /dev/null +++ b/dataset/lighting_editing_0002/instruction.txt @@ -0,0 +1 @@ +Adjust the light angle in this image to come from the right side at a horizontal level. The light should pass across the room, casting long shadows and creating a noticeable contrast of light and shadow on the left side and back of the chair. 
\ No newline at end of file diff --git a/dataset/lighting_editing_0002/meta.json b/dataset/lighting_editing_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..19856deceb39156895239988487f3dc11fb64a62 --- /dev/null +++ b/dataset/lighting_editing_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "lighting editing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0067", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/lighting_effect_simulation_0002/eval.json b/dataset/lighting_effect_simulation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5c952dbc8a887178b2e290596d48d3c7d6f4feaf --- /dev/null +++ b/dataset/lighting_effect_simulation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does each generated image maintain the perspective and scene composition of the original input image, without any changes to layout or object positioning?", + "0_point_standard": "Changes in perspective or object layout between images disrupt scene consistency.", + "1_point_standard": "All images retain their perspective and scene composition, with no changes to object layout or positioning." + }, + { + "question": "Is the specified lighting effect correctly applied in each image according to the description, and are the light sources or conditions accurately represented?", + "0_point_standard": "The lighting effect does not match the specified conditions; errors in light direction, intensity, or source representation.", + "1_point_standard": "The lighting effect is applied exactly as described, accurately reflecting the specified light direction, intensity, and sources." 
+ }, + { + "question": "Do areas outside the specified lighting effects remain unchanged, preserving the original textures and details of the input image?", + "0_point_standard": "Unexpected changes occur in areas that should remain consistent, altering textures or details outside the lighting effects.", + "1_point_standard": "Areas outside the specified lighting effects remain unchanged, preserving the same textures and details as the original image." + }, + { + "question": "Do the images consistently reflect a uniform style across all simulated lighting conditions, with even texture, color grading, and visual tone?", + "0_point_standard": "Differences in style, texture, or color grading between images lead to an inconsistent overall set.", + "1_point_standard": "All images maintain a consistent style, texture, and color grading, presenting a cohesive appearance despite differing lighting conditions." + }, + { + "question": "Does the lighting effect enhance the realism of each image, providing accurate shadows, depth, and reflection quality consistent with the scene's three-dimensional structure?", + "0_point_standard": "The lighting effect appears unrealistic, with inaccurate shadows, inconsistent depth, or poorly handled reflections.", + "1_point_standard": "The lighting effect enhances realism, with accurate shadows, depth, and reflections that closely match the scene's three-dimensional structure." + }, + { + "question": "Does each image retain high-quality details, especially in areas affected by lighting changes, with careful handling of texture, edges, and contrast?", + "0_point_standard": "Details are lost or poorly handled in areas affected by lighting, leading to rough textures or diminished clarity.", + "1_point_standard": "Details are preserved, with well-rendered textures, clear edges, and balanced contrast in areas affected by lighting effects." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/lighting_effect_simulation_0002/images.txt b/dataset/lighting_effect_simulation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..aac37ab06f36caa769d68a6b29a8766bc1e4c977 --- /dev/null +++ b/dataset/lighting_effect_simulation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01kvIFb21JLXqG5G0Sr_!!6000000001012-0-tps-1280-1073.jpg diff --git a/dataset/lighting_effect_simulation_0002/instruction.txt b/dataset/lighting_effect_simulation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..cae04017216b395f6c0e4f5db6d4896ba0c51e78 --- /dev/null +++ b/dataset/lighting_effect_simulation_0002/instruction.txt @@ -0,0 +1 @@ +Please generate five images based on the following description, showcasing different lighting effect variations of the given interior image. The goal is to present different atmospheres and time periods by altering the lighting conditions. The first image should depict soft morning sunlight, gently streaming through the windows, creating a fresh and warm morning ambiance. The second image should display the bright midday sunlight, filling the space and evoking a vibrant and lively atmosphere. The third image should portray dim lighting at midnight, with sparse, soft light to evoke a calm and mysterious nighttime feel. The fourth image should feature warm lighting, with gentle, cozy lights creating a comfortable and homey environment. The fifth image should use cool lighting to give off a bright, modern, and tech-inspired feel. All images should retain the original interior structure while conveying different emotions and visual effects through lighting changes. 
\ No newline at end of file diff --git a/dataset/lighting_effect_simulation_0002/meta.json b/dataset/lighting_effect_simulation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..699a36e79809d103e4b1498a2c1498e88864ac9f --- /dev/null +++ b/dataset/lighting_effect_simulation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "lighting effect simulation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0033", + "output_image_count": 5, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/local_enlargement_0002/eval.json b/dataset/local_enlargement_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..da68aabbea00ec0e0d6d96916bec881c1e6403ca --- /dev/null +++ b/dataset/local_enlargement_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the enlarged image accurately focus on the specified area and crop out parts outside the designated region as described in the task?", + "0_point_standard": "The enlargement fails to focus solely on the specified area, including unexpected regions or missing parts of the designated area.", + "1_point_standard": "The enlargement accurately focuses on the specified area, excluding all unexpected regions, in accordance with instructions." + }, + { + "question": "Is the resolution and sharpness of the enlarged area retained, ensuring clear and sharp details?", + "0_point_standard": "The enlarged area appears blurry or pixelated with noticeable loss of detail or sharpness.", + "1_point_standard": "The enlarged area retains high resolution and clarity, with sharp and clear details." 
+ }, + { + "question": "Does the enlarged image maintain relevance in content and style with the input image, ensuring consistency?", + "0_point_standard": "The enlarged image shows inconsistency in content or style, deviating from the characteristics of the original image.", + "1_point_standard": "The enlarged image maintains consistent content and style relevance with the original image, accurately reflecting its characteristics." + }, + { + "question": "Does the enlargement meet the requirements of the text description, such as specific details mentioned in the task (e.g., focus, orientation)?", + "0_point_standard": "The enlargement fails to include specific details or instructions mentioned in the text description.", + "1_point_standard": "The enlargement successfully includes all specific details and instructions listed in the text description." + }, + { + "question": "Are the edges of the enlarged area smoothly transitioned, avoiding abrupt changes or artifacts that disrupt the natural appearance of the image?", + "0_point_standard": "The edges of the enlarged area show noticeable artifacts or abrupt changes, causing a discontinuous or unnatural appearance.", + "1_point_standard": "The edges of the enlarged area smoothly transition, naturally blending with the surrounding image, presenting a cohesive appearance." + }, + { + "question": "Does the enlarged image possess high aesthetic appeal with pleasing composition and enhanced focus area?", + "0_point_standard": "The enlarged image lacks aesthetic appeal, with poor composition or an unappealing focus area.", + "1_point_standard": "The enlarged image exhibits strong aesthetic appeal, with pleasing composition and enhanced focus area." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/local_enlargement_0002/images.txt b/dataset/local_enlargement_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..3dd23b7a5f8458fa3563587c03d6dd191cd5b4aa --- /dev/null +++ b/dataset/local_enlargement_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01dH1dDV29ugFKkpemb_!!6000000008128-0-tps-1200-900.jpg diff --git a/dataset/local_enlargement_0002/instruction.txt b/dataset/local_enlargement_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..a23d8f8e670c2a06d5e6889ba70d9d115862365b --- /dev/null +++ b/dataset/local_enlargement_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a zoomed-in image of the ship based on the input picture, cropping out the rest of the image. During the zooming process, it may be necessary to supplement some additional details to ensure the image quality and realism. The final image should maintain the same style as the original, ensuring the ship's clarity and rich detail, making it feel like a high-quality close-up image. 
\ No newline at end of file diff --git a/dataset/local_enlargement_0002/meta.json b/dataset/local_enlargement_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3126651397ef79fff539b1e7d8d456b74d83bf1f --- /dev/null +++ b/dataset/local_enlargement_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "local enlargement", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0051", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/local_enlargement_0003/eval.json b/dataset/local_enlargement_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4f3322f00f39c0e5f72af6785b2860f0b30075c6 --- /dev/null +++ b/dataset/local_enlargement_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the magnified image accurately focus on the specified area and crop out parts outside the specified area as described in the task?", + "0_point_standard": "The magnification fails to focus solely on the specified area, includes unexpected areas, or lacks parts of the specified area.", + "1_point_standard": "The magnification accurately focuses on the specified area, excluding all unexpected areas, as instructed." + }, + { + "question": "Is the resolution and clarity of the magnified area retained, ensuring details are sharp and clear?", + "0_point_standard": "The magnified area appears blurry or pixelated, with a noticeable loss of detail or clarity.", + "1_point_standard": "The magnified area maintains high resolution and clarity, with sharp and clear details." 
+ }, + { + "question": "Does the magnified image maintain relevance in content and style with the input image, ensuring consistency?", + "0_point_standard": "The magnified image shows inconsistency in content or style, deviating from the characteristics of the original image.", + "1_point_standard": "The magnified image maintains consistent content and style relevance with the original image, accurately reflecting its characteristics." + }, + { + "question": "Does the magnification meet the requirements of the text description, such as specific details mentioned in the task (e.g., focus, orientation)?", + "0_point_standard": "The magnification fails to include specific details or instructions mentioned in the text description.", + "1_point_standard": "The magnification successfully includes all specific details and instructions listed in the text description." + }, + { + "question": "Are the edges of the magnified area smoothly transitioned, avoiding abrupt changes or artifacts that disrupt the natural appearance of the image?", + "0_point_standard": "The edges of the magnified area have noticeable artifacts or abrupt changes, resulting in a disjointed or unnatural appearance.", + "1_point_standard": "The edges of the magnified area transition smoothly, blending naturally with the surrounding image, presenting a cohesive appearance." + }, + { + "question": "Does the magnified image possess a high aesthetic appeal, featuring a pleasing composition and enhanced focus area?", + "0_point_standard": "The magnified image lacks aesthetic appeal, with poor composition or an unattractive focus area.", + "1_point_standard": "The magnified image exhibits strong aesthetic appeal, with a pleasing composition and an enhanced focus area." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/local_enlargement_0003/images.txt b/dataset/local_enlargement_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..dc3beeed333b58afa2552d2b31b89bc791a795db --- /dev/null +++ b/dataset/local_enlargement_0003/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01kBTCFL1z75CLRTZ0K_!!6000000006666-0-tps-1280-1065.jpg diff --git a/dataset/local_enlargement_0003/instruction.txt b/dataset/local_enlargement_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..51933ae155a25358d189734ee99fa21f2791e26c --- /dev/null +++ b/dataset/local_enlargement_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a zoomed-in image of the bee based on the input picture, cropping out the rest of the image. During the zooming process, it may be necessary to supplement some additional details to ensure the image quality and realism. The final image should maintain the same style as the original, ensuring the bee's clarity and rich detail, making it feel like a high-quality close-up image. 
\ No newline at end of file diff --git a/dataset/local_enlargement_0003/meta.json b/dataset/local_enlargement_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..d98538fb1dcd178ec6f8ea6ba3b3fa1c426644f4 --- /dev/null +++ b/dataset/local_enlargement_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "local enlargement", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0051", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_0001/eval.json b/dataset/movie_shots_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..17b9eb7e7059a93be6cb156c154b97338e464760 --- /dev/null +++ b/dataset/movie_shots_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the sequence of images logically present the events in the script in chronological order?", + "0_point_standard": "The sequence of images is not arranged in chronological order, or lacks a logical flow, failing to show the progression of events.", + "1_point_standard": "The sequence of images clearly presents the events in the script in a logical chronological order." + }, + { + "question": "Do the contents of the images accurately reflect the scenes described in the script?", + "0_point_standard": "The content of the images does not accurately reflect the scenes described in the script, with obvious discrepancies.", + "1_point_standard": "The content of the images matches the script completely, accurately depicting the specified scenes." 
+ }, + { + "question": "Is the style and overall visual effect of the storyboard images consistent throughout the sequence?", + "0_point_standard": "The style of the storyboard images is inconsistent, leading to a disjointed visual effect.", + "1_point_standard": "All storyboard images maintain a consistent style, creating a cohesive visual effect." + }, + { + "question": "Does the generated storyboard maintain consistency in the same object or character IDs (e.g., the same character or object is recognizable across different images)?", + "0_point_standard": "The main characters or objects are inconsistent across different frames, making it difficult to recognize them as the same.", + "1_point_standard": "The main characters or objects are consistent and clearly identifiable throughout the storyboard." + }, + { + "question": "Considering the context of the script, is the logical presentation of the scenes reasonable?", + "0_point_standard": "The presentation of the scenes is illogical or unreasonable, with obvious errors or unrealistic depictions.", + "1_point_standard": "The presentation of the scenes is logical, reasonable, and reflects the intended context of the script." + }, + { + "question": "Do the details and aesthetic appeal of the storyboard images meet professional standards and have visual appeal?", + "0_point_standard": "The storyboard images lack detail and aesthetic appeal, falling short of visual standards.", + "1_point_standard": "The storyboard images are rich in detail, have excellent aesthetic appeal, meet professional standards, and are visually appealing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_0001/images.txt b/dataset/movie_shots_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/movie_shots_generation_0001/instruction.txt b/dataset/movie_shots_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..5f91f409d4763ef00c4ddf49c5a7b0318b00710c --- /dev/null +++ b/dataset/movie_shots_generation_0001/instruction.txt @@ -0,0 +1,17 @@ +Please generate a set of storyboard images based on the provided script, with each frame corresponding to one image. The appearance of the same characters should remain consistent across different storyboard frames, as well as the overall style. + +**Background Synopsis:** In the year 2077, scientist Jack, agent Zax, and historian Irene form a special squad tasked with tracking and capturing a time criminal known as "The Shadow." He has committed serious crimes at different temporal nodes. Using a time machine created with future technology, they successfully jump to the year 1945, deciding to take refuge in a bar to lay low for a while. In the bar, Irene begins to search for traces of "The Shadow" with her futuristic device. Unbeknownst to them, "The Shadow" is already aware of their arrival and is quietly preparing to strike from a corner of the bar. + +| Shot No. | Location | Shot Size | Camera Angle | Description | Dubbing Content | Subtitles | Music/SFX | Camera/Technique | +|----------|----------|-----------|--------------|----------------------------------------------------------------|------------------|----------|----------------|-------------------| +| 1A | Underground Laboratory, 2077 | Wide Shot | Eye Level | The lab is bathed in blue light, with multiple screens on the walls flashing with various data and charts. 
The trio, wearing futuristic white suits, stand in front of a silver, circular time machine. Various high-tech instruments are scattered around. | Jack: “Are we ready?” | "Are we ready?" | Tense music | Objective shot, static | +| 1B | Underground Laboratory, 2077 | Close-up | Eye Level | Close-up of Jack's face, his skin pale, and his deep blue eyes revealing determination and nervousness. | - | - | - | Push in | +| 2A | Street, 1945 | Extreme Long Shot | High Angle | A busy old-fashioned street lined with red brick buildings and short trees. Pedestrians in 1940s attire, men in hats and women in long skirts. Suddenly, a bright halo bursts in the middle of the street, and the trio appears. | - | - | Time travel SFX | Subjective shot, flying over the street | +| 2B | Street, 1945 | Medium Close-up | Eye Level | The trio stand somewhat confused, Zax wearing a futuristic helmet, Jack and Irene with special glasses. Their attire starkly contrasts with the crowd around them. | Zax: “Where…where are we?” | "Where...where are we?" | Sounds of passing vehicles, horse hooves | Static | +| 3A | Street, 1945 | Wide Shot | Low Angle | The trio at the roadside, surrounded by curious onlookers including an old man smoking, two women with shopping bags, and several uniformed soldiers. Jack points towards a bar named “Retro's” with warm lighting in the twilight. | Irene: “Don't draw attention. Over there!” | "Don't draw attention. Over there!" | Murmurs of the crowd | Tracking shot, following the trio to the bar | +| 4A | Inside the Bar | Wide Shot | Eye Level | The interior of the bar has an old-fashioned decor, with a wooden bar and black and white photos on the wall giving it an air of bygone days. People are seated around round tables, chatting and laughing. The trio sit in a corner, Irene pulls out a small device with a touchscreen and blue lights from her futuristic backpack. | Irene (whispers): “Start searching.” | "Start searching." 
| 1940s jazz music, bar noise | Pull focus, from the bar entrance to the trio | +| 4B | Inside the Bar | Close-up | Eye Level | The device's touchscreen displays a map with a flashing red dot and the text “The Shadow.” There are futuristic characters along the edge of the screen. | - | - | Beeping of the device | Static | +| 5A | Dark Corner of the Bar | Medium Close-up | High Angle | In a dark corner of the bar, a man in a black coat, the brim of his hat obscuring his eyes. He quietly takes out an old-fashioned pistol from his coat pocket, silver-plated with special engravings. | - | - | Cocking of the pistol, low tense music | Pan shot, blurry to clear | +| 5B | Inside the Bar | Wide Shot | Eye Level | Jack and Zax quickly stand up, Jack pulling out a shimmering badge from his pocket, Zax preparing to charge at “The Shadow.” The other patrons are stunned by this sudden action. | Jack: “Now!” | "Now!" | Fast-paced fight music | Push in, following Jack and Zax charging at “The Shadow” | +| 6A | Inside the Bar | Medium Close-up | High Angle | Zax's hand reaches swiftly for “The Shadow's” pistol but is hit by an elbow strike. His helmet falls off, revealing tousled blond hair. Glasses and bottles nearby are knocked over and splashed. | Zax (in pain): “Ahh!” | "Ahh!" | Pained scream | Shaky cam, capturing the action | +| 6B | Inside the Bar | Close-up | Eye Level | Irene quickly opens her backpack and pulls out a device the size of an orange, its casing transparent, with liquid flowing inside and emitting bright blue light. | Irene: “Come back!” | "Come back!" 
| High-pitched noise from the device | POV shot, device flying towards “The Shadow” | diff --git a/dataset/movie_shots_generation_0001/meta.json b/dataset/movie_shots_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..416b4783b832a890ff1e900bc2d298938a9248bd --- /dev/null +++ b/dataset/movie_shots_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "movie shots generation without reference", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0013", + "output_image_count": 11, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_character_definition_0002/eval.json b/dataset/movie_shots_generation_character_definition_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..df026e5b34cc63e5dc282be6cf976557603502b2 --- /dev/null +++ b/dataset/movie_shots_generation_character_definition_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Are the characters in the storyboard clearly derived from the provided character definition sheet?", + "0_point_standard": "The characters in the storyboard are dissimilar or unrecognizable compared to the provided character definition sheet.", + "1_point_standard": "The characters in the storyboard are clearly derived from the provided character definition sheet and can be identified based on it." + }, + { + "question": "Are any modifications to the characters or scenes limited to the specified requirements, and do unchanged elements remain consistent with the input?", + "0_point_standard": "Unintended modifications or distortions are present in parts of the image not meant to be altered, affecting overall consistency.", + "1_point_standard": "Modifications are limited to specified areas, with other elements remaining consistent and unchanged." 
+ }, + { + "question": "Does the storyboard accurately reflect the key elements and instructions given in the text description (e.g., location, emotion, action)?", + "0_point_standard": "The storyboard fails to capture the key elements or instructions from the text description, resulting in incoherent or inaccurate presentation.", + "1_point_standard": "The storyboard accurately reflects the key elements and instructions, effectively capturing the expected scene and narrative from the text description." + }, + { + "question": "Does the generated storyboard maintain temporal and logical consistency throughout the image sequence?", + "0_point_standard": "The sequence's chronological order or logical flow is disrupted, with images appearing out of order or lacking narrative coherence.", + "1_point_standard": "The storyboard maintains a clear and logical chronological order, with a coherent narrative flow consistent with the story's progression." + }, + { + "question": "Do the images in the storyboard maintain a consistent style and artistic coherence suitable for a professional film storyboard?", + "0_point_standard": "The image style in the storyboard is inconsistent, resulting in a visually disjointed or unprofessional appearance.", + "1_point_standard": "The images maintain a consistent style and artistic coherence, forming a professional and visually appealing storyboard." + }, + { + "question": "Are characters in the storyboard consistently recognizable as the same individual throughout the sequence?", + "0_point_standard": "Characters appear inconsistently or unrecognizable across different images, making it difficult to determine they are the same person.", + "1_point_standard": "Characters are consistently recognizable as the same individual throughout the storyboard, ensuring continuity and clarity." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_character_definition_0002/images.txt b/dataset/movie_shots_generation_character_definition_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..448c4e4d901f929a029b9a55cd18ada52f78a3c7 --- /dev/null +++ b/dataset/movie_shots_generation_character_definition_0002/images.txt @@ -0,0 +1,5 @@ +https://img.alicdn.com/imgextra/i3/O1CN01ITSR8v1Zagn7hRhlw_!!6000000003211-0-tps-1280-986.jpg +https://img.alicdn.com/imgextra/i4/O1CN01AH4j5m26LvLIkl232_!!6000000007646-0-tps-1280-986.jpg +https://img.alicdn.com/imgextra/i3/O1CN01DEKIAr1NY9CgfiTro_!!6000000001581-0-tps-1280-1025.jpg +https://img.alicdn.com/imgextra/i4/O1CN01CSVy2O22N3rZTuBW3_!!6000000007107-0-tps-1280-1283.jpg +https://img.alicdn.com/imgextra/i1/O1CN01wLT1NZ29oGhageq4L_!!6000000008114-0-tps-1280-1283.jpg diff --git a/dataset/movie_shots_generation_character_definition_0002/instruction.txt b/dataset/movie_shots_generation_character_definition_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..de0b736bb4283eb91457da7fcfe1266b68a241c8 --- /dev/null +++ b/dataset/movie_shots_generation_character_definition_0002/instruction.txt @@ -0,0 +1,17 @@ +Please generate a set of storyboard images based on the provided script, with each storyboard frame corresponding to one image. The character definitions in the storyboard must be based on the provided images, ensuring that the characters in the generated images closely match the predefined ones. + +### Story Overview: +In the future, set in a deep-sea world, six highly advanced members of the “Ripper Squad” are tasked with a dangerous mission: to destroy a secret underground bioweapon facility run by a criminal organization. Each member possesses unique abilities, and together they must face powerful enemies in the dark, mysterious depths of the ocean. 
The story is filled with intense battles, high-stakes moments, and themes of teamwork and sacrifice. + +### English Script + +| Shot No. | Location | Shot Size | Camera Angle | Description | Voiceover | Subtitles | Music/SFX | Camera Movement | +| -------- | -------- | --------- | ------------ | ----------- | --------- | --------- | ---------- | --------------- | +| 1 | Outside the Deep Sea Base | Wide shot | From the angle of an underwater cave | The Ripper Squad's submarine slowly advances through the deep ocean, with the silhouette of the bioweapons facility in the distance. | None | The ominous atmosphere of the deep sea intensifies. | Tense, low background music, sound of flowing water | Fixed camera, zooming in gradually | +| 2 | Inside the Submarine | Medium shot | From behind the captain's seat | Captain Zack communicates with the team through the intercom, preparing for the mission. Team members ready themselves in their battle stations. | Zack: "Get ready, the target is approaching." | Captain Zack preparing for the battle | Low hum of internal equipment | Handheld camera, slight shake | +| 3 | Base Entrance | Close-up | From the base entrance view | Lucas and Bob work together to breach the base entrance, Lucas cutting the metal door with a laser gun, while Bob prepares an explosive. | Lucas: "Three, two, one...go!" | Lucas and Bob breaching the entrance | High-pitched alarm blares | Quick cut to the action of breaching | +| 4 | Inside the Base | Wide shot | Overhead angle | The whole squad enters the base, with dim lighting and faint blue glows from the equipment. Issca scouts the area ahead. | Issca: "The area is clear." | The Ripper Squad cautiously proceeds. | Heavy atmosphere music | High overhead shot, slowly following the squad | +| 5 | Laboratory Entrance | Medium shot | Side angle | The squad halts at the lab entrance, and K2 discovers activated defense systems. The team quickly sets up countermeasures. 
| K2: "There's a trap, prepare for the defense systems." | The squad prepares for defense. | Beeping of electronic devices | Crane shot, slowly approaching the defense system | +| 6 | Inside the Laboratory | Close-up | From the bioweapon's perspective | The team enters the lab, facing a massive bioweapon stored in tubes. K2 analyzes its weak points, and Zack orders the team to attack. | K2: "The bioweapon's core is ahead." Zack: "Get ready to engage!" | Critical moment, the team starts the attack. | Tense drum beats | Quick cuts to the team members preparing for action | +| 7 | Deeper in the Lab | Medium shot | Side angle | As the bioweapon retaliates, Lucas and Zack fight side by side while Bob covers the retreat. Issca uses his scanning abilities to locate the enemy's weak point. | Zack: "Focus fire, hit the core!" | The bioweapon fights back, and the battle begins. | Intense battle sounds | Fast-moving handheld camera, portraying the intensity of the fight | +| 8 | Base Collapse | Wide shot | From a distant view of the base exterior | As the bioweapon is destroyed, the base begins to collapse. The squad quickly retreats, and the submarine escapes the seabed amidst the explosions. | None | The base collapses, and the squad successfully escapes. 
| Huge rumbling of the base explosion | High-speed diving shot, showing the explosion and escape | diff --git a/dataset/movie_shots_generation_character_definition_0002/meta.json b/dataset/movie_shots_generation_character_definition_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..4047974e5bfad6ed9e5d1a2242ca9d2ee0d877bf --- /dev/null +++ b/dataset/movie_shots_generation_character_definition_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "movie shots generation given character definition", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0040", + "output_image_count": 8, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_scene_definition_0002/eval.json b/dataset/movie_shots_generation_scene_definition_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..8cd0b616241d5b337a99c4e545c3cdd852626cbe --- /dev/null +++ b/dataset/movie_shots_generation_scene_definition_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Are the contents and elements in the generated storyboard directly related to the input scene definition diagram?", + "0_point_standard": "The elements and contents of the storyboard do not have a clear relation to the input scene definition diagram, with obvious deviations in characters, scenes, or actions.", + "1_point_standard": "The storyboard accurately reflects the content of the input scene definition diagram, with all major elements and actions correctly represented." 
+ }, + { + "question": "After any specified modifications, do the other parts of the storyboard remain unchanged and consistent with the original scene definition?", + "0_point_standard": "Parts of the storyboard that were not specified for modification show changes that should not have occurred, affecting the integrity of the scene.", + "1_point_standard": "The parts of the storyboard not specified for modification remain consistent with the original scene definition, with no unnecessary changes." + }, + { + "question": "Does the generated storyboard follow the specific instructions and requirements listed in the text description?", + "0_point_standard": "The storyboard fails to include specific instructions from the text description, missing key directives such as character actions or scene details.", + "1_point_standard": "The storyboard effectively includes the specific instructions from the text description, accurately depicting the required actions and details." + }, + { + "question": "Is the temporal logic of the storyboard coherent and consistent with the sequence and flow described in the text input?", + "0_point_standard": "The storyboard lacks temporal logic, with scenes appearing in a disordered sequence or lacking clear narrative progression.", + "1_point_standard": "The storyboard follows a coherent temporal sequence, consistent with the narrative flow described in the text input." + }, + { + "question": "Do the visual and stylistic elements of the storyboard maintain a consistent aesthetic quality and style overall?", + "0_point_standard": "The storyboard is inconsistent in visual style or aesthetic quality, with noticeable differences in the design or tone of different scenes.", + "1_point_standard": "The storyboard maintains a consistent visual style and aesthetic quality overall, providing a unified and coherent visual experience." 
+ }, + { + "question": "Is the depiction of characters and objects consistent, ensuring they are recognizable and maintain identity throughout the storyboard?", + "0_point_standard": "The depiction of characters or objects is inconsistent, with changes in appearance or features leading to them being unrecognizable in different scenes.", + "1_point_standard": "Characters and objects are depicted consistently throughout the storyboard, retaining recognizable features and identity in all scenes." + } + ] +} \ No newline at end of file diff --git a/dataset/movie_shots_generation_scene_definition_0002/images.txt b/dataset/movie_shots_generation_scene_definition_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..b65e111c4d50f2ce40f0309e822d0d068fd5101d --- /dev/null +++ b/dataset/movie_shots_generation_scene_definition_0002/images.txt @@ -0,0 +1,5 @@ +https://img.alicdn.com/imgextra/i3/O1CN014DY27D1nDhX4IbMLd_!!6000000005056-0-tps-4498-2023.jpg +https://img.alicdn.com/imgextra/i4/O1CN01e7GLSK1f6MFxSFVU3_!!6000000003957-0-tps-5622-2528.jpg +https://img.alicdn.com/imgextra/i1/O1CN01E2hQaS20TM4fBRZIH_!!6000000006850-0-tps-4498-2023.jpg +https://img.alicdn.com/imgextra/i3/O1CN01wQ9LVD1MLUdwRk6NO_!!6000000001418-0-tps-5622-2528.jpg +https://img.alicdn.com/imgextra/i4/O1CN013kgXHx1LPKLS9Xhvx_!!6000000001291-0-tps-4498-2023.jpg diff --git a/dataset/movie_shots_generation_scene_definition_0002/instruction.txt b/dataset/movie_shots_generation_scene_definition_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..822bc03d35ce22db7ca3099d4808c4df384b55ad --- /dev/null +++ b/dataset/movie_shots_generation_scene_definition_0002/instruction.txt @@ -0,0 +1,16 @@ +## Background Overview + +The story takes place in an old mansion shrouded in mystery. On a rainy night, the protagonist enters the mansion alone, searching for clues about their family's secrets. 
As they delve deeper, they uncover the mansion's hidden history and realize that a supernatural force has been pulling the strings all along. The film has a Gothic horror tone, with a strong emphasis on dark, moody lighting. The music and sound effects should enhance the eerie and oppressive atmosphere. + +## Scene Details + +| Shot No. | Location | Framing | Camera Angle | Description | Voiceover | Subtitle | Music/SFX | Camera Position/Movement | +| --- | --- | --- | --- | --- | --- | --- | --- | --- | +| 1 | Mansion Entrance Hall (Scene 1) | Wide Shot | Low Angle | The protagonist pushes open the heavy wooden door and walks into the dimly lit hall. The staircase in the distance is outlined by faint light, and dust floats in the air. | "This mansion has been abandoned for years. No one knows what secrets lie within." | "The mansion, untouched for years, holds ancient secrets." | Dark background music accompanied by the creaking sound of the door. | Stable shot, slowly following the protagonist into the hall. | +| 2 | Bottom of the Staircase (Scene 2) | Medium Shot | Frontal Angle | The protagonist stands at the bottom of the staircase, looking up at the top. Light casts a shadow through the banister. | "Maybe the answers are upstairs, but I feel like something is watching me." | "He feels as if something is watching from the darkness." | Low, rumbling background music interrupted by a faint rustle. | Stable shot, tilting upward as the protagonist looks up. | +| 3 | Middle of the Staircase (Scene 3) | Close-Up | Upward Angle | The protagonist ascends the stairs step by step, each one accompanied by the creaking of the wooden floor, as the light gradually dims. | "This staircase seems to go on forever." | "At the top of the stairs, something unknown awaits." | Background music intensifies, with the protagonist's footsteps clearly audible. | Stable shot from the bottom of the stairs, looking up. 
| +| 4 | Balcony (Scene 4) | Wide Shot | High Angle | The protagonist steps onto the balcony, where rain gently taps against the railing. The distant city lights flicker faintly. | "Everything looks so calm, yet the unease in my heart grows stronger." | "The outside world is calm, but the mansion holds an eerie atmosphere." | Soft background music accompanied by the sound of rain. | Camera slowly moves downward from a high angle, showing the protagonist and surroundings. | +| 5 | Top of the Stairs (Scene 2) | Medium Shot | Side Angle | The protagonist returns to the top of the stairs, looking ahead at a half-open door, through which faint light is seeping. | "Perhaps what I'm looking for lies behind that door." | "Behind that door, the truth may be hidden." | Tense background music, with faint sounds coming from behind the door. | Steady camera, slowly zooming toward the door. | +| 6 | Room Behind the Door (Scene 5) | Close-Up | Low Angle | The protagonist pushes the door open, revealing a room filled with old, dilapidated furniture. The paintings on the walls are faded from age. | "This room holds secrets forgotten by time." | "Every corner of this room whispers of history long forgotten." | Deep, rumbling background music with faint creaks from the wooden floor. | The camera follows the protagonist into the room, circling the surroundings. | +| 7 | Inside the Room (Scene 5) | Close-Up | Frontal Angle | The protagonist finds an old letter. Upon opening it, they discover the secrets of their family history. | "This is the clue I've been searching for." | "The letter reveals the mysterious past of the family." | Low, subtle sound effects accompanied by the rustling of paper. | The camera focuses on the letter, slowly zooming in as the protagonist opens it. | +| 8 | Balcony (Scene 4) | Wide Shot | High Angle | The protagonist returns to the balcony. The rain still falls, but they now understand the mansion's secrets, their gaze determined. 
| "The truth is finally revealed, but at a great cost." | "He now knows the truth, but what comes next is uncertain." | Tense music transitions into calm, with faint thunder in the background. | The camera slowly pulls away, ending with a wide shot of the mansion. | diff --git a/dataset/movie_shots_generation_scene_definition_0002/meta.json b/dataset/movie_shots_generation_scene_definition_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..4ca68cb778e4a0c434797beaaaecd87cd3cd3c9e --- /dev/null +++ b/dataset/movie_shots_generation_scene_definition_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "movie shots generation given scene definition", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0041", + "output_image_count": 8, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/multi-interior_decoration_variants_generation_0002/eval.json b/dataset/multi-interior_decoration_variants_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..137fe49cdcc13d353c3ded96e40aa5dc4df721cf --- /dev/null +++ b/dataset/multi-interior_decoration_variants_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does each generated image reflect a different style as specified in the text description, with each image representing a unique interior design style?", + "0_point_standard": "The generated images do not clearly reflect the specified different styles, as the styles appear similar or inconsistent with the text description.", + "1_point_standard": "Each generated image clearly reflects the unique style specified in the text description, representing different interior design aesthetics." 
+ }, + { + "question": "Are the room structure, camera angles, and furniture layout consistent across all images, ensuring only the style elements change?", + "0_point_standard": "There are noticeable variations in room structure, camera angles, or furniture layout between images, deviating from the specified requirements.", + "1_point_standard": "The room structure, camera angles, and furniture layout remain consistent across all images, with only style changes applied." + }, + { + "question": "Are the details of specific styles (such as color schemes, textures, and decorative elements) accurately adjusted according to the defined styles in each image?", + "0_point_standard": "The details of specific styles are inaccurately expressed or do not match the expected characteristics of each style.", + "1_point_standard": "Each image accurately reflects the details of specific styles, such as colors, textures, and decorative elements, consistent with the defined styles." + }, + { + "question": "Does the lighting and ambiance in each image correspond to the atmosphere typically associated with each style (e.g., warm lighting for rustic style, bright and airy for minimalist)?", + "0_point_standard": "The lighting and ambiance do not match the expected atmosphere of each style, leading to a mismatch between style and ambiance.", + "1_point_standard": "The lighting and ambiance in each image are carefully adjusted to match the atmosphere and characteristics of each specified style." + }, + { + "question": "Are the materials and finishes of furniture and fixtures appropriately adapted to each style, enhancing the authenticity of the design?", + "0_point_standard": "The materials and finishes do not align with the expected style, reducing the authenticity of the design in each image.", + "1_point_standard": "The materials and finishes for each style are thoughtfully chosen to accurately reflect the aesthetic of the expected design and enhance authenticity." 
+ }, + { + "question": "Does the final image set exhibit a high level of aesthetic quality, with each style variation contributing to a cohesive and professional-looking series?", + "0_point_standard": "The final image set lacks aesthetic quality, with inconsistencies or low quality diminishing the professional appearance of the series.", + "1_point_standard": "The final image set exhibits high aesthetic quality, with each style variation contributing to a cohesive, visually appealing, and professional series." + } + ] +} \ No newline at end of file diff --git a/dataset/multi-interior_decoration_variants_generation_0002/images.txt b/dataset/multi-interior_decoration_variants_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..06189068345b3279b6892fc5ea619def1886d24f --- /dev/null +++ b/dataset/multi-interior_decoration_variants_generation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01RHfV9X1Zy2zUa7S09_!!6000000003262-0-tps-1280-1600.jpg diff --git a/dataset/multi-interior_decoration_variants_generation_0002/instruction.txt b/dataset/multi-interior_decoration_variants_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..62e62056fdcf6a9bc342a1aa6ae81b23df1d81d9 --- /dev/null +++ b/dataset/multi-interior_decoration_variants_generation_0002/instruction.txt @@ -0,0 +1 @@ +Please generate five images based on the following description, showcasing different style variations of the given kitchen image. The goal is to create diverse transformations of the original kitchen decor, including design styles, material styles, and other commonly used interior styles. The first image should feature a modern minimalist style, using white and gray elements along with minimalist furniture designs and clean lines. The second image should depict an industrial style, incorporating exposed metal, brick walls, and industrial lighting to create a rugged and functional look. 
The third image should present a Scandinavian style, using light wood, soft lighting, and simple decor to create a warm and cozy atmosphere. The fourth image should showcase a vintage style, with dark wood, retro furniture, and warm lighting to evoke a nostalgic feeling. The fifth image should represent a Japanese Zen style, emphasizing natural materials such as bamboo and stone, combined with simple lines and natural lighting to create a tranquil ambiance. All images should maintain the original kitchen layout while reflecting the diversity of these distinct design styles. \ No newline at end of file diff --git a/dataset/multi-interior_decoration_variants_generation_0002/meta.json b/dataset/multi-interior_decoration_variants_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..5c9ee14dac27fc4bf3c4ffe35c6dcc9468101ce1 --- /dev/null +++ b/dataset/multi-interior_decoration_variants_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "interior design multi-style variant generation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0038", + "output_image_count": 5, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/object_editing_object_removal_0001/eval.json b/dataset/object_editing_object_removal_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..81493d840fc07d7c92285b7bac1500d27ea42f53 --- /dev/null +++ b/dataset/object_editing_object_removal_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Has the specified object been completely removed from the image without any visible remnants?", + "0_point_standard": "The object is still partially visible, or there are obvious remnants or traces left after removal.", + "1_point_standard": "The object has been completely removed without any visible remnants or traces." 
+ }, + { + "question": "Apart from the specified object, does the rest of the image remain unchanged and consistent with the original image?", + "0_point_standard": "The rest of the image shows noticeable changes or modifications unrelated to the object removal.", + "1_point_standard": "Apart from the removal of the specified object, the rest of the image remains unchanged and consistent with the original image." + }, + { + "question": "Does the object removal maintain the overall content and style of the original image, ensuring consistency between the input and output?", + "0_point_standard": "The removal operation disrupts the overall content or style, resulting in an incoherent or inconsistent image.", + "1_point_standard": "The content and style are maintained, ensuring consistency between the input and output images." + }, + { + "question": "Does the image meet any specific requirements or instructions provided in the text description, such as naturally filling the removed area?", + "0_point_standard": "The specific requirements or instructions in the text description are not met or not well executed.", + "1_point_standard": "The image accurately and effectively meets the specific requirements or instructions in the text description." + }, + { + "question": "Is the quality of the fill in the area where the object was removed consistent with the surrounding area, including texture, color, and lighting?", + "0_point_standard": "The fill area is inconsistent with the surrounding area, with noticeable differences in texture, color, or lighting.", + "1_point_standard": "The fill area seamlessly blends with the surrounding area, with consistent texture, color, and lighting." 
+ }, + { + "question": "Does the modified image have an overall aesthetic appeal, maintaining a high level of visual quality and attractiveness?", + "0_point_standard": "The modified image lacks aesthetic appeal and visual quality, appearing unprofessional or unattractive.", + "1_point_standard": "The modified image exhibits strong aesthetic appeal, maintaining a high level of visual quality and attractiveness." + } + ] +} \ No newline at end of file diff --git a/dataset/object_editing_object_removal_0001/images.txt b/dataset/object_editing_object_removal_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..b912b0eaa4a06d736e26dd5c9975ed906576a9de --- /dev/null +++ b/dataset/object_editing_object_removal_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01jAHW9s23MtKgtrFKs_!!6000000007242-0-tps-5152-7728.jpg diff --git a/dataset/object_editing_object_removal_0001/instruction.txt b/dataset/object_editing_object_removal_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..fd96269a91a1521b779f14efcb7f8197132d5060 --- /dev/null +++ b/dataset/object_editing_object_removal_0001/instruction.txt @@ -0,0 +1 @@ +Please remove the blue flower pot in the foreground of the image. The goal is to keep the background buildings, shadows, and lighting unchanged, and fill in the area after the flower pot is removed, ensuring the scene remains complete and natural. The resulting image should blend seamlessly with the original. 
\ No newline at end of file diff --git a/dataset/object_editing_object_removal_0001/meta.json b/dataset/object_editing_object_removal_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..44f4ab38bf3cfd05782d50885e8170e7256ec8ef --- /dev/null +++ b/dataset/object_editing_object_removal_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "object removal", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0056", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/object_editing_object_replacing_0002/eval.json b/dataset/object_editing_object_replacing_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..9a5f13c2a3f5e039061d56c1c6d84c5ea3189f67 --- /dev/null +++ b/dataset/object_editing_object_replacing_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the modified image accurately replace the specified object while retaining the rest of the image?", + "0_point_standard": "The replacement object is inaccurately placed or out of context, and other parts of the image show unexpected changes.", + "1_point_standard": "The specified object is accurately replaced, seamlessly integrating into the image, with no unexpected changes to the rest of the image." + }, + { + "question": "Does the replacement object remain consistent with the overall content and style of the reference image?", + "0_point_standard": "The replacement object visibly clashes with the content or style of the reference image, creating a sense of disharmony.", + "1_point_standard": "The replacement object is consistent with the content and style of the reference image, blending naturally and smoothly." 
+ }, + { + "question": "Does the replacement object match the lighting and shadows of surrounding elements, ensuring a natural integration?", + "0_point_standard": "The lighting or shadows on the replacement object are inconsistent with surrounding elements, making it appear unnatural or out of place.", + "1_point_standard": "The lighting and shadows on the replacement object are consistent with surrounding elements, allowing it to naturally integrate into the scene." + }, + { + "question": "Is the scale of the replacement object accurate relative to other elements in the image?", + "0_point_standard": "The scale of the replacement object is noticeably off, making it appear too large or too small compared to surrounding elements.", + "1_point_standard": "The scale of the replacement object is accurate and consistent with other elements in the image, ensuring a balanced appearance." + }, + { + "question": "Do the texture and color of the replacement object match the surrounding environment, contributing to a cohesive appearance?", + "0_point_standard": "The texture or color of the replacement object conflicts with the surrounding environment, creating a jarring contrast.", + "1_point_standard": "The texture and color of the replacement object match the surrounding environment well, enhancing the cohesiveness of the image." + }, + { + "question": "Does the modified image exhibit a professional level of visual appeal and quality, with attention to detail and aesthetics?", + "0_point_standard": "The modified image lacks professional visual quality, with noticeable flaws or a lack of attention to detail.", + "1_point_standard": "The modified image exhibits high visual appeal and quality, with meticulous attention to detail and aesthetics." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/object_editing_object_replacing_0002/images.txt b/dataset/object_editing_object_replacing_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..7f38ebb0d2aea14468a69b6e51785705f84a76c3 --- /dev/null +++ b/dataset/object_editing_object_replacing_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01dy3xpD1mi6CgofIgM_!!6000000004987-0-tps-6000-4000.jpg diff --git a/dataset/object_editing_object_replacing_0002/instruction.txt b/dataset/object_editing_object_replacing_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..34a32edd4d267ec4b64ba3d34e8894e1d0682756 --- /dev/null +++ b/dataset/object_editing_object_replacing_0002/instruction.txt @@ -0,0 +1 @@ +Please replace the Merlion statue in Singapore with the Statue of Liberty. The goal is to keep the background cityscape, fountain water flow, and lighting effects unchanged, but replace the Merlion's shape and features with the Statue of Liberty, ensuring that the proportions and posture integrate seamlessly with the surrounding environment. The generated image should depict the Statue of Liberty as part of the scene. 
\ No newline at end of file diff --git a/dataset/object_editing_object_replacing_0002/meta.json b/dataset/object_editing_object_replacing_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..77c41913a6849e21870e7f9216043803b1b37813 --- /dev/null +++ b/dataset/object_editing_object_replacing_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "object replacing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0058", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/packaging_rendering_0002/eval.json b/dataset/packaging_rendering_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b1a787cd623915ec98aa0d762e777e1aa8ca7728 --- /dev/null +++ b/dataset/packaging_rendering_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated packaging render accurately retain the basic structure and shape of the product as shown in the original product image?", + "0_point_standard": "There are noticeable deviations or distortions in the product structure and shape in the packaging render compared to the original product image.", + "1_point_standard": "The packaging render accurately retains the product structure and shape as shown in the original product image." + }, + { + "question": "If the task involves partial modifications, does the rest of the image remain unchanged, preserving the original context and details?", + "0_point_standard": "The parts of the image not intended for modification have been altered, resulting in a loss of original context or details.", + "1_point_standard": "The unmodified parts of the image remain unchanged, preserving the original context and details as expected." 
+ }, + { + "question": "Does the rendered packaging accurately reflect the content, style, and branding specified in the text description?", + "0_point_standard": "The packaging render does not accurately reflect the content, style, or branding specified in the text description.", + "1_point_standard": "The packaging render accurately reflects the content, style, and branding specified in the text description." + }, + { + "question": "Are text-based specific instructions from the description (e.g., color changes or branding elements) correctly implemented in the packaging render?", + "0_point_standard": "The packaging render fails to correctly implement text-based specific instructions such as color changes or branding elements.", + "1_point_standard": "The packaging render correctly implements all text-based specific instructions such as color changes or branding elements." + }, + { + "question": "Is the text editing in the packaging render of high quality, with text elements clear, legible, and properly positioned?", + "0_point_standard": "The text in the packaging render is unclear, illegible, or improperly positioned, affecting the overall presentation.", + "1_point_standard": "The text in the packaging render is clear, legible, and properly positioned, enhancing the overall presentation." + }, + { + "question": "Does the packaging render exhibit a high level of professionalism and aesthetics, meeting industry visual quality and design standards?", + "0_point_standard": "The packaging render lacks professionalism and aesthetics, not meeting industry visual quality and design standards.", + "1_point_standard": "The packaging render exhibits a high level of professionalism and aesthetics, meeting or exceeding industry visual quality and design standards." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/packaging_rendering_0002/images.txt b/dataset/packaging_rendering_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..64dbc2cd39bfbf05afa20d5af7ad73d31b787769 --- /dev/null +++ b/dataset/packaging_rendering_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01GrjDzK1HlzTqyy7qF_!!6000000000799-0-tps-1800-1800.jpg diff --git a/dataset/packaging_rendering_0002/instruction.txt b/dataset/packaging_rendering_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..ef4fdf5e0d17da4939dd0bede0ae20f37e863841 --- /dev/null +++ b/dataset/packaging_rendering_0002/instruction.txt @@ -0,0 +1 @@ +Redesign the appearance of the square tissue boxes to reflect a clean and natural aesthetic. The body of the boxes should be a soft off-white color, adorned with light green leaf patterns, conveying a sense of nature and eco-friendliness. The surface of the boxes should be smooth, giving off a fresh and tidy texture. Each tissue box should retain a structured square shape with subtly rounded edges for a softer visual appeal. The top of each box should feature a tissue slot for easy extraction. The overall design should remain simple and harmonious, embodying a modern and fresh style with a light, natural touch. 
\ No newline at end of file diff --git a/dataset/packaging_rendering_0002/meta.json b/dataset/packaging_rendering_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a3291ad88dda06e86b62bd2329a59fcd0cac11b3 --- /dev/null +++ b/dataset/packaging_rendering_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "package rendering", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0066", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/painting_generation_0002/eval.json b/dataset/painting_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..dc9bb5fe7b27fb5c33ed7c68c416b014718e6662 --- /dev/null +++ b/dataset/painting_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the generated image clear like a painting, with recognizable brushstrokes, textures, or artistic qualities?", + "0_point_standard": "The image lacks recognizable painting qualities, making it unlike an artwork.", + "1_point_standard": "The image has clear painting qualities, with textures and brushstrokes similar to a hand-painted piece." + }, + { + "question": "Is the image visually complete, with balanced composition, and can it be considered a complete artwork without needing additional elements?", + "0_point_standard": "The image appears incomplete or lacks balanced composition, giving an impression of an unfinished piece.", + "1_point_standard": "The image is visually complete and balanced, making it well-suited as a standalone artwork." 
+ }, + { + "question": "Does the painting accurately represent the specific theme, style, or elements described in the text prompt (e.g., landscape, portrait, or surreal themes)?", + "0_point_standard": "The painting does not match the described theme, style, or elements, deviating from the text requirements.", + "1_point_standard": "The painting accurately represents the theme, style, and elements specified in the text prompt." + }, + { + "question": "Is the artistic style consistently applied throughout the painting, maintaining the expected style described (e.g., realism, impressionism, abstract)?", + "0_point_standard": "The style appears inconsistent or mixed, lacking coherence with the expected artistic approach.", + "1_point_standard": "The artistic style is consistently applied, reflecting the expected approach described in the prompt." + }, + { + "question": "Are the details and textures in the painting, such as fine brushwork or layering, realistically rendered to add depth and dimension?", + "0_point_standard": "The details and textures lack clarity or depth, making the image appear flat or artificial.", + "1_point_standard": "The details and textures are rendered with depth and clarity, adding dimension and enhancing the image's realism." + }, + { + "question": "Does the painting exhibit a high level of aesthetic quality, with balanced colors, pleasing composition, and professional artistic finish?", + "0_point_standard": "The painting lacks aesthetic appeal, with poor color balance, weak composition, or an unfinished appearance.", + "1_point_standard": "The painting possesses strong aesthetic appeal, with harmonious colors, balanced composition, and a polished professional appearance." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/painting_generation_0002/images.txt b/dataset/painting_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/painting_generation_0002/instruction.txt b/dataset/painting_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..9230f54fc9c0a65fb3bff1f3b60cd60bbd2d0e33 --- /dev/null +++ b/dataset/painting_generation_0002/instruction.txt @@ -0,0 +1 @@ +This painting depicts a young girl with a troubled and sad expression. She is wearing a light-colored headscarf, with a few strands of hair peeking out from underneath, slightly disheveled. Her large, expressive eyes look forward, conveying a deep sense of sorrow or helplessness. Her delicate face is gently cradled by a rough, wrinkled hand, likely belonging to an older adult. The positioning of the hand appears both protective and controlling, with the arm pressed against the girl's cheek, evoking a strong emotional connection. The background is blurred and rendered in muted gray and brown tones, emphasizing the details of the girl and the hand in the foreground. This directs the viewer's attention to the girl's emotions and the tactile nature of the hand. The entire composition is filled with emotional tension, suggesting a complex mixture of loneliness and protection. 
\ No newline at end of file diff --git a/dataset/painting_generation_0002/meta.json b/dataset/painting_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..ea5daad05b3ae1cda6900039c75e18f8eb624834 --- /dev/null +++ b/dataset/painting_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "painting generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0030", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/painting_generation_0003/eval.json b/dataset/painting_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..dff950f4c58d4bd86b3fd770359b4ca7725dacb0 --- /dev/null +++ b/dataset/painting_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the generated image as clear as a painting, with recognizable brushstrokes, textures, or artistic qualities?", + "0_point_standard": "The image lacks recognizable painting qualities, making it appear unlike an artwork.", + "1_point_standard": "The image has clear painting qualities, with textures and brushstrokes similar to a hand-painted piece." + }, + { + "question": "Is the image visually complete, with balanced composition, and can it be considered a complete artwork without the need for additional elements?", + "0_point_standard": "The image appears incomplete or lacks a balanced composition, giving an impression of an unfinished piece.", + "1_point_standard": "The image is visually complete and balanced, making it well-suited as a standalone artwork." 
+ }, + { + "question": "Does the painting accurately depict the specific theme, style, or elements described in the text prompt (e.g., landscape, portrait, or surreal theme)?", + "0_point_standard": "The painting does not match the described theme, style, or elements, deviating from the text requirements.", + "1_point_standard": "The painting accurately depicts the specified theme, style, and elements from the text prompt." + }, + { + "question": "Is the artistic style consistently applied throughout the painting, maintaining the described expected style (e.g., realism, impressionism, abstract)?", + "0_point_standard": "The style appears inconsistent or mixed, lacking cohesion with the expected artistic approach.", + "1_point_standard": "The artistic style is consistently applied, reflecting the expected approach described in the prompt." + }, + { + "question": "Are details and textures, such as fine brushstrokes or layering, realistically rendered to add depth and dimension?", + "0_point_standard": "Details and textures lack clarity or depth, making the image appear flat or artificial.", + "1_point_standard": "Details and textures are rendered with depth and clarity, adding dimension and enhancing the image's realism." + }, + { + "question": "Does the painting exhibit a high level of aesthetic quality, with balanced colors, pleasing composition, and professional artistic finish?", + "0_point_standard": "The painting lacks aesthetic appeal, with poor color balance, weak composition, or an unfinished appearance.", + "1_point_standard": "The painting has strong aesthetic appeal, with harmonious colors, balanced composition, and a polished professional appearance." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/painting_generation_0003/images.txt b/dataset/painting_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/painting_generation_0003/instruction.txt b/dataset/painting_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..8889b6c5e43cb71ddc378ea8ab134f48841bc141 --- /dev/null +++ b/dataset/painting_generation_0003/instruction.txt @@ -0,0 +1 @@ +This painting portrays a dreamlike and mysterious scene. On the right side, there is a multi-tiered stone pagoda, ancient and intricately detailed, showing the marks of time. The tower's structure features delicate carvings and decorative pillars, embodying an old architectural style. The top of the pagoda is adorned with sharp finials, and small bells hang from the eaves. The stone pagoda is surrounded by dense, lush trees with expansive, intricate canopies. On the left, an enormous, ancient tree spreads its twisting branches and roots. Beneath the tree, a glowing white horse appears to be walking slowly, with small glimmers of light surrounding it, adding a sense of mystique to the scene. In the distance, another similar stone pagoda is faintly visible, while several birds are flying near its top. The sky is filled with a light mist, creating a tranquil and surreal atmosphere. The overall color scheme is soft, dominated by shades of blue and green, which gives the scene a peaceful and transcendent feeling. 
\ No newline at end of file diff --git a/dataset/painting_generation_0003/meta.json b/dataset/painting_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b8ff6d6181725399dcaad1681d0ac283f8392f8f --- /dev/null +++ b/dataset/painting_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "painting generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0030", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_finished_work_0002/eval.json b/dataset/paintings_undo_painting_undo_from_finished_work_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..be1e7984d9b843102d9d0ab9371510605829fe7c --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_finished_work_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Correlation between input and output images: Does each intermediate process rendering clearly derive its content and style from the completed painting image?", + "0_point_standard": "Intermediate images have no apparent relation to the completed painting, with elements or style mismatching the original image.", + "1_point_standard": "Each intermediate image is closely related to the completed painting, maintaining consistent content and style derived from the original image." + }, + { + "question": "Preservation of key features: Are the main structural elements and key features of the painting retained in all intermediate images?", + "0_point_standard": "Key features or structural elements vary significantly, leading to a lack of continuity in the representation of the painting process.", + "1_point_standard": "Main structural elements and key features are retained in all intermediate images, ensuring continuity and consistency throughout the process." 
+ }, + { + "question": "Temporal logic: Do the intermediate process images follow a logical progression, showing a plausible development sequence from the initial painting to the final painting?", + "0_point_standard": "The image sequence lacks logical progression, with steps appearing out of order or inconsistent with the natural painting process.", + "1_point_standard": "Intermediate images progress logically, accurately reflecting a step-by-step development consistent with painting practices." + }, + { + "question": "Consistency of image style: Is the style of all intermediate images consistent, maintaining the same artistic approach as the completed painting?", + "0_point_standard": "There are significant stylistic differences between intermediate images, lacking cohesion and differing from the style of the completed painting.", + "1_point_standard": "The style of all images is consistent, reflecting a unified artistic approach similar to the completed painting." + }, + { + "question": "Aesthetic quality and detail: Do the intermediate images maintain a high level of detail and aesthetic quality, reflecting the professional completion of the original painting?", + "0_point_standard": "Intermediate images are poorly detailed or lack aesthetic appeal, leading to a decrease in image quality.", + "1_point_standard": "Each intermediate image is rich in detail and visually appealing, maintaining the high-quality completion of the original painting." 
+ }, + { + "question": "Authenticity of process details: Do the intermediate images show a realistic progression of details, with textures, shading, and other artistic elements naturally accumulating over time?", + "0_point_standard": "The progression of details is unrealistic, with sudden changes in textures, shading, or details that do not reflect the natural painting process.", + "1_point_standard": "Intermediate images show a realistic accumulation of textures, shading, and other details, accurately reflecting the natural progression toward the final painting." + } + ] +} \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_finished_work_0002/images.txt b/dataset/paintings_undo_painting_undo_from_finished_work_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..ed0fffacc9e5e8736efb7b3f5bc6b656911f44ef --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_finished_work_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01Bbpgpp252OqIQtGOq_!!6000000007468-0-tps-3000-2345.jpg diff --git a/dataset/paintings_undo_painting_undo_from_finished_work_0002/instruction.txt b/dataset/paintings_undo_painting_undo_from_finished_work_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f71ecf925e058e18b9a67b68e6e25e205cda8e2f --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_finished_work_0002/instruction.txt @@ -0,0 +1 @@ +Please generate 4 images representing the intermediate stages of this sunrise seascape painting, with each image depicting a different phase of the painting process. The first image should show the early stage where the outlines of the boat, figure, and sea are drawn, but there are no colors or details yet. The second image should depict the mid-stage, where the basic colors are applied to the sky and sea, but the shading and lighting effects have not yet emerged, giving the scene a flat appearance. 
The third image should show the stage where more details, such as the reflection of the sunrise on the water and the layers of color in the sky and clouds, begin to develop, adding depth and richness. The fourth image should represent the nearly completed stage, where all the major elements, including colors and light, are fully depicted, and the details of the sea and sky are nearly refined, though not at the level of the final polished version. \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_finished_work_0002/meta.json b/dataset/paintings_undo_painting_undo_from_finished_work_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1aaace8a53933aa6073d4b61a58aacdc23cbca2b --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_finished_work_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "drawing process generation given finished painting", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0036", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/eval.json b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..ac730fcd0f892274f4d9efad295289cb73449e63 --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does each intermediate image clearly originate from the initial unfinished painting, retaining its fundamental elements and layout?", + "0_point_standard": "The intermediate image lacks a clear connection to the initial unfinished painting, altering the fundamental structure.", + "1_point_standard": "Each intermediate image is closely related to the initial unfinished painting, retaining the fundamental elements and layout." 
+ }, + { + "question": "Does the sequence of images reflect a logical progression, showing a step-by-step development towards a completed painting?", + "0_point_standard": "The sequence appears disjointed, with steps not following a natural completion process.", + "1_point_standard": "The images follow a logical, step-by-step progression, consistent with natural painting practices." + }, + { + "question": "Does the sequence end with a fully completed painting, exhibiting a refined and finished appearance?", + "0_point_standard": "The final image appears unfinished, lacking the refinement or detail expected in a finished painting.", + "1_point_standard": "The final image is clearly completed, with a refined and complete appearance consistent with the expected final effect." + }, + { + "question": "Do the intermediate images realistically build texture, shading, and other details, showing natural artistic progression?", + "0_point_standard": "Details, texture, or shading develop in an unrealistic or inconsistent manner, lacking a natural process.", + "1_point_standard": "The images show a realistic progression of texture, shading, and details, naturally developing towards a completed painting." + }, + { + "question": "Is the artistic style consistent across all intermediate images, reflecting the same creative approach as the final painting?", + "0_point_standard": "The style varies greatly between stages, disrupting the overall artistic coherence.", + "1_point_standard": "The style is consistent across all images, aligning with the creative approach of the final painting." 
+ }, + { + "question": "Does the set of intermediate images exhibit high aesthetic quality, contributing to a coherent and professional-looking progression?", + "0_point_standard": "The images lack aesthetic coherence or professionalism, with low-quality or inconsistent elements detracting from the overall progression.", + "1_point_standard": "The images exhibit aesthetic coherence and high quality, presenting a professional-looking progression towards the final painting." + } + ] +} \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/images.txt b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..509475f521f7f74fd98eb215a478a94654bffbc2 --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i3/O1CN01iJCGca1r8RRfwknro_!!6000000005586-0-tps-2592-1856.jpg diff --git a/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/instruction.txt b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0e33d1540bc7aee7c7fa025e66caae7d8b54e4a9 --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/instruction.txt @@ -0,0 +1 @@ +Please generate 4 images representing the painting process from the unfinished sketch to the final product. The first image should show the phase where basic color blocks are applied to all main objects such as buildings, trees, and the river, but without any details. The second image should depict further refinement of colors, with shadows starting to appear on buildings and trees, and colors becoming more saturated. The third image should show the addition of details, such as windows on buildings, layers of tree leaves, and ripples on the river. 
The fourth image will be the final product, where all details and colors are completed. The final product should maintain a clean illustration style, with clear lines and saturated but simple colors, and a smooth, polished, and neat illustrative effect. \ No newline at end of file diff --git a/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/meta.json b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..c42d7a8c1f8a248afab0f21c50567ba4d06e756b --- /dev/null +++ b/dataset/paintings_undo_painting_undo_from_semi-finished_work_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "drawing process generation given semi-finished reference", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0037", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/panorama_generation_0001/eval.json b/dataset/panorama_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..827f15cd28ef6e274b1fb2397d7ce48b53bfaf8d --- /dev/null +++ b/dataset/panorama_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the panorama clearly originate from the input images, effectively combining the content of each image into a single, continuous view?", + "0_point_standard": "The panorama fails to clearly present the combination of input images, lacking continuity or missing elements from the original images.", + "1_point_standard": "The panorama clearly originates from the input images, integrating the content of each into a coherent, continuous scene." 
+ }, + { + "question": "Are the edges and overlaps between the images seamlessly stitched, without noticeable misalignment or breaks?", + "0_point_standard": "The panorama shows noticeable misalignment or breaks at the edges or overlaps, disrupting the smoothness of the scene.", + "1_point_standard": "The panorama smoothly stitches the edges and overlaps, with no noticeable gaps or breaks, ensuring scene coherence." + }, + { + "question": "Does the panorama maintain consistency in content and style throughout the image, accurately reflecting the original input images?", + "0_point_standard": "The panorama exhibits inconsistencies in content or style, introducing elements or styles not present in the input images.", + "1_point_standard": "The panorama maintains consistency in content and style throughout the image, accurately reflecting the key elements and appearance of the input images." + }, + { + "question": "Does the panorama meet any specific requirements from the text description, such as orientation, specific focus areas, or designated elements?", + "0_point_standard": "The panorama does not meet the specific requirements mentioned in the text description, lacking key orientation or focus elements.", + "1_point_standard": "The panorama accurately meets all specific requirements from the text description, incorporating orientation, focus, or designated elements as specified." + }, + { + "question": "Do the color, lighting, and contrast naturally blend together in the stitched images, creating a unified appearance across the panorama?", + "0_point_standard": "The panorama shows noticeable differences in color, lighting, or contrast between stitched parts, resulting in a disjointed appearance.", + "1_point_standard": "The panorama achieves a natural and consistent blend of color, lighting, and contrast across all parts, creating a unified cohesive appearance." 
+ }, + { + "question": "Does the panorama exhibit high-quality rendering effects, with attention to detail, clarity, and aesthetically pleasing composition?", + "0_point_standard": "The panorama lacks high-quality rendering effects, with issues such as blurriness, low resolution, or unbalanced composition, affecting visual appeal.", + "1_point_standard": "The panorama exhibits high-quality rendering effects, with clear details, balanced composition, and an aesthetically pleasing, professional appearance." + } + ] +} \ No newline at end of file diff --git a/dataset/panorama_generation_0001/images.txt b/dataset/panorama_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..3d259f638a4b75677b7f4d7bc4b7f714dea7a487 --- /dev/null +++ b/dataset/panorama_generation_0001/images.txt @@ -0,0 +1,3 @@ +https://img.alicdn.com/imgextra/i2/O1CN01IohYfa24fxQy0XUyh_!!6000000007419-0-tps-640-452.jpg +https://img.alicdn.com/imgextra/i2/O1CN01PXNnHF244rLvNoc6y_!!6000000007338-0-tps-640-452.jpg +https://img.alicdn.com/imgextra/i1/O1CN01vg6BAE21oi9tjcM5n_!!6000000007032-0-tps-640-452.jpg diff --git a/dataset/panorama_generation_0001/instruction.txt b/dataset/panorama_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..272b24db0b7c6d4bfe1423541e5c54c60b61630d --- /dev/null +++ b/dataset/panorama_generation_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a wide panorama image based on multiple input images, ensuring that the model performs seamless stitching of the images with no visible transition lines between them. The overlapping areas should be handled appropriately to avoid any duplication or distortion, and the perspective across the images should remain consistent. The final panorama should be smooth and cohesive, maintaining continuity in the overall content and scene, resulting in a high-quality, wide panorama that captures the full scene. 
\ No newline at end of file diff --git a/dataset/panorama_generation_0001/meta.json b/dataset/panorama_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..84567d2916faa17943a12674f98d22647ebbc9dd --- /dev/null +++ b/dataset/panorama_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "panorama generation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0054", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0002/eval.json b/dataset/physical_laws_illustration_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..06ecc7af7e09455cfbb80e44356c601fe76c1e6c --- /dev/null +++ b/dataset/physical_laws_illustration_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Time Logic: Does the sequence of images logically present the changes in physical laws over time?", + "0_point_standard": "The sequence of images is not arranged in chronological order or lacks logical flow, failing to illustrate the process of gradual change.", + "1_point_standard": "The sequence of images clearly presents the changes in physical laws in a logical chronological order." + }, + { + "question": "Consistency with Text Description: Does the image content match the physical laws specified in the text description?", + "0_point_standard": "The image content does not accurately reflect the physical laws described in the text, showing clear discrepancies.", + "1_point_standard": "The image content completely matches the text description, accurately demonstrating the specified physical laws." 
+ }, + { + "question": "Consistency of Image Style: Is the style and overall visual effect of the images consistent?", + "0_point_standard": "The image style is inconsistent, resulting in a disjointed visual effect.", + "1_point_standard": "All images maintain a consistent style, creating a coherent visual effect." + }, + { + "question": "Consistency of Object/Character ID: Does the generated sequence of images maintain consistency in the same object or character ID (e.g., the same object or character)?", + "0_point_standard": "The main subjects are inconsistent between frames, making it difficult to recognize them as the same object or character.", + "1_point_standard": "The main subjects remain consistent and can be clearly identified as the same object or character." + }, + { + "question": "Logical Accuracy: Is the demonstration of physical laws reasonable and logically sound?", + "0_point_standard": "The representation of physical laws is illogical or unreasonable, with obvious errors or unrealistic descriptions.", + "1_point_standard": "The representation of physical laws is reasonable, logical, and accurately reflects the expected physical principles." + }, + { + "question": "Professional Aesthetics: Do the details and aesthetics of the images meet professional standards and possess visual appeal?", + "0_point_standard": "The images lack detail and aesthetic appeal, failing to meet visual standards.", + "1_point_standard": "The images are rich in detail and have excellent aesthetics, meeting professional standards and possessing visual appeal." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0002/images.txt b/dataset/physical_laws_illustration_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/physical_laws_illustration_0002/instruction.txt b/dataset/physical_laws_illustration_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..89112152c38d782cbfc56a48094bcdbfcfe3c938 --- /dev/null +++ b/dataset/physical_laws_illustration_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a scene of a ball rolling off a table, containing 4 images arranged in chronological order, showing the ball rolling from the edge of the table and landing on the ground. All images must follow the physical laws of inertia and gravity. \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0002/meta.json b/dataset/physical_laws_illustration_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..01d3edb592c19a0223fc711ad646f60da84e7363 --- /dev/null +++ b/dataset/physical_laws_illustration_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "physical laws illustration", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0018", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0003/eval.json b/dataset/physical_laws_illustration_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4b622a420ed45fe88da8ee7ffe8b9c9e9382be10 --- /dev/null +++ b/dataset/physical_laws_illustration_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Time Logic: Does the sequence of images logically present the change of physical laws in chronological order?", + "0_point_standard": "The sequence of images is not arranged in chronological order or lacks 
a logical flow, failing to illustrate the process of gradual change.", + "1_point_standard": "The sequence of images clearly presents the change of physical laws in a logical chronological order." + }, + { + "question": "Consistency with Text Description: Does the content of the images match the physical laws specified in the text description?", + "0_point_standard": "The content of the images does not accurately reflect the physical laws in the text description and shows significant deviations.", + "1_point_standard": "The content of the images perfectly matches the text description, accurately demonstrating the specified physical laws." + }, + { + "question": "Consistency in Image Style: Is the style and overall visual effect of the images consistent?", + "0_point_standard": "The image style is inconsistent, resulting in a disjointed visual effect.", + "1_point_standard": "All images maintain a consistent style, creating a coherent visual effect." + }, + { + "question": "Consistency of Object/Role ID: Does the generated image sequence maintain consistency of the same object or role ID (e.g., the same object or character)?", + "0_point_standard": "The main subject is inconsistent between different frames, making it difficult to identify as the same object or character.", + "1_point_standard": "The main subject remains consistent and can be clearly identified as the same object or character." + }, + { + "question": "Logical Accuracy: Is the demonstration of physical laws reasonable and logically sound?", + "0_point_standard": "The representation of physical laws is illogical or unreasonable, with obvious errors or unrealistic descriptions.", + "1_point_standard": "The representation of physical laws is reasonable, logical, and accurately reflects the expected physical principles." 
+ }, + { + "question": "Professional Aesthetics: Do the details and aesthetics of the images meet professional standards, and are the images visually appealing?", + "0_point_standard": "The images lack detail, have poor aesthetic quality, and do not meet visual standards.", + "1_point_standard": "The images are rich in detail, have excellent aesthetic quality, meet professional standards, and are visually appealing." + } + ] +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0003/images.txt b/dataset/physical_laws_illustration_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/physical_laws_illustration_0003/instruction.txt b/dataset/physical_laws_illustration_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..066563541e6348ee046034f2f60a261eb7c2845e --- /dev/null +++ b/dataset/physical_laws_illustration_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a scene of a person skipping a stone across water, containing 4 images arranged in chronological order, showing the stone bouncing across the water surface. All images must follow the physical laws of surface tension and reaction forces. 
\ No newline at end of file diff --git a/dataset/physical_laws_illustration_0003/meta.json b/dataset/physical_laws_illustration_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..131ff3552fb0b77119c0cfc29fdef609ff00a0f3 --- /dev/null +++ b/dataset/physical_laws_illustration_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "physical laws illustration", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0018", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0004/eval.json b/dataset/physical_laws_illustration_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..d62312ffb4dfe41225f6916cdded98406fa6fc79 --- /dev/null +++ b/dataset/physical_laws_illustration_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Time Logic: Does the sequence of images logically present the changes in physical laws in chronological order?", + "0_point_standard": "The sequence of images is not arranged in chronological order or lacks a logical flow, failing to illustrate the process of gradual change.", + "1_point_standard": "The sequence of images clearly presents the changes in physical laws in logical chronological order." + }, + { + "question": "Consistency with Text Description: Does the image content match the physical laws specified in the text description?", + "0_point_standard": "The image content does not accurately reflect the physical laws described in the text, with noticeable deviations.", + "1_point_standard": "The image content perfectly matches the text description, accurately demonstrating the specified physical laws." 
+ }, + { + "question": "Consistency of Image Style: Is the style and overall visual effect of the images consistent?", + "0_point_standard": "The image style is inconsistent, leading to a disjointed visual effect.", + "1_point_standard": "All images maintain a consistent style, creating a coherent visual effect." + }, + { + "question": "Consistency of Object/Character ID: Does the generated image sequence maintain consistency of the same object or character ID (e.g., the same object or character)?", + "0_point_standard": "The main subject is inconsistent between different frames, making it difficult to recognize as the same object or character.", + "1_point_standard": "The main subject is consistent and can be clearly identified as the same object or character." + }, + { + "question": "Logical Accuracy: Is the demonstration of the physical laws reasonable and logically sound?", + "0_point_standard": "The representation of the physical laws is illogical or unreasonable, with obvious errors or unrealistic descriptions.", + "1_point_standard": "The representation of the physical laws is reasonable, logical, and accurately reflects the expected physical principles." + }, + { + "question": "Professional Aesthetics: Do the details and aesthetics of the images meet professional standards and are visually appealing?", + "0_point_standard": "The images lack detail, have poor aesthetics, and do not meet visual standards.", + "1_point_standard": "The images are rich in detail, have excellent aesthetics, meet professional standards, and are visually appealing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0004/images.txt b/dataset/physical_laws_illustration_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/physical_laws_illustration_0004/instruction.txt b/dataset/physical_laws_illustration_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3e9debb0f4bf2f5d572f8868a26d0cbf850f4ea1 --- /dev/null +++ b/dataset/physical_laws_illustration_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a scene of a car accelerating down a slope, containing 4 images arranged in chronological order, showing the car sliding from the top to the bottom of the slope. All images must follow the physical laws of gravity and friction. \ No newline at end of file diff --git a/dataset/physical_laws_illustration_0004/meta.json b/dataset/physical_laws_illustration_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1e10ab6166afb2a895d8260437c345bcb6ef93dd --- /dev/null +++ b/dataset/physical_laws_illustration_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "physical laws illustration", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0018", + "output_image_count": 4, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/plant_growth_process_generation_0001/eval.json b/dataset/plant_growth_process_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..75b15f4a15257176acfbb771a29a3319e90d9669 --- /dev/null +++ b/dataset/plant_growth_process_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the sequence of images logically present the stages of plant growth in chronological order?", + "0_point_standard": "The sequence of images is not arranged in chronological order or lacks 
logical progression, failing to illustrate the stages of plant growth.", + "1_point_standard": "The sequence of images clearly presents the stages of plant growth in logical chronological order." + }, + { + "question": "Does the image content accurately reflect the plant growth process specified in the text description?", + "0_point_standard": "The image content inaccurately represents the plant growth stages described in the text, showing obvious discrepancies.", + "1_point_standard": "The image content perfectly matches the text description, accurately depicting the specified plant growth stages." + }, + { + "question": "Is the style and overall visual effect of the images consistent throughout the sequence?", + "0_point_standard": "The image style is inconsistent, leading to a visual disconnect that disrupts the sequence's coherence.", + "1_point_standard": "All images maintain a consistent style, creating a cohesive visual effect throughout the growth sequence." + }, + { + "question": "Does the generated image sequence maintain consistency in representing the same plant species or individual plant?", + "0_point_standard": "The plant looks inconsistent between different frames, making it difficult to identify as the same species or individual plant.", + "1_point_standard": "The plant is consistent and can be clearly identified as the same species or individual plant throughout the sequence." + }, + { + "question": "Is the demonstration of the plant growth process reasonable and logical, considering biological principles?", + "0_point_standard": "The depiction of plant growth is illogical or unreasonable, with descriptions of growth stages being clearly inaccurate or unrealistic.", + "1_point_standard": "The demonstration of the plant growth process is reasonable, logical, and accurately reflects expected biological growth principles." 
+ }, + { + "question": "Do the details and aesthetics of the images meet professional standards and are they visually appealing?", + "0_point_standard": "The images lack detail and aesthetics, falling short of visual standards and detracting from the overall presentation.", + "1_point_standard": "The images are rich in detail and have excellent aesthetics, meeting professional standards and being visually appealing, enhancing the overall presentation." + } + ] +} \ No newline at end of file diff --git a/dataset/plant_growth_process_generation_0001/images.txt b/dataset/plant_growth_process_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/plant_growth_process_generation_0001/instruction.txt b/dataset/plant_growth_process_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..80fb3e49109c0e5d0ccaa5a64fe0f664fbdefbbe --- /dev/null +++ b/dataset/plant_growth_process_generation_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a set of images depicting the growth of a real-world oak tree from seed to maturity. The first image shows an acorn buried in soft soil, surrounded by a few fallen leaves; the second image shows a young oak seedling just breaking through the soil, bathed in sunlight on the forest floor; the third image shows a half-grown oak tree, its leaves gently swaying in the breeze, with low shrubs and grass surrounding it; the fourth image shows a fully mature oak tree with a dense canopy, sunlight filtering through the branches, and a vast forest in the background. 
\ No newline at end of file diff --git a/dataset/plant_growth_process_generation_0001/meta.json b/dataset/plant_growth_process_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..0d9c2d7c0414ceb40ad3b77801d43a4e0a028b7d --- /dev/null +++ b/dataset/plant_growth_process_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "plant growth process generation without reference", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0020", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/plant_growth_process_generation_with_reference_0002/auto_eval.jsonl b/dataset/plant_growth_process_generation_with_reference_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..f44b2e8c4637065b543b0d8f4933ca00e4950f53 --- /dev/null +++ b/dataset/plant_growth_process_generation_with_reference_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"], "question": "Is the number in the image the digit 4? 0 points: The number in the image is not the digit 4; 1 point: The number in the image is the digit 4. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first output image and original input image of the response provided by a student. The task objective is to generate a set of plant growth stage images.\nThe text requirement is:\nPlease generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. 
The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process.\nYour review question is:\nDoes the seedling in the first output image appear as an early growth stage of the same sunflower in the original input image, maintaining recognizable characteristics such as leaf shape and stem texture? 0 points: The seedling does not resemble the sunflower in the original image, making it difficult to identify as the same plant. 1 point: The seedling retains recognizable features of the original sunflower, indicating an early growth stage of the same plant.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth output images of the response provided by a student. The task objective is to generate a set of plant growth stage images.\nThe text requirement is:\nPlease generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. 
Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process.\nYour review question is:\nDo the second and fourth output images show a logical progression in the growth of the sunflower, with an increase in height, foliage, and flower development? 0 points: The growth progression appears unrealistic or inconsistent, with abrupt or illogical changes in the sunflower’s structure. 1 point: The growth progression is logical, showing a natural increase in height, leaves, and flower development as it nears blooming.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third and fourth output images of the response provided by a student. The task objective is to generate a set of plant growth stage images.\nThe text requirement is:\nPlease generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process.\nYour review question is:\nDo the third and fourth output images maintain a consistent visual style, including lighting, shading, and rendering quality? 
0 points: The style differs noticeably between the images, reducing the cohesion of the series. 1 point: The style is consistent across both images, with matching lighting, shading, and rendering quality that enhance continuity.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the original input image and fourth output image of the response provided by a student. The task objective is to generate a set of plant growth stage images.\nThe text requirement is:\nPlease generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process.\nYour review question is:\nDoes the background in the fourth output image remain consistent with the original input image, including the sky, lighting, and general surroundings? 0 points: The background differs noticeably, making the scene appear unrelated or inconsistent with the original setting. 
1 point: The background is consistent, showing the same environment as the original, maintaining continuity in the surroundings.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first and third output images of the response provided by a student. The task objective is to generate a set of plant growth stage images.\nThe text requirement is:\nPlease generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process.\nYour review question is:\nDo the first and third output images accurately reflect the described growth stages, with the first showing a young seedling with few leaves and the third showing a maturing plant with a forming flower bud? 0 points: The images do not clearly match the described growth stages, with unrealistic or inaccurate representations of the sunflower’s development. 
1 point: The images align well with the descriptions, showing a logical increase in size and complexity as the sunflower grows.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/plant_growth_process_generation_with_reference_0002/eval.json b/dataset/plant_growth_process_generation_with_reference_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c30dada289a2e7c0347ae2fb1a9ce13a7eeede23 --- /dev/null +++ b/dataset/plant_growth_process_generation_with_reference_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements of the text description?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "Does the seedling in the first output image represent the early growth stage of the same sunflower from the original input image, retaining recognizable features such as leaf shape and stem texture?", + "0_point_standard": "The seedling does not resemble the sunflower in the original image and is difficult to identify as the same plant.", + "1_point_standard": "The seedling retains recognizable features of the original sunflower, showing the early growth stage of the same plant." + }, + { + "question": "Do the second and fourth output images show a reasonable progression in the growth of the sunflower, exhibiting increased height, number of leaves, and flower development?", + "0_point_standard": "The growth progression appears unrealistic or inconsistent, with abrupt or unreasonable changes in the sunflower's structure.", + "1_point_standard": "The growth progression is reasonable, showing a natural increase in height, number of leaves, and flower development as it approaches blooming." 
+ }, + { + "question": "Do the third and fourth output images maintain a consistent visual style, including lighting, shadows, and rendering quality?", + "0_point_standard": "The styles of the two images are significantly different, reducing the coherence of the series.", + "1_point_standard": "The styles of the two images are consistent, with matching lighting, shadows, and rendering quality, enhancing continuity." + }, + { + "question": "Does the background in the fourth output image remain consistent with the original input image, including the sky, lighting, and overall environment?", + "0_point_standard": "There are significant differences in the background, making the scene appear unrelated or inconsistent with the original environment.", + "1_point_standard": "The background is consistent, showing the same environmental elements as the original, maintaining the continuity of the surroundings." + }, + { + "question": "Do the first and third output images accurately reflect the described growth stages, with the first showing a seedling with fewer leaves and the third showing a mature plant with forming flower buds?", + "0_point_standard": "The images fail to clearly correspond to the described growth stages, with unreasonable or inaccurate growth representation of the sunflower.", + "1_point_standard": "The images closely match the description, showing a reasonable increase in size and complexity of the sunflower during its growth." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/plant_growth_process_generation_with_reference_0002/images.txt b/dataset/plant_growth_process_generation_with_reference_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..58d3a51c0eb5a7ee270990ddc0737ad80cd095b1 --- /dev/null +++ b/dataset/plant_growth_process_generation_with_reference_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01WdBlDi1n0sRSDWpC8_!!6000000005028-0-tps-3000-2000.jpg diff --git a/dataset/plant_growth_process_generation_with_reference_0002/instruction.txt b/dataset/plant_growth_process_generation_with_reference_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..1f21c74c3ce0ec8d40fb0ed036f02651862586f6 --- /dev/null +++ b/dataset/plant_growth_process_generation_with_reference_0002/instruction.txt @@ -0,0 +1 @@ +Please generate 4 images showing different growth stages of this sunflower, illustrating its transformation from a seedling to full bloom. The first image should depict the sunflower just emerging from the ground, with only a few leaves. The second image should show the sunflower growing taller, with more leaves, but the flower has not yet fully developed. The third image should depict the stage where the flower bud is forming, nearing the blooming phase. The fourth image can show the sunflower just before it fully blooms. Each generated image should clearly demonstrate that it's the same sunflower from the original picture, only at different times, reflecting its natural growth process. 
\ No newline at end of file diff --git a/dataset/plant_growth_process_generation_with_reference_0002/meta.json b/dataset/plant_growth_process_generation_with_reference_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a8315d77799c2b5f9c48eecb591ab0a350f88964 --- /dev/null +++ b/dataset/plant_growth_process_generation_with_reference_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "plant growth process generation with reference", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0046", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/poster_generation_0001/auto_eval.jsonl b/dataset/poster_generation_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..ffc1b3c53984eac8066192f7b6729410adb4c181 --- /dev/null +++ b/dataset/poster_generation_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of only one image as the response provided by a student. The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. 
Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nDoes the generated image clearly resemble a poster, with recognizable elements such as a focal design, layout structure, and text components? 0 points: The image lacks identifiable poster qualities, making it unclear as a promotional or informational design. 1 point: The image has clear poster characteristics, with a defined layout, focal design, and text elements typical of a poster.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of only one image as the response provided by a student. The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. 
The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nIs the image visually complete, with a balanced composition that does not require additional elements to be perceived as a finished painting? 0 points: The image appears incomplete or lacks a balanced composition, giving the impression of an unfinished piece. 1 point: The image is visually complete and balanced, functioning well as a standalone painting.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of only one image as the response provided by a student. The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. 
At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nDoes the painting accurately represent the specific subject, style, or elements described in the text prompt (e.g., a landscape, portrait, or surreal theme)? Read the text requirement sentence by sentence; if any element in a sentence is not reflected in the poster, the score is 0 points. 0 points: The painting does not align with the described subject, style, or elements, deviating from the text requirements. 1 point: The painting accurately represents the subject, style, and elements specified in the text prompt.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of only one image as the response provided by a student. 
The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nAre the text elements (e.g., title, tagline, body text) in the poster clear, readable, and appropriately placed to convey the intended message? 0 points: The text elements are unclear, difficult to read, or poorly positioned, affecting the communication of the message. 1 point: The text elements are clear, readable, and well-placed, effectively conveying the intended message.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. 
The work consists of only one image as the response provided by a student. The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nDoes the poster utilize visual hierarchy effectively, with emphasis on key elements such as the main message, imagery, or call-to-action? 0 points: The poster lacks a clear visual hierarchy, making it difficult to distinguish important elements from supporting details. 
1 point: The poster uses visual hierarchy effectively, with clear emphasis on key elements, making the design easy to follow.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. The work consists of only one image as the response provided by a student. The task objective is to generate a poster based on the text requirements.\nThe text requirement is:\n\"This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern.\"\nYour review question is:\nDoes the poster exhibit a high level of aesthetic quality, with a cohesive design, appealing color choices, and strong visual impact? 
0 points: The poster lacks aesthetic appeal, with poor color choices, weak composition, or an unprofessional look. 1 point: The poster has strong aesthetic appeal, with cohesive design elements, attractive colors, and a visually impactful, professional finish.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/poster_generation_0001/eval.json b/dataset/poster_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..20d9e126610a31e93a41da8fa910e345d2605dd0 --- /dev/null +++ b/dataset/poster_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image clearly present as a poster, with recognizable elements such as focal design, layout structure, and text components?", + "0_point_standard": "The image lacks recognizable poster features, making it difficult to identify as promotional or informational design.", + "1_point_standard": "The image has clear poster features, with a definite layout, focal design, and text elements that typically conform to poster style." + }, + { + "question": "Is the image visually complete, with a balanced composition, and doesn't require additional elements to be considered a complete artwork?", + "0_point_standard": "The image appears incomplete or lacks a balanced composition, giving an impression of being unfinished.", + "1_point_standard": "The image is visually complete and balanced, functioning well as a standalone piece of artwork." + }, + { + "question": "Does the artwork accurately represent the specific theme, style, or elements described in the text prompt (e.g., landscape, portrait, or surreal themes)? 
Read the text requirements sentence by sentence, and if an element from any sentence is not reflected in the poster, score it 0 points.", + "0_point_standard": "The artwork fails to reflect the described theme, style, or elements, deviating from the text requirements.", + "1_point_standard": "The artwork accurately represents the theme, style, and elements specified in the text prompt." + }, + { + "question": "Are the text elements in the poster (such as titles, slogans, body text) clear, legible, and appropriately positioned to convey the intended message?", + "0_point_standard": "The text elements are unclear, hard to read, or poorly positioned, affecting the communication of information.", + "1_point_standard": "The text elements are clear, legible, and well-positioned, effectively conveying the intended message." + }, + { + "question": "Does the poster effectively use visual hierarchy to emphasize major elements such as the main message, image, or call to action?", + "0_point_standard": "The poster lacks a clear visual hierarchy, making it difficult to distinguish important elements from secondary details.", + "1_point_standard": "The poster effectively uses visual hierarchy, clearly emphasizing major elements, making the design easy to understand." + }, + { + "question": "Does the poster exhibit a high level of aesthetic quality, with coherent design, attractive color scheme, and strong visual impact?", + "0_point_standard": "The poster lacks aesthetic appeal, with poor color scheme, weak composition, or appears unprofessional.", + "1_point_standard": "The poster has strong aesthetic appeal, coherent design, attractive color scheme, and strong visual impact, presenting a professional appearance." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/poster_generation_0001/images.txt b/dataset/poster_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/poster_generation_0001/instruction.txt b/dataset/poster_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..cde6deb5297287806dec1c4da90742e7f80bda4b --- /dev/null +++ b/dataset/poster_generation_0001/instruction.txt @@ -0,0 +1 @@ +This poster, themed around the animated movie “Kung Fu Panda,” features a vintage and stylized art style with an earthy color palette of orange, green, and brown tones. At the top, there is a DreamWorks logo with a green and black background. The poster’s main focus is the panda character Po, who takes a central position in a dynamic kung fu pose, his black and white fur and determined expression highlighted. Surrounding Po are his friends and fellow kung fu warriors: Tigress on the left, in an orange and black striped martial arts outfit; Monkey on the right, poised for action with a serious look; Mantis at the bottom, small and green, and Crane above with outstretched wings. To Po’s left and right, circular frames feature Shifu, a small red panda in a green robe, and Viper, a green snake with intricate patterns. Above Po, the villain Tai Lung, a fierce snow leopard with spots, looms large, exuding menace and strength, with sharp eyes and bared fangs. The background contains traditional Chinese architectural elements, with a temple and mountains under a red sky that adds depth and atmosphere to the scene. At the bottom, the title “KUNG FU PANDA” is written in bold, stylized letters with a bamboo pattern. 
\ No newline at end of file diff --git a/dataset/poster_generation_0001/meta.json b/dataset/poster_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..63f0ea3d93495f9d44003c7ed04cabcd0f51406c --- /dev/null +++ b/dataset/poster_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "poster generation", + "num_of_cases": 5, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0026", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/product_usage_scenario_generation_0002/auto_eval.jsonl b/dataset/product_usage_scenario_generation_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..47944dfc932a1ea269b5d8bb63c78d41ced7cd76 --- /dev/null +++ b/dataset/product_usage_scenario_generation_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"], "question": "Is the number in the image the digit 4? 0 points: The number in the image is not the digit 4; 1 point: The number in the image is the digit 4. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the original input image and first output image of the response provided by a student. The task objective is to generate different application scenario images of a specific product.\nThe text requirement is:\nBased on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. The goal is to generate four images, with each image showing the headphones in different practical scenarios. 
The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios.\nYour review question is:\nDo the headphones in the first output image match the design, shape, and color of the original headphone image, ensuring they are the same product? 0 points: The headphones appear different from the original, with noticeable inconsistencies in design or color. 1 point: The headphones in the generated image match the original design, shape, and color, ensuring they’re recognizable as the same product.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third and fourth output images of the response provided by a student. The task objective is to generate different application scenario images of a specific product.\nThe text requirement is:\nBased on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. 
The goal is to generate four images, with each image showing the headphones in different practical scenarios. The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios.\nYour review question is:\nDo the third and fourth output images accurately depict the specific scene details provided, such as the outdoor setting with trees and sunlight in the third image, and the modern living room setting with books and a coffee cup in the fourth image? 0 points: The scenes lack the specific elements described, making it hard to identify the intended environment. 1 point: The scenes accurately incorporate the specified elements, making the environments clear and aligned with the description.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth output images of the response provided by a student. 
The task objective is to generate different application scenario images of a specific product.\nThe text requirement is:\nBased on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. The goal is to generate four images, with each image showing the headphones in different practical scenarios. The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios.\nYour review question is:\nDo the second and fourth output images maintain a consistent realistic photography style, with appropriate lighting, textures, and rendering quality? 0 points: The style between these images differs noticeably, reducing the visual cohesion of the series. 1 point: Both images exhibit a consistent realistic photography style, with coherent lighting, textures, and rendering, enhancing the overall continuity.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. 
This is the first and second output images of the response provided by a student. The task objective is to generate different application scenario images of a specific product.\nThe text requirement is:\nBased on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. The goal is to generate four images, with each image showing the headphones in different practical scenarios. The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios.\nYour review question is:\nDoes the lighting in the first and second output images align with the description, such as soft lighting for the workspace setting and warm lighting for the cozy living room setting? 0 points: The lighting does not match the intended ambiance, reducing the realism and atmosphere in each scene. 
1 point: The lighting and ambiance align well with the descriptions, creating the appropriate mood for each respective setting.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. These are the third and fourth output images of the response provided by a student. The task objective is to generate different application scenario images of a specific product.\nThe text requirement is:\nBased on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. The goal is to generate four images, with each image showing the headphones in different practical scenarios. The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios.\nYour review question is:\nDo the third and fourth output images convey a relevant context for the headphones’ use, such as portability for the outdoor setting in the third image and everyday style in the home setting of the fourth image? 
0 points: The context does not convincingly represent suitable use environments, making the scenes feel unrelated to practical scenarios. 1 point: The scenes appropriately match the intended environments, making the headphone usage feel relevant and purposeful in each context.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/product_usage_scenario_generation_0002/eval.json b/dataset/product_usage_scenario_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5581dca072f05c816eca969fc31adfc20c41c290 --- /dev/null +++ b/dataset/product_usage_scenario_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements described in the text?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "Does the headphone in the first output image match the design, shape, and color of the original headphone image, ensuring they are the same product?", + "0_point_standard": "The headphone appears different from the original image with noticeable inconsistencies in design or color.", + "1_point_standard": "The headphone in the generated image matches the original design, shape, and color, ensuring it is recognizable as the same product." + }, + { + "question": "Do the third and fourth output images accurately depict specific scene details, such as the outdoor environment (trees and sunlight) in the third image and the modern living room scene (books and coffee cup) in the fourth image?", + "0_point_standard": "The scenes lack the specifically described elements, making it difficult to recognize the described environment.", + "1_point_standard": "The scenes accurately include the specified elements, making the environment clear and consistent with the description." 
+ }, + { + "question": "Do the second and fourth output images maintain a consistent realistic photographic style with appropriate lighting, texture, and rendering quality?", + "0_point_standard": "There are noticeable style differences between these images, reducing the visual coherence of the series.", + "1_point_standard": "Both images exhibit a consistent realistic photographic style with coherent lighting, texture, and rendering, enhancing the overall continuity." + }, + { + "question": "Is the lighting in the first and second output images consistent with the description, such as soft lighting for the workspace and warm lighting for the cozy living room?", + "0_point_standard": "The lighting does not match the expected atmosphere, reducing the realism and ambiance of each scene.", + "1_point_standard": "The lighting and ambiance are well-matched with the description, creating an appropriate atmosphere for each scene." + }, + { + "question": "Do the third and fourth output images convey an appropriate context for headphone use, such as a portable outdoor scene in the third image and a homey everyday style in the fourth image?", + "0_point_standard": "The scenes fail to convincingly showcase an appropriate usage context, making the scenes appear unrelated to the actual context.", + "1_point_standard": "The scenes appropriately match the expected context, making the headphone use appear relevant and meaningful in each setting." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/product_usage_scenario_generation_0002/images.txt b/dataset/product_usage_scenario_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..409bf8c9dd5055114e0db4ea85e5af9adb37b895 --- /dev/null +++ b/dataset/product_usage_scenario_generation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01UGTzho1wmoRBofWrZ_!!6000000006351-0-tps-1600-1535.jpg diff --git a/dataset/product_usage_scenario_generation_0002/instruction.txt b/dataset/product_usage_scenario_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..1e039bc005f0a9dffea414fce00cb3e2bf79e02e --- /dev/null +++ b/dataset/product_usage_scenario_generation_0002/instruction.txt @@ -0,0 +1 @@ +Based on the given headphone image, generate a series of scenes showing the usage of the headphones in different environments. The goal is to generate four images, with each image showing the headphones in different practical scenarios. The first image should depict the headphones placed next to a desk with an open laptop, and soft lighting in the background, creating a comfortable working environment. The second image should show the headphones resting on the armrest of a sofa, surrounded by minimalistic and warm decorations, emphasizing a relaxed and cozy atmosphere. The third image should show the headphones on an outdoor wooden table, with trees and sunlight in the background, highlighting the portability and outdoor use of the headphones. The fourth image should depict the headphones placed on modern furniture in a living room, with books and a coffee cup nearby, showcasing a blend of style and everyday use. All images should maintain consistency with the headphone design while incorporating different styles and details in each scene to reflect the variety of application scenarios. 
\ No newline at end of file diff --git a/dataset/product_usage_scenario_generation_0002/meta.json b/dataset/product_usage_scenario_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a01c0aefead3815e426534a7f82001d09f9ecd92 --- /dev/null +++ b/dataset/product_usage_scenario_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "product usage scenario generation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0050", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/eval.json b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..fb8bd15707a49cfee584d755b1c6bde7c21bbdf7 --- /dev/null +++ b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image contain the anime character from the input image and retain its character identity?", + "0_point_standard": "The anime character is missing, or the character's identity has significantly changed, making it unrecognizable.", + "1_point_standard": "The anime character is present and its character identity is retained without significant changes." + }, + { + "question": "Is the real-world background from the input image completely retained without significant alterations or removal of key elements?", + "0_point_standard": "The real-world background has been altered, removed, or significantly modified, disrupting the original scene.", + "1_point_standard": "The real-world background remains completely unchanged, retaining all original elements and details." 
+ }, + { + "question": "Does the anime character naturally integrate into the specified location or context within the real-world photo (e.g., standing, sitting, interacting with specific elements)?", + "0_point_standard": "The anime character is not correctly positioned or does not interact as described, disrupting the task context.", + "1_point_standard": "The anime character is correctly positioned and interacts with specific elements in the real-world photo as expected." + }, + { + "question": "Are the lighting and shadows of the anime character consistent with the real environment in the photo?", + "0_point_standard": "The lighting or shadows of the anime character are inconsistent with the direction, intensity, or tone of the real environment, making the integration appear unnatural.", + "1_point_standard": "The lighting and shadows of the anime character seamlessly match the real environment, presenting a natural appearance." + }, + { + "question": "Are the resolution, texture, and details of the anime character visually consistent with the real-world photo, avoiding sharpness or integration issues?", + "0_point_standard": "The resolution, texture, or details of the anime character conflict with the real-world photo, causing visual inconsistency and imbalance.", + "1_point_standard": "The resolution, texture, and details of the anime character are visually consistent with the real-world photo, enhancing the realism of the integration." 
+ }, + { + "question": "Does the overall composition maintain a harmonious balance between the stylized appearance of the anime character and the real-world photo, avoiding any element being overly jarring?", + "0_point_standard": "The stylized appearance of the anime character conflicts with the real-world photo, disrupting overall harmony and making the integration appear jarring.", + "1_point_standard": "The stylized appearance of the anime character and the real-world photo are balanced, presenting a unified and visually harmonious composition." + } + ] +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/images.txt b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..7866508c820e47d6f7d59a774a605233070299ea --- /dev/null +++ b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i1/O1CN01vV3gDp1WYuP7dD1st_!!6000000002801-0-tps-516-917.jpg +https://img.alicdn.com/imgextra/i4/O1CN01R0qdxf1pfl1gpUxpQ_!!6000000005388-0-tps-736-924.jpg diff --git a/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/instruction.txt b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..b70c0a249a165e7e8f491e5366d718c61a0ed798 --- /dev/null +++ b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/instruction.txt @@ -0,0 +1 @@ +Please generate an image that seamlessly integrates the anime character from the second image into the real-world photo from the first image. The background of the first image must remain completely unchanged, and the anime character should be standing next to the lamp post, holding the shopping bag, while maintaining their anime style and character ID. 
Adjust the lighting and shadows to match the real-world environment in the photo, ensuring the overall composition appears natural and cohesive. \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/meta.json b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..e119a538851a81a5245f68ab254e46755f3b818d --- /dev/null +++ b/dataset/real_and_anime_interaction_anime_character_in_real_world_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "anime character in real world", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0098", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0002/eval.json b/dataset/real_and_anime_interaction_mixed_portrait_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a4e767ed93bbb36be63f0540d004caeb87e2106d --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the specified anime-reality mixed effect accurately applied to the designated area (e.g., the left half of the character, clothing, or specific features) in the output image, while retaining the other parts of the image as instructed?", + "0_point_standard": "The mixed effect is not applied in the correct area, or other unspecified parts of the image have been altered.", + "1_point_standard": "The mixed effect is accurately applied in the designated area, with other parts remaining unchanged as instructed." 
+ }, + { + "question": "Does the anime style applied in the output image align with the reference anime style (e.g., line quality, shading, and color scheme)?", + "0_point_standard": "There are significant deviations from the reference style, resulting in inconsistency or mismatched elements.", + "1_point_standard": "The anime style closely matches the reference style, maintaining consistency in lines, shading, and color." + }, + { + "question": "Is the boundary between the anime and realistic parts transitioned naturally, presenting a cohesive visual effect?", + "0_point_standard": "The boundary transition between the anime and realistic parts is harsh or poorly blended, making the transition appear unnatural.", + "1_point_standard": "The boundary transition is smooth and well-blended, ensuring a natural integration of anime and realistic parts." + }, + { + "question": "Do the added anime-style decorative elements (e.g., stars, ribbons, light effects) harmoniously integrate into the image, enhancing aesthetic appeal without disrupting the overall realistic effect?", + "0_point_standard": "Decorative elements are missing, poorly integrated, or disrupt the overall harmony of the image.", + "1_point_standard": "Decorative elements are well-integrated, enhancing the anime-reality mix while maintaining the original aesthetics." + }, + { + "question": "Do the transformed anime parts (e.g., clothing, hair, or specific features) remain consistent with the original posture, texture, and identity of the character in the image?", + "0_point_standard": "The transformed parts are disconnected from the character's original posture, texture, or identity, leading to a lack of coherence.", + "1_point_standard": "The transformed parts are consistent with the character's posture, texture, and identity, ensuring a coherent and realistic fusion effect." 
+ }, + { + "question": "Does the overall composition achieve a visually appealing balance between anime and realistic styles, while maintaining the artistic goals of the task?", + "0_point_standard": "The image appears visually unbalanced, with one style (anime or realistic) being overly dominant or conflicting.", + "1_point_standard": "The image achieves a balance between anime and realistic styles that is visually appealing and aligns with the artistic goals of the task." + } + ] +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0002/images.txt b/dataset/real_and_anime_interaction_mixed_portrait_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e19cc55f3cd00ca5aacc379552562177645ef678 --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0002/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i4/O1CN01EJHwQ71UYn4PMTWPz_!!6000000002530-0-tps-736-797.jpg +https://img.alicdn.com/imgextra/i1/O1CN01rfTxbI1WJnKnSb49f_!!6000000002768-0-tps-554-831.jpg diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0002/instruction.txt b/dataset/real_and_anime_interaction_mixed_portrait_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..bccdb42f23a09ff7f98aac107d7bd750142b6a55 --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0002/instruction.txt @@ -0,0 +1 @@ +Please generate an image with the goal of applying the anime-realistic mixed effect from the first image to the clothing portion of the second image. Specifically, transform the clothing worn by the character in the second image into an anime-style effect, including details such as color, lines, and texture, while keeping the rest of the image unchanged, such as the background, the character’s hairstyle, facial expression, and pose. 
Additionally, you may slightly enhance the overall image with decorations that match the anime style, such as subtle lighting effects, lines, or patterns, ensuring that these decorations do not alter the original background or main subject’s realism. \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0002/meta.json b/dataset/real_and_anime_interaction_mixed_portrait_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..99bb1c539b50236f7d209793f21648dc59f4445c --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "mixed portrait", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0099", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0003/eval.json b/dataset/real_and_anime_interaction_mixed_portrait_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..d9cf6d424f93a0750421de7ce7fc7ae4f99337d7 --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image accurately apply the specified anime-realistic hybrid effect in the designated area (e.g., the left half of the character, clothing, or specific features) while keeping the rest of the image unchanged as instructed?", + "0_point_standard": "The hybrid effect is not applied in the correct area, or other unspecified parts of the image are altered.", + "1_point_standard": "The hybrid effect is accurately applied in the designated area, and other parts remain unchanged as instructed." 
+ }, + { + "question": "Is the anime style applied in the output image consistent with the reference anime style (e.g., line quality, shading, and color scheme)?", + "0_point_standard": "There are significant deviations in the anime style from the reference style, leading to inconsistencies or mismatched elements.", + "1_point_standard": "The anime style closely matches the reference style, maintaining consistency in lines, shading, and colors." + }, + { + "question": "Is the transition between the anime and realistic parts smooth, presenting a coherent visual effect?", + "0_point_standard": "The boundary transition between the anime and realistic parts is harsh or poorly blended, causing it to appear unnatural.", + "1_point_standard": "The boundary transition is smooth and well-blended, ensuring a natural integration between the anime and realistic parts." + }, + { + "question": "Do the added anime-style decorative elements (such as stars, ribbons, light effects) harmoniously integrate into the image, enhancing its aesthetic appeal without disrupting the overall realistic effect?", + "0_point_standard": "Decorative elements are missing, poorly integrated, or disrupt the overall harmony of the image.", + "1_point_standard": "Decorative elements are well integrated, enhancing the anime-realistic hybrid effect while maintaining the original aesthetics." + }, + { + "question": "Do the transformed anime parts (e.g., clothing, hair, or specific features) remain consistent with the original pose, texture, and identity of the character in the image?", + "0_point_standard": "The transformed parts are disconnected from the character's original pose, texture, or identity, leading to a lack of coherence.", + "1_point_standard": "The transformed parts are consistent with the character's pose, texture, and identity, ensuring coherent and realistic integration." 
+ }, + { + "question": "Does the overall composition achieve a visually compelling balance between anime and realistic styles while maintaining the artistic goals of the task?", + "0_point_standard": "The image appears visually unbalanced, with either the anime or realistic style being overly dominant or conflicting.", + "1_point_standard": "The image achieves a balance and strong visual appeal between anime and realistic styles, aligning with the artistic goals of the task." + } + ] +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0003/images.txt b/dataset/real_and_anime_interaction_mixed_portrait_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..942cba844d6d5946e19ace1527dc4838443c1890 --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0003/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01YJetIl1edVIvwwAto_!!6000000003894-0-tps-564-846.jpg +https://img.alicdn.com/imgextra/i1/O1CN01I7sHKv1hZmwD9fvxK_!!6000000004292-0-tps-236-419.jpg diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0003/instruction.txt b/dataset/real_and_anime_interaction_mixed_portrait_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..8f421bd9a36d060806c009bdac988bfa4a9ea4ff --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0003/instruction.txt @@ -0,0 +1 @@ +Please generate an image that applies the visual effect of anime and reality blending from the first image to the second image. Specifically, incorporate the vibrant, twisted ribbon-like design and partial anime-styled features from the first image naturally into the second image, while keeping the background and main content of the second image intact. The character should maintain their original pose, but add anime-inspired blending effects to the clothing, hair, or specific areas, creating a unique artistic style. 
\ No newline at end of file diff --git a/dataset/real_and_anime_interaction_mixed_portrait_0003/meta.json b/dataset/real_and_anime_interaction_mixed_portrait_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..63d8c4a0169a9843a1eac9a422b101ef5b6ae8a0 --- /dev/null +++ b/dataset/real_and_anime_interaction_mixed_portrait_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "mixed portrait", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0099", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/eval.json b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a45f364ceb2c1d5fd8125069f2e9c56e41a18c3c --- /dev/null +++ b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image accurately position the real-life character in the anime-style background location specified in the text prompt?", + "0_point_standard": "The real-life character is not positioned in the specified location within the anime-style background, or is positioned inaccurately.", + "1_point_standard": "The real-life character is correctly positioned in the specified location within the anime-style background." + }, + { + "question": "Is the anime-style background completely preserved without unexpected changes to its elements or visual composition?", + "0_point_standard": "There are noticeable alterations or disruptions to the anime-style background, deviating from the original input image.", + "1_point_standard": "The anime-style background is fully preserved as specified, with no unexpected changes to its elements or composition." 
+ }, + { + "question": "Does the real-life character seamlessly blend with the anime-style background in terms of lighting, shadow alignment, and spatial consistency?", + "0_point_standard": "The blending of the real-life character with the anime-style background is poor, with inconsistencies in lighting, shadows, or spatial alignment.", + "1_point_standard": "The real-life character naturally blends into the anime-style background, with good alignment of lighting, shadows, and space." + }, + { + "question": "Does the real-life character retain its realistic photographic style while maintaining identity consistency (e.g., facial features, pose, and clothing) as described in the prompt?", + "0_point_standard": "The realistic style of the real-life character has been altered, or its identity characteristics (e.g., facial features, pose, or clothing) are inconsistent with the original input image.", + "1_point_standard": "The real-life character retains its realistic style and identity characteristics, with all defining features consistent with the original image." + }, + { + "question": "Is the overall composition visually harmonious, with the real-life character and anime-style background naturally balanced, avoiding visual conflicts?", + "0_point_standard": "The overall composition is visually unbalanced, with conflicts or disharmony between the real-life character and the anime-style background.", + "1_point_standard": "The overall composition is harmonious, with the real-life character and anime-style background complementing each other, resulting in a good visual effect." 
+ }, + { + "question": "Do adjustments to the real-life character's pose or expression enhance its natural integration into the anime-style scene?", + "0_point_standard": "Adjustments to the real-life character's pose or expression appear unnatural and fail to enhance its integration into the scene.", + "1_point_standard": "Adjustments to the real-life character's pose or expression improve its natural fit with the anime-style background, enhancing the overall coherence of the scene." + } + ] +} \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/images.txt b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..aa05ced04f6a52f44e54124702b2be3b8245c291 --- /dev/null +++ b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01hb0jY51FI6P08WVtb_!!6000000000463-0-tps-736-1308.jpg +https://img.alicdn.com/imgextra/i3/O1CN01Bx2qW01puPhW3kedZ_!!6000000005420-0-tps-736-1154.jpg diff --git a/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/instruction.txt b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..1e25e53a4dce27aae46dbb64b2fc1d875b37c84f --- /dev/null +++ b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/instruction.txt @@ -0,0 +1 @@ +Please generate an image that naturally integrates the real-life figure from the second image into the anime-style background of the first image. The other elements of the anime background should remain unchanged. The real person should naturally stand on the crosswalk, and her expression or posture can be slightly adjusted, but her real-life style and ID must remain consistent, and no changes to her defining characteristics should be made. 
Ensure the overall image is harmonious, with the real person blending seamlessly into the anime-style background. \ No newline at end of file diff --git a/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/meta.json b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..42d9323e854981272ea6bb69d065a1e41f0177a7 --- /dev/null +++ b/dataset/real_and_anime_interaction_real_person_in_anime_background_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "real person in anime background", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0100", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/same_pose_generation_0002/eval.json b/dataset/same_pose_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..3526000df1e51d25bc1f2b5eff8f380b0a929c0e --- /dev/null +++ b/dataset/same_pose_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image ensure that each character maintains a strong visual connection with the original character definition chart?", + "0_point_standard": "Characters in the generated image do not resemble the original characters in key features or identifying elements.", + "1_point_standard": "Characters in the generated image closely match the unique features and style of the original character definition chart." + }, + { + "question": "Are the poses of the characters in all generated images consistent with those specified by the task?", + "0_point_standard": "Characters are depicted in different poses, failing to adhere to the consistent pose requirement.", + "1_point_standard": "All characters are depicted in the same specified pose, ensuring consistency across images." 
+ }, + { + "question": "Does the model accurately follow the specific instructions regarding character attributes (e.g., clothing or accessories) from the text description?", + "0_point_standard": "The model fails to incorporate the specified attributes from the text description, or the attributes are inaccurately represented.", + "1_point_standard": "The model accurately integrates the specified attributes from the text description into each character image." + }, + { + "question": "Has any part of the character image been unnecessarily changed or distorted beyond the specified modifications?", + "0_point_standard": "Unnecessary changes or distortions are present, affecting parts of the character not intended to be modified.", + "1_point_standard": "Only the specified modifications are made, with no unnecessary changes to other parts of the image." + }, + { + "question": "Do the generated images maintain a high level of aesthetic quality and visual appeal?", + "0_point_standard": "Character images lack visual appeal and are poor in detail and composition.", + "1_point_standard": "Character images are visually appealing, with high-quality details and attractive composition." + }, + { + "question": "Are the character images stylistically consistent with each other, forming a cohesive and unified set?", + "0_point_standard": "There are noticeable stylistic inconsistencies between character images, disrupting the visual harmony of the set.", + "1_point_standard": "Character images are stylistically consistent, creating a cohesive and unified visual presentation." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/same_pose_generation_0002/images.txt b/dataset/same_pose_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..0962ce1b846aabf8ae6b90994c0dec7612ad7a60 --- /dev/null +++ b/dataset/same_pose_generation_0002/images.txt @@ -0,0 +1,5 @@ +https://img.alicdn.com/imgextra/i3/O1CN01GGSJ3B1YWHMV7WoKU_!!6000000003066-0-tps-3106-2027.jpg +https://img.alicdn.com/imgextra/i4/O1CN01tQIaPB1JQaCQOXEtd_!!6000000001023-0-tps-1732-1498.jpg +https://img.alicdn.com/imgextra/i2/O1CN01FqfgDI1cmXtJgINUn_!!6000000003643-0-tps-1280-811.jpg +https://img.alicdn.com/imgextra/i3/O1CN01L2GAr91nhTHDuzwdL_!!6000000005121-0-tps-2480-3406.jpg +https://img.alicdn.com/imgextra/i3/O1CN01yYRFVP24IbEjxNlYC_!!6000000007368-0-tps-1997-2882.jpg diff --git a/dataset/same_pose_generation_0002/instruction.txt b/dataset/same_pose_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f41e434a90e0b5a46dd5f427c7351b74c6303e1a --- /dev/null +++ b/dataset/same_pose_generation_0002/instruction.txt @@ -0,0 +1 @@ +Please generate five new images based on the given five character images, with each character performing the same “jumping” action. The model's goal is to output five images, with each image corresponding to one character. The characters should maintain consistent jumping postures and action details. The outfits and appearance of the characters remain the same, and the background can either stay the same or be slightly adjusted. Items in the characters' hands can be added or removed based on the needs of the jumping action. 
\ No newline at end of file diff --git a/dataset/same_pose_generation_0002/meta.json b/dataset/same_pose_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..ad1551f815fe28bfc73c6e0afbd28b17402786fd --- /dev/null +++ b/dataset/same_pose_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "dynamic characters with same pose", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0034", + "output_image_count": 5, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/sculpture_generation_0002/eval.json b/dataset/sculpture_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c04f1a80e76e2080733539e380cbb71ea954e51a --- /dev/null +++ b/dataset/sculpture_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the sculpture's form match the text description, and are the overall structure and posture accurate?", + "0_point_standard": "The sculpture's form does not match the description, with noticeable deviations or omissions in structure or posture.", + "1_point_standard": "The sculpture's form matches the description, with accurate structure and posture." + }, + { + "question": "Does the generated sculpture image have a clear 3D effect that meets the spatial requirements of a sculpture?", + "0_point_standard": "The image lacks a 3D effect, insufficient spatial depth, and appears flat.", + "1_point_standard": "The image has a strong 3D effect, showcasing depth and meeting the spatial requirements of a sculpture." 
+ }, + { + "question": "Does the texture representation of the sculpture match the material requirements described in the text (e.g., smooth, rough, metallic)?", + "0_point_standard": "The texture does not match the text description, with poor detail representation and a lack of material realism.", + "1_point_standard": "The texture matches the text description, with rich details and realistic material texture." + }, + { + "question": "Does the model accurately implement the specific details pointed out in the text (e.g., texture, ornaments, or specific decorative elements)?", + "0_point_standard": "The image lacks or misunderstands the specific details specified in the text, resulting in inaccurate representation.", + "1_point_standard": "The image accurately represents all specified details, with fine and natural design." + }, + { + "question": "Does the style and visual effect of the sculpture match the description in the text (e.g., modern, classical, abstract)?", + "0_point_standard": "The style deviates significantly from the text description, failing to convey the specified style.", + "1_point_standard": "The sculpture's style matches the text description, with the expected visual effect achieved." + }, + { + "question": "Does the overall aesthetic quality of the sculpture image reach a professional standard of sculpture design, with strong visual impact?", + "0_point_standard": "The sculpture image lacks aesthetic appeal, with insufficient visual impact and lacks design sense.", + "1_point_standard": "The sculpture image has excellent aesthetic quality, strong visual impact, reaching a professional design standard." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/sculpture_generation_0002/images.txt b/dataset/sculpture_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/sculpture_generation_0002/instruction.txt b/dataset/sculpture_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f6b95fde20cd6afc893709a3b6802feced7d5878 --- /dev/null +++ b/dataset/sculpture_generation_0002/instruction.txt @@ -0,0 +1 @@ +This image depicts an intricate sculpture of a powerful, monkey-like bust, exuding a sense of strength and authority. The sculpture's monkey figure is shown with its head bowed slightly, wearing an intense, focused expression with furrowed brows and sharp, piercing eyes. A golden circlet wraps around the monkey's forehead, reminiscent of Sun Wukong from the classic “Journey to the West.” The detailing of the sculpture is highly realistic, with the skin's muscle textures clearly visible, giving off a strong, muscular appearance. The chest and abdominal muscles are taut, and the skin is covered with short fur in a brown-yellow tone, adding to the rugged texture. Draped over the right shoulder is a piece of black, tattered cloth. The arms and shoulders are muscular, and both hands appear to be immersed in the red, viscous material at the base, resembling molten flesh, creating a striking and eerie visual impact. The base of the sculpture is a black circular platform, and the overall design is full of tension and power. 
\ No newline at end of file diff --git a/dataset/sculpture_generation_0002/meta.json b/dataset/sculpture_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a729afccee319d05119e4aa87b79335f684fa250 --- /dev/null +++ b/dataset/sculpture_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "sculpture generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0029", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/special_effect_adding_0002/auto_eval.jsonl b/dataset/special_effect_adding_0002/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..017ddfd573312209d36ec61dfa11cbd04afc9129 --- /dev/null +++ b/dataset/special_effect_adding_0002/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. 
The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nDoes the output image feature translucent, angular shapes in shades that complement the background color, similar to the effect in the second image? 0 points: The output image does not feature translucent, angular shapes, or the colors do not complement the background. 1 point: The output image includes translucent, angular shapes in shades that complement the background color, mirroring the effect in the second image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. 
The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nDoes the output image maintain a similar dynamic and layered look created by the geometric shards, as seen in the second image? 0 points: The output image lacks a dynamic, layered look, and the geometric shards do not blend or layer effectively. 1 point: The output image effectively replicates a dynamic and layered appearance using the geometric shards, matching the style of the second image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nAre the subject’s primary elements, including facial features, expression, hairstyle, and pose, preserved in the output image without distortion? 
0 points: The subject’s facial features, expression, hairstyle, or pose are altered or obscured by the effect. 1 point: The subject’s primary elements, including facial features, expression, hairstyle, and pose, remain intact and clear, without distortion due to the added effect.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nDoes the output image’s application of geometric shapes mirror the second image’s level of opacity and blending with the background? 0 points: The opacity and blending of the geometric shapes in the output image do not match the second image’s style, appearing either too opaque or not blended well. 
1 point: The opacity and blending of the geometric shapes in the output image closely mirror the effect in the second image, creating a seamless integration with the background.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nDoes the output image apply geometric shapes in a way that is consistent with the positioning in the second image (e.g., shapes surrounding the subject without covering critical features)? 0 points: The geometric shapes are positioned inconsistently, covering critical features or not surrounding the subject effectively. 
1 point: The geometric shapes are positioned consistently with the second image, surrounding the subject without obstructing key features.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nThe second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details.\nYour review question is:\nIs the overall composition and balance of the output image similar to that of the second image, maintaining a harmonious visual aesthetic? 0 points: The composition and balance are disrupted, creating a visually unappealing or unbalanced image. 
1 point: The output image maintains a balanced and harmonious composition, with the effect applied in a way that enhances the visual appeal, similar to the second image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/special_effect_adding_0002/eval.json b/dataset/special_effect_adding_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..3c5971e311b0545867924598b0ed60b3c8ea102f --- /dev/null +++ b/dataset/special_effect_adding_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image contain semi-transparent, complementary-colored angular shapes similar to the effect in the second image?", + "0_point_standard": "The output image does not have semi-transparent angular shapes, or the colors are not complementary to the background.", + "1_point_standard": "The output image includes semi-transparent angular shapes with colors complementary to the background, similar to the effect in the second image." + }, + { + "question": "Does the output image maintain a dynamic layered appearance created by geometric fragments, similar to the effect in the second image?", + "0_point_standard": "The output image lacks a dynamic layered appearance; the geometric fragments do not effectively blend or layer.", + "1_point_standard": "The output image effectively replicates a dynamic layered appearance using geometric fragments, matching the style of the second image." + }, + { + "question": "Are the main elements of the subject in the output image (including facial features, expressions, hairstyle, and posture) preserved without distortion?", + "0_point_standard": "The subject's facial features, expressions, hairstyle, or posture are altered or obscured by effects.", + "1_point_standard": "The main elements of the subject (including facial features, expressions, hairstyle, and posture) remain intact and clear, without distortion from added effects." 
+ }, + { + "question": "Is the transparency and blending effect of geometric shapes in the output image consistent with the second image?", + "0_point_standard": "The transparency and blending effect of geometric shapes in the output image do not match the second image, appearing too opaque or not well-integrated.", + "1_point_standard": "The transparency and blending effect of the geometric shapes in the output image closely resemble the effect in the second image, seamlessly integrating with the background." + }, + { + "question": "Is the application of geometric shapes in the output image consistent with their placement in the second image (e.g., shapes surround the subject without blocking key features)?", + "0_point_standard": "The placement of geometric shapes is inconsistent, blocking key features or not effectively surrounding the subject.", + "1_point_standard": "The placement of geometric shapes is consistent with the second image, surrounding the subject without blocking key features." + }, + { + "question": "Is the overall composition and balance of the output image similar to the second image, maintaining a harmonious visual aesthetic?", + "0_point_standard": "The composition and balance are disrupted, making the image visually unappealing or unbalanced.", + "1_point_standard": "The output image maintains balanced and harmonious composition, with the application of effects enhancing visual appeal, similar to the second image." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/special_effect_adding_0002/images.txt b/dataset/special_effect_adding_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..b7e845cebfdaab50e1f84defe4176ea07b888cd8 --- /dev/null +++ b/dataset/special_effect_adding_0002/images.txt @@ -0,0 +1,3 @@ +https://img.alicdn.com/imgextra/i1/O1CN01jwVe3b1VPv0vbs871_!!6000000002646-0-tps-1430-847.jpg +https://img.alicdn.com/imgextra/i3/O1CN01w2dNiT1ESLe9uoJtQ_!!6000000000350-0-tps-1430-821.jpg +https://img.alicdn.com/imgextra/i1/O1CN01OGnzN41IlowvxGBWE_!!6000000000934-0-tps-1429-848.jpg diff --git a/dataset/special_effect_adding_0002/instruction.txt b/dataset/special_effect_adding_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..851a3a2b4f7e572a59ae61b50c04d39080c78b34 --- /dev/null +++ b/dataset/special_effect_adding_0002/instruction.txt @@ -0,0 +1 @@ +The second image adds a unique artistic effect to the original first image, incorporating abstract geometric shards that blend with the background and subtly overlay the subject. Please apply this same effect to the third image. This effect should include translucent, angular shapes in shades that complement the background color, creating a dynamic and layered look. Ensure that all primary elements of the third image, including the subject’s facial features, expression, hairstyle, and pose, remain unchanged. The added effect should enhance the image in a way that mirrors the transformation from the first to the second image, while maintaining the original composition and details. 
\ No newline at end of file diff --git a/dataset/special_effect_adding_0002/meta.json b/dataset/special_effect_adding_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..24380464e67c9f164647c2248c8a1ec22f84be3f --- /dev/null +++ b/dataset/special_effect_adding_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "special effect adding", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0085", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/special_effect_adding_0003/auto_eval.jsonl b/dataset/special_effect_adding_0003/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..c98e6269ef9646360daac598df95eea419096838 --- /dev/null +++ b/dataset/special_effect_adding_0003/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nDoes the output image have a textured, paper-like background similar to the second input image? 
0 Points: The output image does not include a textured, paper-like background or only vaguely resembles it without noticeable texture. 1 Point: The output image clearly shows a textured, paper-like background, consistent with the effect seen in the second image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nIs there a consistent watercolor brushstroke effect in the output image that matches the second input image’s style? 0 Points: The output image lacks clear watercolor brushstroke details or has inconsistently applied strokes that do not match the style of the second image. 1 Point: The output image has consistent watercolor brushstroke details, similar to those in the second image, creating a cohesive painterly effect.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. 
This is the third input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nAre the primary features of the subject, including expression and pose, preserved in the output image? 0 Points: The output image alters the subject’s primary features, such as expression or pose, making it different from the original third image. 1 Point: The output image maintains all primary features of the subject, including expression and pose, matching the original third image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. 
Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nIs there soft color blending in the output image, giving it a natural watercolor appearance as seen in the second input image? 0 Points: The colors in the output image are harsh or not blended, lacking the softness typical of watercolor. 1 Point: The output image displays soft color blending, consistent with the watercolor style seen in the second input image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0003.jpg"], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third input image and output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nDoes the output image retain the details of the surrounding flowers from the third input image while applying the watercolor effect? 0 Points: The details of the surrounding flowers are lost or significantly altered in the output image, deviating from the original third image. 
1 Point: The output image retains the surrounding flowers’ details, with the watercolor effect seamlessly applied without losing the original elements.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0001.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is output image of the response provided by a student. The task objective is to add a special effect to the third input image, similar to the effect change from the first input image to the second input image.. \nThe text requirement is:\nApply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand.\nYour review question is:\nIs the watercolor effect applied uniformly across the entire image, without isolated areas that look untouched or overly exaggerated? 0 Points: The watercolor effect is inconsistent, with some areas looking either overly sharp or too blurred, breaking the uniform appearance. 
1 Point: The watercolor effect is uniformly applied, maintaining a cohesive style throughout the image.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/special_effect_adding_0003/eval.json b/dataset/special_effect_adding_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b0199bdf6eb361942bd0d532344321bc299ebbad --- /dev/null +++ b/dataset/special_effect_adding_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output image have a textured paper background similar to the second input image?", + "0_point_standard": "The output image does not contain a textured paper background, or only vaguely shows the effect without distinct texture.", + "1_point_standard": "The output image clearly displays a textured paper background consistent with the effect in the second image." + }, + { + "question": "Does the output image have watercolor brushstroke effects consistent with the style of the second input image?", + "0_point_standard": "The output image lacks clear watercolor brushstroke details, or the strokes are applied inconsistently, not matching the style of the second image.", + "1_point_standard": "The output image has consistent watercolor brushstroke details similar to those in the second image, creating a unified painting effect." + }, + { + "question": "Does the output image retain the main features of the subject, including expression and posture?", + "0_point_standard": "The output image alters the main features of the subject, such as expression or posture, making it different from the original third image.", + "1_point_standard": "The output image retains all the main features of the subject, including expression and posture, consistent with the original third image." 
+ }, + { + "question": "Does the output image have smooth color transitions, giving it a natural watercolor appearance, as seen in the second input image?", + "0_point_standard": "The color transitions in the output image are harsh or lack smoothness, not representing typical watercolor softness.", + "1_point_standard": "The output image shows smooth color transitions consistent with the watercolor style of the second image." + }, + { + "question": "While applying the watercolor effect, does the output image retain the details of the surrounding flowers from the third input image?", + "0_point_standard": "The details of the surrounding flowers are lost or significantly altered in the output image, deviating from the original third image.", + "1_point_standard": "The output image retains the details of the surrounding flowers, with the watercolor effect naturally applied without losing the original elements." + }, + { + "question": "Is the watercolor effect applied evenly throughout the image, without isolated untreated or overly exaggerated areas?", + "0_point_standard": "The watercolor effect is inconsistent, with some areas appearing too sharp or blurry, disrupting a unified appearance.", + "1_point_standard": "The watercolor effect is applied evenly throughout the image, maintaining consistency in the overall style of the image." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/special_effect_adding_0003/images.txt b/dataset/special_effect_adding_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..31e23c3618fe5ed85f3906af8975df90202ce2ca --- /dev/null +++ b/dataset/special_effect_adding_0003/images.txt @@ -0,0 +1,3 @@ +https://img.alicdn.com/imgextra/i1/O1CN01zSqABW1c2HtALn0so_!!6000000003542-0-tps-1432-847.jpg +https://img.alicdn.com/imgextra/i2/O1CN01WdeVuP1iCGCxGTLJe_!!6000000004376-0-tps-1430-821.jpg +https://img.alicdn.com/imgextra/i2/O1CN01etnJOs20rAfaYhTjI_!!6000000006902-0-tps-1431-847.jpg diff --git a/dataset/special_effect_adding_0003/instruction.txt b/dataset/special_effect_adding_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..7dd4d257b0a8b13d01e28944e25fe54875040379 --- /dev/null +++ b/dataset/special_effect_adding_0003/instruction.txt @@ -0,0 +1 @@ +Apply the same watercolor artistic effect as seen in the second image to the third image. Ensure the effect includes a textured, paper-like background with brushstroke details and soft color blending, giving it a classic, painterly look. Maintain all the original elements of the third image, including the subject’s expression, pose, and surrounding flowers, while integrating the watercolor effect seamlessly to make it appear as though the image was painted by hand. 
\ No newline at end of file diff --git a/dataset/special_effect_adding_0003/meta.json b/dataset/special_effect_adding_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1781a71479c704929d3484f856a21bae055af026 --- /dev/null +++ b/dataset/special_effect_adding_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "special effect adding", + "num_of_cases": 3, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0085", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/stop-motion_animation_generation_0001/auto_eval.jsonl b/dataset/stop-motion_animation_generation_0001/auto_eval.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..ead44d423b8691fad25a72219ea2f86e4c8ccf83 --- /dev/null +++ b/dataset/stop-motion_animation_generation_0001/auto_eval.jsonl @@ -0,0 +1,6 @@ +{"input_images": [], "output_images": ["0001.jpg", "0002.jpg", "0003.jpg", "0004.jpg"], "question": "Is the number in the image the digit 4? 0 points: The number in the image is not the digit 4; 1 point: The number in the image is the digit 4. \nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0002.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first Input Image and second Interpolated Image of the response provided by a student. The task objective is to interpolated frames for the given key frames.\nThe text requirement is:\nInsert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. 
In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames.\nYour review question is:\nDo the first and third images show a logical progression in the transformation of the cookie, with clear, gradual changes in the cookie’s color and snow coverage? 0 points: The transition between the first and third images lacks smooth progression, making the stages of transformation appear inconsistent or abrupt. 1 point: The transition between the first and third images is smooth and logical, showing a gradual, continuous transformation of the cookie.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0003.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third and final Interpolated Images of the response provided by a student. The task objective is to interpolated frames for the given key frames.\nThe text requirement is:\nInsert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. 
The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames.\nYour review question is:\nDo the third and final interpolated images maintain consistency in the snowflake distribution, with a logical increase in coverage as the cookie further blends into the snow? 0 points: The snowflake distribution is inconsistent, with sudden or unnatural increases that disrupt the visual continuity of the scene. 1 point: The snowflake distribution shows a gradual and consistent increase, aligning with the cookie’s ongoing blending into the snow.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0001.jpg"], "output_images": ["0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the first input image and final interpolated image of the response provided by a student. 
The task objective is to interpolated frames for the given key frames.\nThe text requirement is:\nInsert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames.\nYour review question is:\nIs the lighting in the first input image consistent with the final interpolated image, with smooth, realistic adjustments that maintain a natural transition in illumination across the series? 0 points: The lighting changes are inconsistent or unrealistic, causing a noticeable disjunction in the visual continuity. 
1 point: The lighting is consistent across images, with smooth adjustments that create a cohesive, natural transition.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": ["0002.jpg"], "output_images": ["0003.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the third interpolated image and final input image of the response provided by a student. The task objective is to interpolated frames for the given key frames.\nThe text requirement is:\nInsert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames.\nYour review question is:\nDoes the transformation of the cookie in the third interpolated image logically lead into the final input image, with gradual color blending and near-total merging into the snow? 
0 points: The transformation is either too abrupt or insufficient, failing to logically lead into the final state where the cookie has fully merged into the snow. 1 point: The transformation is smooth, showing the cookie nearly blended in the third interpolated image, which naturally leads to the final state.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} +{"input_images": [], "output_images": ["0002.jpg", "0004.jpg"], "question": "You are a professional image designer, and you are now required to conduct a strict evaluation of the following design work. This is the second and fourth interpolated images of the response provided by a student. The task objective is to interpolated frames for the given key frames.\nThe text requirement is:\nInsert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. 
Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames.\nYour review question is:\nDo the second and fourth interpolated images maintain a consistent visual style, including line quality, shading, and overall aesthetic? 0 points: The visual style changes noticeably between the two interpolated images, disrupting the continuity of the series. 1 point: The visual style is consistent across both images, ensuring a cohesive look throughout the interpolated frames.\nUse this JSON schema:\nEvaluation = {'score': int, 'reason': str}"} diff --git a/dataset/stop-motion_animation_generation_0001/eval.json b/dataset/stop-motion_animation_generation_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b33f7ad141252ab55aece68324192ad63317398e --- /dev/null +++ b/dataset/stop-motion_animation_generation_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the number of output images meet the requirements of the text description?", + "0_point_standard": "The number of output images does not meet the requirements.", + "1_point_standard": "The number of output images meets the requirements." + }, + { + "question": "Do the first and third images show a reasonable progression of the cookie's transformation, with the color and snow coverage clearly and gradually changing?", + "0_point_standard": "The change between the first and third images lacks smooth progression, making the transformation stages appear inconsistent or abrupt.", + "1_point_standard": "The change between the first and third images is smooth and reasonable, showing a gradual, continuous transformation of the cookie." 
+ }, + { + "question": "Do the third and final interpolated images maintain consistency in snowflake distribution, with a reasonable increase in coverage as the cookie further merges into the snow?", + "0_point_standard": "The snowflake distribution is inconsistent, with an increase in coverage that is sudden or unnatural, disrupting the visual coherence of the scene.", + "1_point_standard": "The snowflake distribution shows a gradual, consistent increase, aligning with the cookie's continued merging into the snow." + }, + { + "question": "Is the lighting in the first input image consistent with the final interpolated image, with smooth and realistic adjustments maintaining a natural lighting transition?", + "0_point_standard": "The lighting changes are inconsistent or unrealistic, leading to a noticeable disruption in visual coherence.", + "1_point_standard": "The lighting remains consistent between images, with smooth adjustments creating a natural transition effect." + }, + { + "question": "Does the transformation of the cookie in the third interpolated image reasonably lead to the final input image, with the color gradually merging and approaching a state of full integration into the snow?", + "0_point_standard": "The transformation is too abrupt or insufficient, failing to reasonably transition to the cookie's fully integrated final state in the snow.", + "1_point_standard": "The transformation is smooth, showing the cookie nearly fully integrated in the third interpolated image, naturally transitioning to its final state." + }, + { + "question": "Do the second and fourth interpolated images maintain consistency in visual style, including line quality, shading, and overall aesthetic?", + "0_point_standard": "The visual styles of the two interpolated images are noticeably different, disrupting the continuity of the series.", + "1_point_standard": "The visual styles of the two images are consistent, ensuring overall coherence between interpolated frames." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/stop-motion_animation_generation_0001/images.txt b/dataset/stop-motion_animation_generation_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..91305b8801f0b30f333f08abaa5ae6dfacc145ff --- /dev/null +++ b/dataset/stop-motion_animation_generation_0001/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN01X69yIR25N0fM8BXhH_!!6000000007513-0-tps-1280-720.jpg +https://img.alicdn.com/imgextra/i4/O1CN01VYNsCB1jjWaae6BWS_!!6000000004584-0-tps-1280-720.jpg diff --git a/dataset/stop-motion_animation_generation_0001/instruction.txt b/dataset/stop-motion_animation_generation_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..bcda04b4f7ae835bf349a3fb8bd6b5ab787685b9 --- /dev/null +++ b/dataset/stop-motion_animation_generation_0001/instruction.txt @@ -0,0 +1 @@ +Insert 4 frames between two given keyframes, generating one image for each frame. The goal is to maintain scene continuity and consistency. The first frame shows the cookie just landing on the pristine snow, the cookie is black, and the sunlight gently illuminates the distant mountains with a few sparkling snowflakes falling around. In the second frame, the cookie begins to sink slowly, with more snowflakes falling, and the surface of the cookie starts getting thinly covered by snow, gradually transitioning from black to white. In the third frame, the cookie's color lightens further, almost blending into the surrounding snow, while the sunlight shifts slightly, and the distant mountains become shrouded in light mist. In the fourth frame, the cookie has fully turned the color of snow, completely blending into the snow with only a faint outline visible. In the final frame, the cookie has fully merged into the snow without leaving any imprint, and snowflakes continue to fall gently. 
Each inserted frame should ensure consistency and continuity in terms of lighting, snowflake distribution, and the transformation process of the cookie with the preceding and following frames. \ No newline at end of file diff --git a/dataset/stop-motion_animation_generation_0001/meta.json b/dataset/stop-motion_animation_generation_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b1bb62dcd9d7223e0f9d245c5bb208ed3fb059ee --- /dev/null +++ b/dataset/stop-motion_animation_generation_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "stop-motion animation generation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": true, + "uid": "0049", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/style_editing_era_editing_0002/eval.json b/dataset/style_editing_era_editing_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..ea0da6ad8ab312ee528bb5837de8d71e8f57c24c --- /dev/null +++ b/dataset/style_editing_era_editing_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the generated image retain the core theme and recognizable features of the original image, ensuring that the main content and identity are preserved?", + "0_point_standard": "The core theme or main features have changed, making the original content difficult to recognize.", + "1_point_standard": "The main content and identity of the original image are preserved, maintaining the recognizability of the theme or scene in the new era style." 
+ }, + { + "question": "Does the style transformation accurately reflect the specified historical period, with corresponding characteristics and details?", + "0_point_standard": "The style transformation does not match the specified era, showing inconsistencies or lacking key elements of that era.", + "1_point_standard": "The era style is accurately applied, with distinct elements and characteristics clearly representing the specified historical period." + }, + { + "question": "Does the added era style smoothly integrate with the original image composition, avoiding any abrupt transitions or mismatched areas?", + "0_point_standard": "The era style is applied unevenly or has abrupt transitions, making some parts of the image appear inconsistent.", + "1_point_standard": "The era style smoothly integrates into the entire image without abrupt transitions, creating a cohesive and unified appearance." + }, + { + "question": "Are any specific era items or accessories, such as furniture, decorations, or clothing, consistent with the specified era?", + "0_point_standard": "The specific era items or accessories appear inaccurate or misplaced, compromising the historical authenticity of the image.", + "1_point_standard": "The choice of specific era items or accessories is appropriate and consistent with the specified era, enhancing the historical feel of the image." + }, + { + "question": "Do the lighting and color adjustments align with the era style, creating an authentic atmosphere without disrupting the original content?", + "0_point_standard": "The lighting and color adjustments do not match the era style or are overpowering the main content, making it look inconsistent or unnatural.", + "1_point_standard": "The lighting and colors are well-suited to the specified era style, enhancing the atmosphere while maintaining the visual coherence of the main content." 
+ }, + { + "question": "Does the final image cohesively and aesthetically appealingly integrate the era style and content, achieving a balanced and harmonious appearance?", + "0_point_standard": "The image lacks aesthetic cohesion, with elements appearing disjointed or detracting from visual appeal.", + "1_point_standard": "The image is visually cohesive, with the era style and original content harmoniously blending into an aesthetically pleasing composition." + } + ] +} \ No newline at end of file diff --git a/dataset/style_editing_era_editing_0002/images.txt b/dataset/style_editing_era_editing_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..ff2373a59e6cedcefce39ca364861d2b5fa55c7d --- /dev/null +++ b/dataset/style_editing_era_editing_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01i719kr1etXAL38d1n_!!6000000003929-0-tps-3000-2002.jpg diff --git a/dataset/style_editing_era_editing_0002/instruction.txt b/dataset/style_editing_era_editing_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f8242c84ac036a310e959ed37f975073f0081339 --- /dev/null +++ b/dataset/style_editing_era_editing_0002/instruction.txt @@ -0,0 +1 @@ +Convert this image of seaside buildings into a mid-1960s style. 
\ No newline at end of file diff --git a/dataset/style_editing_era_editing_0002/meta.json b/dataset/style_editing_era_editing_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3b4999df9bd99a85881eed057862a754ff19a94d --- /dev/null +++ b/dataset/style_editing_era_editing_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "era editing", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0060", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_abstract_0001/eval.json b/dataset/style_group_generation_abstract_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..a90d32e7b1f4468dfc238c42ed55cb1fa73fc070 --- /dev/null +++ b/dataset/style_group_generation_abstract_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does each image adhere to the same abstract style in terms of abstract methods, levels of abstraction, and overall aesthetic approach?", + "0_point_standard": "The images display different abstract styles, resulting in an inconsistent visual theme.", + "1_point_standard": "All images maintain a consistent abstract style, creating a unified aesthetic throughout the series." + }, + { + "question": "Do the generated images align with the content and themes outlined in the text description?", + "0_point_standard": "The images deviate significantly from the instructions or themes specified in the text description.", + "1_point_standard": "The images accurately reflect the themes or instructions specified in the text description." 
+ }, + { + "question": "Is the theme or central concept consistent across all images, forming a recognizable narrative or thematic connection?", + "0_point_standard": "The theme or concept is unclear or inconsistent between images, disrupting the narrative or thematic continuity.", + "1_point_standard": "The theme or concept is consistently represented across the images, creating a recognizable connection between them." + }, + { + "question": "Are key objects, characters, or abstract forms recognizable throughout the series, even in an abstract context?", + "0_point_standard": "Key elements are difficult to consistently identify between images, losing continuity of identity.", + "1_point_standard": "Key objects, characters, or forms are recognizable across images, maintaining continuity of identity in an abstract style." + }, + { + "question": "Is the level of abstraction and technique consistently applied across all images, ensuring a unified approach to abstract style?", + "0_point_standard": "There are significant variations in the level of abstraction or technique between images, disrupting the visual cohesion of the series.", + "1_point_standard": "All images exhibit a consistent level of abstraction and technique, maintaining a unified approach to abstract style." + }, + { + "question": "Do the images exhibit high aesthetic quality with detailed abstract elements, visual coherence, and professional completion?", + "0_point_standard": "The images lack detail, visual coherence, or aesthetic appeal, failing to meet professional standards.", + "1_point_standard": "The images are rich in detail, visually appealing, and exhibit professional quality, contributing to a compelling abstract series." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_abstract_0001/images.txt b/dataset/style_group_generation_abstract_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_abstract_0001/instruction.txt b/dataset/style_group_generation_abstract_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..abcded9dee968989e788b270ed07052a9a4a29e5 --- /dev/null +++ b/dataset/style_group_generation_abstract_0001/instruction.txt @@ -0,0 +1 @@ +Minimalist line style, using soft and smooth black lines to outline the subject. Each painting consists of only a few strokes with large areas of blank space, emphasizing simplicity and elegance. Generate 4 images with the subjects being a crane, a swan, a cat, and a dog. Ensure the line style is consistent across all images, with the subject's contour being simple but recognizable. 
\ No newline at end of file diff --git a/dataset/style_group_generation_abstract_0001/meta.json b/dataset/style_group_generation_abstract_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..a20ab4990ec2fdd207c5e37c4f3606abeb85670e --- /dev/null +++ b/dataset/style_group_generation_abstract_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group abstract style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0002", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0002/eval.json b/dataset/style_group_generation_anime_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e3da97c5c5ea8033f340353ec784a38ff184f86f --- /dev/null +++ b/dataset/style_group_generation_anime_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the animation style consistent across all images, including lines, shading, color palette, and overall aesthetic approach?", + "0_point_standard": "The images exhibit noticeable differences in style, disrupting the unified animated aesthetic.", + "1_point_standard": "The animation style is consistently applied, with cohesive lines, shading, color palette, and overall aesthetic across all images." + }, + { + "question": "Do the generated images align with the content and themes specified in the text description, accurately depicting the described characters, scenes, or objects?", + "0_point_standard": "The images deviate from the content or themes specified in the text description, lacking important details or interpretations.", + "1_point_standard": "The images accurately depict the themes and content described in the text, capturing the specified characters, scenes, or objects." 
+ }, + { + "question": "Do the characters or main themes visually remain consistent across images, with recognizable features such as hairstyle, expression, and attire?", + "0_point_standard": "Character features or main themes differ across images, making it difficult to recognize them as the same entity.", + "1_point_standard": "Characters or main themes visually remain consistent, with recognizable features staying constant across images." + }, + { + "question": "Are animation-specific elements such as eyes, hair details, and expressions accurately rendered to match the animation style?", + "0_point_standard": "Animation-specific elements fail to accurately reflect the style, diminishing the authenticity of the images.", + "1_point_standard": "Eyes, hair, expressions, and other animation-specific elements are accurately rendered, enhancing the authenticity of the animation style." + }, + { + "question": "Do the backgrounds and environmental elements in each image match the animation style and harmonize with the character design?", + "0_point_standard": "Backgrounds or environments are inconsistent with the animation style or do not harmonize with character design.", + "1_point_standard": "Backgrounds and environments match the animation style and harmonize with the characters, creating cohesive scenes." + }, + { + "question": "Do the images exhibit high aesthetic quality, with clear details, pleasing composition, and a polished, professional finish?", + "0_point_standard": "The images lack aesthetic appeal or have poor detail quality, reducing their visual impact.", + "1_point_standard": "The images are visually appealing, with high-quality details, balanced composition, and a polished, professional finish." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0002/images.txt b/dataset/style_group_generation_anime_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_anime_0002/instruction.txt b/dataset/style_group_generation_anime_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..3212e2b6a0d052296560e4b6ab386bed63c3ef11 --- /dev/null +++ b/dataset/style_group_generation_anime_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 images portraying a spring school trip during cherry blossom season. All images must maintain a consistent Japanese 2D anime style throughout. The first image shows students in spring outfits playing under cherry blossom trees, with petals gently falling in the breeze; the second image features students picnicking by a river surrounded by cherry blossoms, with distant mountains visible; the third image depicts a student standing on a stone bridge gazing at the water below, where cherry blossoms are reflected; the fourth image shows students taking a group photo under the cherry blossom trees at dusk, with the sky painted in shades of orange by the setting sun. 
\ No newline at end of file diff --git a/dataset/style_group_generation_anime_0002/meta.json b/dataset/style_group_generation_anime_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..c6a266ea8d46f6fa839c375a4348d6232f65034d --- /dev/null +++ b/dataset/style_group_generation_anime_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group anime style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0006", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0003/eval.json b/dataset/style_group_generation_anime_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..dce5698400b7b8007d489fadc77596353410847d --- /dev/null +++ b/dataset/style_group_generation_anime_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the animation style consistent across all images, including lines, shadows, color palette, and overall aesthetic approach?", + "0_point_standard": "The images show noticeable differences in style, disrupting the unified animation aesthetic.", + "1_point_standard": "The animation style is consistently applied, with cohesive lines, shadows, color palette, and overall aesthetic across all images." + }, + { + "question": "Do the generated images align with the content and themes specified in the text description, accurately depicting the described characters, scenes, or objects?", + "0_point_standard": "The images deviate from the content or themes specified in the text description, lacking important details or interpretations.", + "1_point_standard": "The images accurately depict the themes and content described in the text, capturing the specified characters, scenes, or objects." 
+ }, + { + "question": "Are the characters or main themes visually consistent across the images, with recognizable features such as hairstyles, expressions, and clothing?", + "0_point_standard": "Characters' features or main themes differ between images, making it difficult to recognize them as the same entity.", + "1_point_standard": "The characters or main themes remain visually consistent, with recognizable features maintained across images." + }, + { + "question": "Are animation-specific elements such as eye, hair details, and expressions accurately rendered to conform to the animation style?", + "0_point_standard": "Animation-specific elements fail to accurately reflect the style, reducing the authenticity of the images.", + "1_point_standard": "Eyes, hair, expressions, and other animation-specific elements are accurately rendered, enhancing the authenticity of the animation style." + }, + { + "question": "Do the backgrounds and environmental elements in each image match the animation style and harmonize with the character design?", + "0_point_standard": "Backgrounds or environments are inconsistent with the animation style or not in harmony with the character design.", + "1_point_standard": "Backgrounds and environments are consistent with the animation style and harmonize with the characters, creating cohesive scenes." + }, + { + "question": "Do the images possess high aesthetic quality, with clear details, pleasing composition, and a polished, professional finish?", + "0_point_standard": "The images lack aesthetic appeal or have poor detail quality, reducing visual impact.", + "1_point_standard": "The images are visually appealing, with high-quality details, balanced composition, and a polished, professional finish." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0003/images.txt b/dataset/style_group_generation_anime_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_anime_0003/instruction.txt b/dataset/style_group_generation_anime_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0abfdc291c8d1c983937cb7918c7de50185caf08 --- /dev/null +++ b/dataset/style_group_generation_anime_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 6 images depicting a winter ski resort. All images must adhere to a consistent anime-style, reflecting a 2D Japanese anime aesthetic. The first image shows friends in ski gear standing atop a snowy mountain, looking down at the ski slopes; the second image features two students racing down the slopes, snow flying behind them, capturing the sense of speed; the third image shows a student who has fallen in the snow while other friends laugh and reach out to help, with snow-covered trees in the background; the fourth image is set inside a cozy café at the ski resort, where students are sipping hot drinks by a window overlooking the snowy landscape; the fifth image shows a student standing alone in the snow, gazing up at the sky as snowflakes gently fall; the sixth image depicts the ski slopes at night, illuminated by lights that make the snow sparkle. 
\ No newline at end of file diff --git a/dataset/style_group_generation_anime_0003/meta.json b/dataset/style_group_generation_anime_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..dfe29fb702035af57490492978a163b9c71a7a1a --- /dev/null +++ b/dataset/style_group_generation_anime_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group anime style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0006", + "output_image_count": 6, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0004/eval.json b/dataset/style_group_generation_anime_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..f22208f76f2f58a7b50cb9696d83f4751692a272 --- /dev/null +++ b/dataset/style_group_generation_anime_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the animation style consistent across all images, including lines, shading, palette, and overall aesthetic approach?", + "0_point_standard": "There are noticeable differences in style among the images, disrupting the unified animation aesthetic.", + "1_point_standard": "The animation style is consistently applied, with cohesive lines, shading, palette, and overall aesthetic across all images." + }, + { + "question": "Do the generated images align with the content and theme specified in the text description, accurately depicting the described characters, scenes, or objects?", + "0_point_standard": "The images deviate from the content or theme specified in the text description, lacking important details or interpretations.", + "1_point_standard": "The images accurately depict the themes and content described in the text, capturing the specified characters, scenes, or objects." 
+ }, + { + "question": "Do the characters or main themes visually remain consistent across the images, such as recognizable features like hairstyle, expressions, and clothing?", + "0_point_standard": "Character features or main themes differ between images, making it difficult to recognize them as the same entity.", + "1_point_standard": "The characters or main themes visually remain consistent, with recognizable features maintained across images." + }, + { + "question": "Are animation-specific elements like eye, hair details, and expressions accurately rendered to conform to the animation style?", + "0_point_standard": "Animation-specific elements fail to accurately reflect the style, reducing the authenticity of the images.", + "1_point_standard": "Eyes, hair, expressions, and other animation-specific elements are accurately rendered, enhancing the authenticity of the animation style." + }, + { + "question": "Do the background and environmental elements in each image match the animation style and harmonize with the character design?", + "0_point_standard": "Backgrounds or environments are inconsistent with the animation style or do not harmonize with the character design.", + "1_point_standard": "Backgrounds and environments are consistent with the animation style and harmonize with the characters, creating cohesive scenes." + }, + { + "question": "Do the images exhibit high aesthetic quality, with clear details, pleasing composition, and a refined, professional finish?", + "0_point_standard": "The images lack aesthetic appeal or suffer from poor detail quality, reducing their visual impact.", + "1_point_standard": "The images are visually appealing, with high-quality details, balanced composition, and a refined, professional finish." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_anime_0004/images.txt b/dataset/style_group_generation_anime_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_anime_0004/instruction.txt b/dataset/style_group_generation_anime_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..1084c57fae67fe878684aa09eafc3e6d6a2caf51 --- /dev/null +++ b/dataset/style_group_generation_anime_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 3 images showing a summer festival scene. All images must follow the same consistent Japanese 2D anime style. The first image features girls in yukatas taking a group photo in front of a shrine, with festival stalls lit by lanterns in the background; the second image shows a girl crouching by a pool trying to catch goldfish with a net, while the bustling crowd around her creates a lively atmosphere; the third image depicts a boy standing on the shrine steps, watching fireworks light up the night sky, with the shrine beautifully illuminated behind him. 
\ No newline at end of file diff --git a/dataset/style_group_generation_anime_0004/meta.json b/dataset/style_group_generation_anime_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3ad9e89aec1fe9e7645a0c9a6199f0825b9dc014 --- /dev/null +++ b/dataset/style_group_generation_anime_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group anime style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0006", + "output_image_count": 3, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/style_group_generation_creative_0001/eval.json b/dataset/style_group_generation_creative_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b43716864e22c557311f1507e0252eae20cc747a --- /dev/null +++ b/dataset/style_group_generation_creative_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the creative style consistently applied across all images, including elements like color schemes, design approach, and stylistic techniques?", + "0_point_standard": "There are noticeable differences in style among the images, disrupting the visual consistency of the series.", + "1_point_standard": "The creative style is consistently applied, with cohesive colors, design, and stylistic techniques across all images." + }, + { + "question": "Do the generated images align with the content and themes specified in the text description, accurately depicting the intended concept or theme?", + "0_point_standard": "The images deviate from the theme or subject described in the text, lacking key details or creative interpretation.", + "1_point_standard": "The images accurately reflect the theme and content described in the text, capturing the specified concept or theme according to the creative style." 
+ }, + { + "question": "Do key elements (such as objects, characters, or patterns) maintain visual consistency across images, making them easily recognizable as part of the same creative series?", + "0_point_standard": "Key elements vary significantly between images, making it difficult to recognize them as part of the same series.", + "1_point_standard": "Key elements maintain visual consistency, retaining recognizable features across images to ensure continuity." + }, + { + "question": "Does each image offer a unique interpretation of the creative theme while staying true to the overall style?", + "0_point_standard": "The images lack diversity or appear repetitive, failing to explore unique aspects of the creative theme.", + "1_point_standard": "Each image provides a unique interpretation of the theme, adding diversity and interest within the specified creative style." + }, + { + "question": "Are design elements (such as shapes, textures, and composition) harmoniously consistent within each image and throughout the series, enhancing the creative style?", + "0_point_standard": "Design elements appear inconsistent or conflicting within and between images, weakening the cohesion of the creative style.", + "1_point_standard": "Design elements are effectively harmonized within each image and throughout the series, enhancing the consistency of the creative style." + }, + { + "question": "Do the images exhibit high aesthetic quality, with fine details, balanced composition, and a polished, professional appearance?", + "0_point_standard": "The images lack aesthetic appeal or have poor detail quality, diminishing their visual impact.", + "1_point_standard": "The images are visually appealing, with high-quality details, balanced composition, and a polished, professional finish, enhancing the creative style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_creative_0001/images.txt b/dataset/style_group_generation_creative_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_creative_0001/instruction.txt b/dataset/style_group_generation_creative_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..f238da66624dafcd30312d4b908c6ca217ec9a7f --- /dev/null +++ b/dataset/style_group_generation_creative_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 creative-style images, each showcasing a fusion of everyday objects and animals in a surreal scene. The first image features an owl combined with a clock, with the owl's body forming the clock face and its wings resembling the clock hands; the second image depicts a giraffe combined with a measuring tape, its elongated neck stretching out like a rolled tape; the third image shows a camel combined with an hourglass, where its hump is filled with flowing sand; the fourth image portrays a penguin combined with ice cubes, its translucent body containing frozen cubes. All images must maintain a consistent creative style. 
\ No newline at end of file diff --git a/dataset/style_group_generation_creative_0001/meta.json b/dataset/style_group_generation_creative_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..aed510793f4c017e4dd80b8a30fa07053776dc31 --- /dev/null +++ b/dataset/style_group_generation_creative_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group creative images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0005", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/style_group_generation_cthulhu_0001/eval.json b/dataset/style_group_generation_cthulhu_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5482fc31263f3fc1ec8093d75606e3681e2c3d4d --- /dev/null +++ b/dataset/style_group_generation_cthulhu_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the Cthulhu style consistently applied across all images, including elements like dark tones, grotesque atmosphere, and surreal monster designs?", + "0_point_standard": "There are noticeable differences in style across the images, disrupting the visual and thematic consistency of the Cthulhu series.", + "1_point_standard": "The Cthulhu style is consistently applied, with all images cohesively using dark tones, grotesque atmosphere, and surreal design elements." + }, + { + "question": "Do the generated images accurately express the themes and objects described in the text prompts, capturing the distinctive grotesque and mysterious qualities of the Cthulhu mythos?", + "0_point_standard": "The images deviate from the described themes or objects, failing to capture the intended dark or surreal qualities.", + "1_point_standard": "The images accurately reflect the described themes and objects, effectively conveying the grotesque and mysterious qualities of the Cthulhu mythos." 
+ }, + { + "question": "Are recurring elements, such as creatures, symbols, or landscapes, visually consistent across the images, creating a unified Cthulhu series?", + "0_point_standard": "Key elements vary greatly between images, disrupting continuity and making them feel disconnected from one another.", + "1_point_standard": "Key elements, such as creatures, symbols, or landscapes, are consistently represented across the images, creating a cohesive visual narrative." + }, + { + "question": "Does each image effectively evoke the typical horror and unease of the Cthulhu mythos through memorable details, shadowy shapes, and a sense of foreboding?", + "0_point_standard": "The images lack a cohesive atmosphere or fail to evoke the unsettling feeling typical of Cthulhu-style artworks.", + "1_point_standard": "Each image successfully creates a memorable atmosphere, featuring unsettling details, shadowy shapes, and a sense of foreboding." + }, + { + "question": "Are textures and details (such as tentacles, symbols, and decayed structures) rendered in high quality and in keeping with the Cthulhu style?", + "0_point_standard": "Textures or details are poorly rendered or inconsistent, detracting from the dark, intricate appearance typical of Cthulhu artwork.", + "1_point_standard": "Textures and details are meticulously rendered, enhancing the dark and intricate appearance consistent with the Cthulhu theme." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, including balanced composition, refined details, and a cohesive, professional appearance?", + "0_point_standard": "The images lack aesthetic appeal or have low-quality details, reducing the overall impact of the series.", + "1_point_standard": "The images are visually appealing, with balanced composition, high-quality details, and a refined, professional finish, enhancing the Cthulhu theme." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_cthulhu_0001/images.txt b/dataset/style_group_generation_cthulhu_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_cthulhu_0001/instruction.txt b/dataset/style_group_generation_cthulhu_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..be22b45cb3719b079811142e4fd4cf683f49f77b --- /dev/null +++ b/dataset/style_group_generation_cthulhu_0001/instruction.txt @@ -0,0 +1 @@ +Please generate 4 Cthulhu-style images, all depicting a desolate seaside scene. The first image shows a solitary lighthouse, old and worn, with a faint, flickering light surrounded by mysterious mist. The second image depicts the rocky shore beneath the lighthouse, with strange symbols and floating seaweed on the rocks. The third image shows a massive tentacle stranded on the shore, seemingly from an unknown deep-sea creature. The fourth image reveals a shadowy figure in the sea, as if a giant creature is lurking underwater. Keep all images in the Cthulhu style. 
\ No newline at end of file diff --git a/dataset/style_group_generation_cthulhu_0001/meta.json b/dataset/style_group_generation_cthulhu_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..ba4beef1d7161237fc3a595d2e73ff000a9b0fe0 --- /dev/null +++ b/dataset/style_group_generation_cthulhu_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group cthulhu style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0008", + "output_image_count": 4, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/style_group_generation_cyberpunk_0001/eval.json b/dataset/style_group_generation_cyberpunk_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..d0535d2fdccd4980a670c1fa46ae2696ce910399 --- /dev/null +++ b/dataset/style_group_generation_cyberpunk_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the cyberpunk style consistently applied across all images, including elements like neon lights, high-tech urban landscapes, and a futuristic dystopian aesthetic?", + "0_point_standard": "The images show noticeable stylistic differences that disrupt the visual consistency of the cyberpunk theme.", + "1_point_standard": "The cyberpunk style is consistently applied across all images, using coherent neon colors, urban elements, and futuristic aesthetics." + }, + { + "question": "Do the generated images accurately represent the themes and subjects described in the text prompt, capturing the core high-tech, low-life vibe of cyberpunk?", + "0_point_standard": "The images deviate from the described themes or subjects and fail to capture the intended futuristic and gritty texture.", + "1_point_standard": "The images accurately reflect the described themes and subjects, effectively conveying the high-tech, dystopian atmosphere characteristic of cyberpunk." 
+ }, + { + "question": "Are the recurring elements such as neon lights, futuristic architecture, or tech devices visually consistent across images, creating a unified cyberpunk style series?", + "0_point_standard": "Key elements vary greatly across images, disrupting continuity and making them feel disjointed from one another.", + "1_point_standard": "Key elements, such as neon lights, skyscrapers, or tech devices, are consistently represented, creating a coherent cyberpunk-themed series." + }, + { + "question": "Does each image evoke a strong cyberpunk ambiance through immersive lighting, shadows, and color schemes, conveying a futuristic and gritty vibe?", + "0_point_standard": "The images lack a coherent ambiance or fail to evoke the dystopian feel typical of cyberpunk with neon lighting.", + "1_point_standard": "Each image successfully creates an immersive cyberpunk atmosphere, with neon lighting, shadow contrasts, and color schemes conveying a futuristic and gritty vibe." + }, + { + "question": "Are textures and details such as metallic surfaces, neon lights, and holographic elements rendered with high quality and in line with the cyberpunk style?", + "0_point_standard": "Textures or details are poorly or inconsistently rendered, diminishing the refined, high-tech appearance typical of cyberpunk art.", + "1_point_standard": "Textures and details are meticulously rendered, enhancing the refined, high-tech appearance consistent with the cyberpunk theme." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with balanced composition, refined details, and a coherent, professional look?", + "0_point_standard": "Images lack aesthetic appeal or have low-quality details, reducing the visual impact of the series.", + "1_point_standard": "Images are visually appealing, with balanced composition, high-quality details, and a refined, professional finish that enhances the cyberpunk theme." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_cyberpunk_0001/images.txt b/dataset/style_group_generation_cyberpunk_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_cyberpunk_0001/instruction.txt b/dataset/style_group_generation_cyberpunk_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..be71a6e2fee4ab765ba9fe74cd6d3d0e2cd557a2 --- /dev/null +++ b/dataset/style_group_generation_cyberpunk_0001/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 5 images depicting a high-tech but grim future city. The first image shows a female hacker in a black trench coat standing on the top of a skyscraper, with the neon-lit cityscape behind her; the second image shows a crowded street market, with people dressed in tech-enhanced clothing, and neon billboards flashing on both sides of the street; the third image is set in a dimly lit bar, where cyber warriors sit at the counter surrounded by flickering holograms; the fourth image shows the city's lower levels, filled with pipes and abandoned machinery, with mist hovering in the air; the fifth image depicts the city at night, with neon lights reflecting off the wet streets, while flying cars zip between skyscrapers in the distance. All images must follow the same consistent cyberpunk style. 
\ No newline at end of file diff --git a/dataset/style_group_generation_cyberpunk_0001/meta.json b/dataset/style_group_generation_cyberpunk_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..b46ddb252865658284d0c091339c1900f7ef596c --- /dev/null +++ b/dataset/style_group_generation_cyberpunk_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group cyberpunk style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0007", + "output_image_count": 5, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0002/eval.json b/dataset/style_group_generation_european_and_american_comic_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..64ead526939b9039bdf8aaab5232efb4345ee521 --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Are all the images consistently applying the Western comic style, including elements like bold outlines, vibrant colors, and dynamic compositions?", + "0_point_standard": "The styles of the images differ significantly, disrupting the visual consistency of the Western comic theme.", + "1_point_standard": "All images consistently apply the Western comic style with unified use of bold outlines, vibrant colors, and stylized features." + }, + { + "question": "Do the generated images accurately represent the themes, scenes, or character types described in the text prompts, capturing the distinctive qualities typical of Western comics?", + "0_point_standard": "The images deviate from the described themes or character types and lack key stylistic or thematic elements.", + "1_point_standard": "The images accurately reflect the themes and elements described in the text, capturing the expected traits of the Western comic style." 
+ }, + { + "question": "Are repeated visual elements such as costumes, props, or background details consistent across all images to maintain a cohesive aesthetic?", + "0_point_standard": "Visual elements like costumes or props vary significantly between images, making them feel disconnected.", + "1_point_standard": "Visual elements are consistent, ensuring a cohesive aesthetic throughout the series." + }, + { + "question": "Are the expressions and poses of the characters expressive and dynamic, enhancing the typical vibrancy and visual appeal of Western comics?", + "0_point_standard": "Expressions or poses appear stiff or lack dynamism, reducing the images' visual impact.", + "1_point_standard": "Character expressions are rich, and poses are full of energy, enhancing the typical vibrancy of Western comics." + }, + { + "question": "Are the line work, coloring, and shading consistent and well-executed, with bold outlines, vibrant colors, and stylized shading appropriate for Western comics?", + "0_point_standard": "Line work, coloring, or shading is poorly executed or inconsistent, reducing the comic's visual appeal.", + "1_point_standard": "Line work, coloring, and shading are well-executed and consistent across all images, enhancing an authentic comic appearance." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with detailed refinement, balanced composition, and a cohesive, professional appearance?", + "0_point_standard": "Images lack aesthetic appeal or appear unfinished, reducing the visual impact of the series.", + "1_point_standard": "Images are visually appealing with balanced composition, high-quality detail, and a refined, professional finish, enhancing the Western comic theme." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0002/images.txt b/dataset/style_group_generation_european_and_american_comic_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_european_and_american_comic_0002/instruction.txt b/dataset/style_group_generation_european_and_american_comic_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..eb23bf2f98a9b73c0d6e4774b1addfb86980deb3 --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 6 images depicting an adventurer entering an ancient jungle temple in search of treasure, with all images following the same Western comic book style. The first image shows the adventurer walking through a vine-covered jungle, with the silhouette of the temple looming in the mist; the second image depicts the adventurer entering the temple's gate, guarded by stone pillars carved with mysterious runes; the third image shows the adventurer avoiding a spike trap descending from the ceiling; the fourth image shows the adventurer discovering the ancient treasure in the temple's center, the gold gleaming with a mysterious light; the fifth image depicts the temple's guardian awakening, as a giant stone statue comes to life; the sixth image shows the adventurer escaping as the temple begins to collapse behind him. 
\ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0002/meta.json b/dataset/style_group_generation_european_and_american_comic_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..ea1ffc8d84a158522bc743236366eb34ee8eff6e --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group european and american comic style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0001", + "output_image_count": 6, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0003/eval.json b/dataset/style_group_generation_european_and_american_comic_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..6bde0a5d097e71658eef77871e719b4d417a2002 --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Do all the images consistently apply the Western comic style, including elements like bold lines, vibrant colors, and dynamic poses?", + "0_point_standard": "The image styles vary greatly, disrupting the visual consistency of the Western comic theme.", + "1_point_standard": "All images consistently apply the Western comic style, utilizing bold lines, vibrant colors, and stylized features uniformly." 
+ }, + { + "question": "Do the generated images accurately reflect the theme, scene, or character types described in the text prompt, capturing the distinctive qualities typical of Western comics?", + "0_point_standard": "The images deviate from the described theme or character types, lacking key stylistic or thematic elements.", + "1_point_standard": "The images accurately reflect the themes and elements described in the text, capturing the expected traits of Western comic style." + }, + { + "question": "Are recurring visual elements such as costumes, props, or background details consistent across all images to maintain a unified aesthetic?", + "0_point_standard": "Visual elements like costumes or props vary greatly between images, making them feel disconnected.", + "1_point_standard": "Visual elements are consistently portrayed, ensuring a cohesive aesthetic across the series." + }, + { + "question": "Are the characters' expressions and poses expressive and dynamic, enhancing the typical vibrancy and visual appeal of Western comics?", + "0_point_standard": "Expressions or poses appear stiff or lack dynamism, reducing the visual impact of the images.", + "1_point_standard": "Characters' facial expressions are rich and poses are dynamic, enhancing the typical vibrancy of Western comics." + }, + { + "question": "Are the line work, coloring, and shading consistent and well-executed, with bold outlines, vibrant colors, and stylized shading suitable for Western comics?", + "0_point_standard": "Line work, coloring, or shading is poorly executed or inconsistent, diminishing the visual appeal of the comic.", + "1_point_standard": "Line work, coloring, and shading are well-executed and consistent across all images, enhancing an authentic comic look." 
+ }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with refined details, balanced composition, and a cohesive, professional appearance?", + "0_point_standard": "The images lack aesthetic quality or appear unfinished, reducing the visual impact of the series.", + "1_point_standard": "The images are visually appealing, with balanced composition, high-quality details, and a refined, professional finish, enhancing the Western comic theme." + } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0003/images.txt b/dataset/style_group_generation_european_and_american_comic_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_european_and_american_comic_0003/instruction.txt b/dataset/style_group_generation_european_and_american_comic_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..35c8f0b5b4ebf086ddc741d50dafb51e40089ce2 --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 images depicting a space battle in a futuristic galaxy, with all images following the same Western comic book style. The first image shows a fleet of spaceships flying between planets, with starlight reflecting off their hulls; the second image depicts a chaotic space battle, with laser beams crisscrossing between warships and explosions lighting up the scene; the third image shows the interior of a spaceship, with the commander directing the battle as holographic screens display the war's progress; the fourth image depicts the aftermath, with the ships landing on a mothership, and injured pilots being treated by medical robots. 
\ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0003/meta.json b/dataset/style_group_generation_european_and_american_comic_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..fcb213c77b02950c4fcb426295f2b543066b1cea --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group european and american comic style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0001", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0004/eval.json b/dataset/style_group_generation_european_and_american_comic_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b6e05200424f613235b01f3ace0c679f5834a79d --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Have all the images consistently applied the Western comic style, including elements like bold outlines, vibrant colors, and dynamic poses?", + "0_point_standard": "The image styles vary greatly, disrupting the visual consistency of the Western comic theme.", + "1_point_standard": "All images have consistently applied the Western comic style, using bold outlines, vibrant colors, and stylized features uniformly." 
+ }, + { + "question": "Do the generated images accurately portray the themes, scenes, or character types described in the text prompt, capturing the distinctive qualities typical of Western comics?", + "0_point_standard": "The images deviate from the described themes or character types, lacking essential stylistic or thematic elements.", + "1_point_standard": "The images accurately reflect the themes and elements described in the text, capturing the expected qualities of the Western comic style." + }, + { + "question": "Are recurring visual elements such as clothing, props, or background details consistent across all images to maintain a unified aesthetic?", + "0_point_standard": "Visual elements like clothing or props vary greatly between images, making them feel disconnected.", + "1_point_standard": "Visual elements are consistent, ensuring a cohesive aesthetic throughout the series." + }, + { + "question": "Are the expressions and poses of characters expressive and dynamic, enhancing the vitality and visual appeal typical of Western comics?", + "0_point_standard": "Expressions or poses appear stiff or lack dynamism, reducing the visual impact of the images.", + "1_point_standard": "Character facial expressions are rich, and poses are dynamic, enhancing the vitality typical of Western comics." + }, + { + "question": "Are the outlines, coloring, and shading consistent and well-executed, with bold contours, vibrant colors, and stylized shading suitable for Western comics?", + "0_point_standard": "Outlines, coloring, or shading are poorly executed or inconsistent, reducing the comic's visual appeal.", + "1_point_standard": "Outlines, coloring, and shading are well-executed and consistent across all images, enhancing the authentic comic appearance." 
+ }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with refined details, balanced composition, and a cohesive, professional appearance?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, reducing the series' visual impact.", + "1_point_standard": "The images are visually appealing, with balanced composition, high-quality details, and a refined, professional finish, enhancing the Western comic theme." + } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0004/images.txt b/dataset/style_group_generation_european_and_american_comic_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_european_and_american_comic_0004/instruction.txt b/dataset/style_group_generation_european_and_american_comic_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..bd23c662b596b328ac25c76310d0a9ce42b23666 --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 3 images depicting a knight's battle with a dragon in a fantasy kingdom, with all images following the same Western comic book style. The first image shows the knight encountering the giant dragon in a valley, with the dragon's fiery breath illuminating the scene; the second image shows the knight charging at the dragon's claws with a gleaming sword, while the forest burns in the background; the third image depicts the knight standing victorious on a hilltop as the dragon's silhouette fades into the distant sky, the knight raising his sword in triumph. 
\ No newline at end of file diff --git a/dataset/style_group_generation_european_and_american_comic_0004/meta.json b/dataset/style_group_generation_european_and_american_comic_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..0d891b5a92bf147707694476803dd6dc14c8088f --- /dev/null +++ b/dataset/style_group_generation_european_and_american_comic_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group european and american comic style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0001", + "output_image_count": 3, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/style_group_generation_pixel_art_0002/eval.json b/dataset/style_group_generation_pixel_art_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..52485449097506d6bcd6936265bd03de4a19c320 --- /dev/null +++ b/dataset/style_group_generation_pixel_art_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the pixel art style consistently applied across all images, including pixel size, resolution, and the typical simplicity of pixel art?", + "0_point_standard": "There are significant differences in pixel size or style among the images, disrupting the visual consistency of the pixel art theme.", + "1_point_standard": "The pixel art style is consistently applied, with cohesive pixel size, resolution, and style simplicity across all images." + }, + { + "question": "Do the generated images accurately express the themes, objects, or characters described in the text prompts within the constraints of pixel art?", + "0_point_standard": "The images deviate from the described themes or objects, lacking key elements or stylistic details.", + "1_point_standard": "The images accurately reflect the described themes and objects, capturing the intended features in the pixel art style." 
+ }, + { + "question": "Is the palette and tonal consistency maintained across all images, using a limited and harmonious color scheme typical of pixel art?", + "0_point_standard": "There is significant variation in the palette or tones, leading to a discordant appearance across the images.", + "1_point_standard": "The palette and tones are consistently applied, with a cohesive and limited color scheme that aligns with pixel art aesthetics." + }, + { + "question": "Are the objects or elements within each image clearly presented, with sufficient detail and recognizable shapes within the constraints of pixel art?", + "0_point_standard": "Objects or elements are unclear or unrecognizable, lacking necessary detail for identification.", + "1_point_standard": "Each object or element is clearly presented, with recognizable shapes and sufficient detail to be distinctly visible within the pixel constraints." + }, + { + "question": "Does each image effectively use shading techniques such as dithering or contrast to create depth and dimension without overly complicating the pixel art style?", + "0_point_standard": "Shading and highlights are poorly executed or overly complex, compromising the simplicity and clarity of pixel art.", + "1_point_standard": "Shading and highlights are effectively applied, using techniques like dithering and contrast to add depth while maintaining pixel art aesthetics." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with refined details, consistent pixel precision, and a cohesive, professional appearance?", + "0_point_standard": "The images lack aesthetic appeal, exhibit inconsistencies, or appear unfinished, diminishing the overall effect of the series.", + "1_point_standard": "The images are visually appealing, with refined details, consistent pixel precision, and a professional finish that enhances the pixel art style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_pixel_art_0002/images.txt b/dataset/style_group_generation_pixel_art_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_pixel_art_0002/instruction.txt b/dataset/style_group_generation_pixel_art_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..73977cb1d2508d93a8670958a7bf31f35f64b76c --- /dev/null +++ b/dataset/style_group_generation_pixel_art_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 pixel art-style images depicting an ancient forest. The first image shows a path winding through tall pixelated trees, with sunlight filtering through the leaves and casting a peaceful glow; the second image is of a calm lake in the forest, its surface reflecting the pixelated scenery around it; the third image shows a ruined pixelated temple, overgrown with vines, and filled with a sense of ancient mystery; the fourth image depicts the forest at night, with fireflies glowing softly in the dark, and the silhouettes of trees under the moonlight. 
\ No newline at end of file diff --git a/dataset/style_group_generation_pixel_art_0002/meta.json b/dataset/style_group_generation_pixel_art_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..550ce8f611ddbfbd7323fcf72d940b25758da406 --- /dev/null +++ b/dataset/style_group_generation_pixel_art_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group pixel art style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0004", + "output_image_count": 4, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0002/eval.json b/dataset/style_group_generation_sketch_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..8030759e3ded2ee831572b76ba3c26defd9d1558 --- /dev/null +++ b/dataset/style_group_generation_sketch_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the sketch style consistent across all images, including elements such as line quality, shading techniques, and pencil texture?", + "0_point_standard": "There are noticeable differences in style among the images, disrupting the visual consistency of the sketch theme.", + "1_point_standard": "The sketch style is consistently applied, with cohesive line quality, shading, and pencil texture across all images." + }, + { + "question": "Do the generated images accurately represent the themes, objects, or subjects described in the text prompt, capturing the intended qualities of sketch art?", + "0_point_standard": "The images deviate from the described themes or subjects, lacking key elements or stylistic details.", + "1_point_standard": "The images accurately reflect the described themes and subjects, capturing the intended characteristics in sketch style." 
+ }, + { + "question": "Is the thickness of lines and brushstrokes consistent across all images, creating a unified aesthetic in line with traditional sketch techniques?", + "0_point_standard": "There is considerable variation in line and brushstroke thickness, resulting in a disjointed appearance among the images.", + "1_point_standard": "The thickness of lines and brushstrokes is consistent, creating a cohesive visual effect across all images in the series." + }, + { + "question": "Are shading techniques, such as hatching, cross-hatching, or blending, realistically and effectively applied to create depth and dimension?", + "0_point_standard": "The shading is unrealistic or inconsistent, lacking depth and failing to replicate sketch techniques.", + "1_point_standard": "Shading is effectively applied, with realistic techniques adding depth and enhancing the sketch quality." + }, + { + "question": "Does each image maintain the typical paper-like texture and tactile feel of pencil sketches?", + "0_point_standard": "The images lack a paper-like or textured feel, diminishing the authenticity of the sketch style.", + "1_point_standard": "The images exhibit a paper-like texture, adding tactility and enhancing the authenticity of the sketch style." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with refined details, balanced composition, and a professional finish that enhances the sketch style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, reducing the visual impact of the series.", + "1_point_standard": "The images are visually appealing, with refined details, balanced composition, and a professional finish that enhances the sketch style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0002/images.txt b/dataset/style_group_generation_sketch_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_sketch_0002/instruction.txt b/dataset/style_group_generation_sketch_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..29fcfa18076e93f651caea6a42771b8ef241f7e8 --- /dev/null +++ b/dataset/style_group_generation_sketch_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 6 sketch-style images depicting a classical European city. The first image is of a city square with a statue in the center, surrounded by ancient stone buildings; the second image shows the exterior of a grand cathedral, its spire reaching towards the sky with pigeons flying around; the third image depicts a narrow alley with a café, where a few patrons are enjoying coffee at outdoor tables, with bustling streets in the distance; the fourth image shows a bridge over a river, with reflections of the buildings on the water's surface; the fifth image shows the city at night, with street lamps casting light onto cobblestone streets as people walk by; the sixth image provides a distant panoramic view of the city, with red-tiled roofs and mountain ranges in the background. 
\ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0002/meta.json b/dataset/style_group_generation_sketch_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..93e1af70a3b3433fd5f470b216c1b798da83dd24 --- /dev/null +++ b/dataset/style_group_generation_sketch_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group sketch style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0009", + "output_image_count": 6, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0003/eval.json b/dataset/style_group_generation_sketch_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4ca94628ecc90657cb1d8cbff6a63763da86b030 --- /dev/null +++ b/dataset/style_group_generation_sketch_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the sketch style consistent across all images, including elements like line quality, shading techniques, and pencil texture?", + "0_point_standard": "There are noticeable differences in style among the images, disrupting the visual consistency of the sketch theme.", + "1_point_standard": "The sketch style is consistently applied, with cohesive line quality, shading, and pencil texture across all images." + }, + { + "question": "Do the generated images accurately represent the theme, objects, or subjects described in the text prompt, capturing the intended characteristics of sketch art?", + "0_point_standard": "The images deviate from the described theme or subject, lacking key elements or stylistic details.", + "1_point_standard": "The images accurately reflect the described theme and subject, capturing the intended characteristics in a sketch style." 
+ }, + { + "question": "Is the consistency of line and stroke thickness maintained across all images, creating a unified aesthetic in line with traditional sketch techniques?", + "0_point_standard": "There is significant variation in line and stroke thickness, leading to a disjointed appearance between images.", + "1_point_standard": "Line and stroke thickness are consistent, creating a cohesive visual effect across all images in the series." + }, + { + "question": "Are shading techniques like hatching, cross-hatching, or blending realistically and effectively employed to create depth and dimensionality?", + "0_point_standard": "Shading is unrealistic or inconsistent, lacking depth and failing to replicate sketch techniques.", + "1_point_standard": "Shading is effectively employed, with realistic techniques adding depth and enhancing sketch quality." + }, + { + "question": "Does each image maintain the typical paper-like texture and tactile feel of a pencil sketch?", + "0_point_standard": "The images lack a paper-like or textured feel, diminishing the authenticity of the sketch style.", + "1_point_standard": "The images exhibit a paper-like texture, adding tactile quality and enhancing the realism of the sketch style." + }, + { + "question": "Do the images display a high level of aesthetic quality, with refined details, balanced composition, and professional appearance that enhance the sketch style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, reducing the visual impact of the series.", + "1_point_standard": "The images are visually appealing, with refined details, balanced composition, and a professional finish that enhances the sketch style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0003/images.txt b/dataset/style_group_generation_sketch_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_sketch_0003/instruction.txt b/dataset/style_group_generation_sketch_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..925610ab69a9a47c6e5413165725722817422372 --- /dev/null +++ b/dataset/style_group_generation_sketch_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 sketch-style images depicting a busy marketplace. The first image shows the entrance to the market, where vendors are setting up their goods, and the streets are filled with people; the second image depicts the stalls filled with neatly arranged fruits and vegetables, while customers browse and select their purchases; the third image shows a street performer playing an instrument, surrounded by a crowd of onlookers; the fourth image depicts the market winding down at sunset, with vendors packing up their goods as the warm light casts long shadows on the stalls. 
\ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0003/meta.json b/dataset/style_group_generation_sketch_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..855de88f5f6e361a0a26441757675c4ace1e67c9 --- /dev/null +++ b/dataset/style_group_generation_sketch_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group sketch style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0009", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0004/eval.json b/dataset/style_group_generation_sketch_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..5af1b2ee55f7b1337e081ab007c92b8cee0e1241 --- /dev/null +++ b/dataset/style_group_generation_sketch_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Do all images have a consistent sketch style, including elements such as line quality, shading techniques, and pencil texture?", + "0_point_standard": "There are noticeable differences in style among the images, disrupting the visual consistency of the sketch theme.", + "1_point_standard": "The sketch style is consistently applied, with cohesive line quality, shading, and pencil texture across all images." + }, + { + "question": "Do the generated images accurately represent the theme, objects, or subjects described in the text prompt, capturing the intended characteristics of sketch art?", + "0_point_standard": "The images deviate from the described theme or subject, lacking key elements or stylistic details.", + "1_point_standard": "The images accurately reflect the described theme and subjects, capturing the intended characteristics in sketch style." 
+ }, + { + "question": "Is the thickness of lines and strokes consistent across all images, creating a unified aesthetic that aligns with traditional sketch techniques?", + "0_point_standard": "There are significant variations in line and stroke thickness, leading to a disjointed appearance among the images.", + "1_point_standard": "Line and stroke thickness are consistent, creating a cohesive visual effect across all images in the series." + }, + { + "question": "Are shading techniques such as hatching, cross-hatching, or blending realistically and effectively applied to create depth and dimensionality?", + "0_point_standard": "The shading is unrealistic or inconsistent, lacking depth and failing to replicate sketch techniques.", + "1_point_standard": "Shading is effectively applied, with realistic techniques that add depth and enhance the sketch quality." + }, + { + "question": "Does each image maintain the paper-like texture and feel typical of pencil sketches?", + "0_point_standard": "The images lack a paper-like or textured feel, diminishing the authenticity of the sketch style.", + "1_point_standard": "The images exhibit a paper-like texture, adding tactile quality and enhancing the authenticity of the sketch style." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with refined details, balanced composition, and a professional finish that enhances the sketch style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, reducing the visual impact of the series.", + "1_point_standard": "The images are visually appealing, with refined details, balanced composition, and a professional finish that enhances the sketch style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0004/images.txt b/dataset/style_group_generation_sketch_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_sketch_0004/instruction.txt b/dataset/style_group_generation_sketch_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..b1dacefbcd971745101c04c3d77fa7bd6b779fbd --- /dev/null +++ b/dataset/style_group_generation_sketch_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 5 sketch-style images depicting the sea and coastline. The first image shows a tall lighthouse standing by the shore, with waves crashing against the rocks in the background; the second image depicts a fishing boat sailing on the sea, with the crew busily working, and the wind blowing through the sails; the third image shows distant rocks, where waves are splashing high against the stone; the fourth image depicts a small fishing village by the sea, with rooftops blending into the horizon and the vast ocean beyond; the fifth image shows a sunset over the sea, with the sky glowing golden and fishing boats slowly returning to the harbor. 
\ No newline at end of file diff --git a/dataset/style_group_generation_sketch_0004/meta.json b/dataset/style_group_generation_sketch_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..26caba6d628364f766f01af93c0628790e394353 --- /dev/null +++ b/dataset/style_group_generation_sketch_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group sketch style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0009", + "output_image_count": 5, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0002/eval.json b/dataset/style_group_generation_woodcut_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b11e3422bc5af5872941f4f683bbe02d0133031d --- /dev/null +++ b/dataset/style_group_generation_woodcut_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Have all the images consistently applied the woodcut style, including prominent features such as bold lines, carved textures, and the typical high contrast of woodblock prints?", + "0_point_standard": "The images display noticeable variations in style, disrupting the visual consistency of the woodcut theme.", + "1_point_standard": "The woodcut style is consistently applied, with all images uniformly using bold lines, textures, and contrast." + }, + { + "question": "Do the generated images accurately represent the themes, subjects, or characters described in the text prompts, capturing the expected qualities of woodcut art?", + "0_point_standard": "The images deviate from the described themes or subjects, lacking key stylistic or thematic details.", + "1_point_standard": "The images accurately reflect the described themes and subjects, capturing the expected qualities of woodcut art." 
+ }, + { + "question": "Is the quality of lines and engraving patterns (e.g., bold outlines, cross-hatching, or parallel lines) consistent across all images, creating the typical unified aesthetic of woodcut art?", + "0_point_standard": "There is significant variation in line quality and engraving patterns, resulting in a discordant appearance between images.", + "1_point_standard": "Line quality and engraving patterns are consistent, enhancing the overall look and feel of the woodcut series." + }, + { + "question": "Do the images effectively utilize high contrast and negative space to create depth and clarity, similar to what is seen in traditional woodblock prints?", + "0_point_standard": "The use of contrast and negative space is inconsistent or ineffective, resulting in a lack of depth or clarity.", + "1_point_standard": "The use of contrast and negative space is effective, adding depth, clarity, and a dynamic visual quality to the images." + }, + { + "question": "Does each image maintain an engraved appearance with texture and tactile quality similar to woodcut prints?", + "0_point_standard": "The images lack a texture similar to woodcuts, undermining the authenticity of the style.", + "1_point_standard": "Each image features texture and an engraved appearance, adding tactility and enhancing the authenticity of the woodcut style." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with fine details, balanced composition, and a professional finish that enhances the woodcut style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, diminishing the overall effect of the series.", + "1_point_standard": "The images are visually appealing, with fine details, balanced composition, and a professional finish that enhances the woodcut style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0002/images.txt b/dataset/style_group_generation_woodcut_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_woodcut_0002/instruction.txt b/dataset/style_group_generation_woodcut_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..8bbbf3a774245f171eeabeda4c40e175f3ac2b1d --- /dev/null +++ b/dataset/style_group_generation_woodcut_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 6 woodcut-style images depicting a bustling fishing village. The first image shows fishermen rowing their boats out to sea at sunrise, with the light reflecting off the rippling water; the second image is of the village pier, where boats are docked and fishermen are repairing their nets; the third image depicts the village market, with stalls filled with freshly caught seafood, and villagers busily trading; the fourth image shows the village houses, with seaweed-covered roofs and elders chatting in front of their homes; the fifth image shows fishing boats returning to the harbor at sunset, with the sky ablaze in red and orange hues; the sixth image depicts the village at night, with waves gently lapping against the shore, and the village lights reflecting on the calm sea. 
\ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0002/meta.json b/dataset/style_group_generation_woodcut_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..1816ddd36f7da7ff542c5382a6ba9190c33defd6 --- /dev/null +++ b/dataset/style_group_generation_woodcut_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group woodcut style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0010", + "output_image_count": 6, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0003/eval.json b/dataset/style_group_generation_woodcut_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c8fc3d281205da28df3ab3af07cd4a2b1744ba0b --- /dev/null +++ b/dataset/style_group_generation_woodcut_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Are all images consistently applying the woodcut style, including features such as bold lines, carved textures, and the high contrast typical of woodblock prints?", + "0_point_standard": "The images display noticeable style variations that disrupt the visual consistency of the woodcut theme.", + "1_point_standard": "The woodcut style is applied consistently, with all images using bold lines, textures, and contrast uniformly." + }, + { + "question": "Do the generated images accurately represent the themes, subjects, or figures described in the text prompts, capturing the expected qualities of woodcut art?", + "0_point_standard": "The images deviate from the described themes or subjects, lacking key stylistic or thematic details.", + "1_point_standard": "The images accurately reflect the described themes and subjects, capturing the expected qualities of woodcut art." 
+ }, + { + "question": "Is the quality of lines and engraving patterns (e.g., bold outlines, cross-hatching, or parallel lines) consistent across all images, creating the unified aesthetic typical of woodcut art?", + "0_point_standard": "There is significant variance in line quality and engraving patterns, leading to a discordant appearance among images.", + "1_point_standard": "Line quality and engraving patterns are consistent, enhancing the overall look and feel of the woodcut series." + }, + { + "question": "Do the images effectively use high contrast and negative space to create depth and clarity, similar to what is seen in traditional woodblock prints?", + "0_point_standard": "The use of contrast and negative space is inconsistent or ineffective, resulting in a lack of depth or clarity.", + "1_point_standard": "Contrast and negative space are used effectively, adding depth, clarity, and dynamic visual quality to the images." + }, + { + "question": "Does each image maintain an engraved appearance, with textures and tactile qualities akin to woodblock prints?", + "0_point_standard": "The images lack woodcut-like texture and tactility, diminishing the authenticity of the style.", + "1_point_standard": "Each image has textures and an engraved appearance, adding tactility and enhancing the authenticity of the woodcut style." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with fine details, balanced composition, and professional finish, thereby enhancing the woodcut style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, reducing the overall impact of the series.", + "1_point_standard": "The images are visually appealing, with fine details, balanced composition, and professional finish, thereby enhancing the woodcut style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0003/images.txt b/dataset/style_group_generation_woodcut_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_woodcut_0003/instruction.txt b/dataset/style_group_generation_woodcut_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..e4f23b78d5b2ce56fa2b6ed435c2fccb293985e0 --- /dev/null +++ b/dataset/style_group_generation_woodcut_0003/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 4 woodcut-style images depicting a vast grassland. The first image shows a herd of horses galloping across the open plain, with distant mountain ranges and low-hanging clouds in the background; the second image depicts a yurt on the grassland, where herders are gathered for a meal, with saddles and water jugs nearby; the third image shows a sunrise over the grassland, with sunlight breaking through the clouds and grazing cattle scattered across the endless green expanse; the fourth image is set at night, with a starry sky above and campfires illuminating the tents in the herders' camp. 
\ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0003/meta.json b/dataset/style_group_generation_woodcut_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..603b8d2fc8f7ebf06c4bd50755bb46810b3fe189 --- /dev/null +++ b/dataset/style_group_generation_woodcut_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group woodcut style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0010", + "output_image_count": 4, + "case_id": "0003" +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0004/eval.json b/dataset/style_group_generation_woodcut_0004/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e7a9c40505b00bf2232d70dab97f90399f59c6e5 --- /dev/null +++ b/dataset/style_group_generation_woodcut_0004/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Are all images consistently applying the woodcut style, including features such as bold lines, engraved textures, and the high contrast typical of woodcuts?", + "0_point_standard": "The images show significant stylistic variations that disrupt the visual consistency of the woodcut theme.", + "1_point_standard": "The woodcut style is consistently applied, with all images uniformly using bold lines, textures, and contrast." + }, + { + "question": "Do the generated images accurately reflect the themes, subjects, or figures described in the text prompts, capturing the intended qualities of woodcut art?", + "0_point_standard": "The images deviate from the described themes or subjects, lacking key stylistic or thematic details.", + "1_point_standard": "The images accurately reflect the described themes and subjects, capturing the intended qualities of woodcut art." 
+ }, + { + "question": "Is the line quality and engraving pattern (e.g., bold outlines, cross-hatching, or parallel lines) consistent across all images, creating a unified aesthetic typical of woodcut art?", + "0_point_standard": "There is significant variation in line quality and engraving patterns, resulting in a discordant appearance between images.", + "1_point_standard": "Line quality and engraving patterns are consistent, enhancing the overall look and feel of the woodcut series." + }, + { + "question": "Do the images effectively use high contrast and negative space to create depth and clarity, as seen in traditional woodcut prints?", + "0_point_standard": "The use of contrast and negative space is inconsistent or ineffective, leading to a lack of depth or clarity.", + "1_point_standard": "The use of contrast and negative space is effective, adding depth, clarity, and dynamic visual quality to the images." + }, + { + "question": "Does each image maintain an engraved appearance with textures and tactile qualities similar to those of a woodcut print?", + "0_point_standard": "The images lack the textural quality reminiscent of woodcuts, undermining the authenticity of the style.", + "1_point_standard": "Each image features textures and an engraved appearance, adding tactile qualities and enhancing the authenticity of the woodcut style." + }, + { + "question": "Do the images exhibit a high level of aesthetic quality, with fine details, balanced composition, and a professional finish that enhances the woodcut style?", + "0_point_standard": "The images lack aesthetic appeal or appear unfinished, diminishing the overall impact of the series.", + "1_point_standard": "The images are visually appealing, with fine details, balanced composition, and a professional finish that enhances the woodcut style." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0004/images.txt b/dataset/style_group_generation_woodcut_0004/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/style_group_generation_woodcut_0004/instruction.txt b/dataset/style_group_generation_woodcut_0004/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..5a180969630749af5d33160ed476ffece04f566c --- /dev/null +++ b/dataset/style_group_generation_woodcut_0004/instruction.txt @@ -0,0 +1 @@ +Please generate a set of 3 woodcut-style images depicting a medieval European castle. The first image shows the castle's grand entrance, with massive stone walls and a moat surrounding the fortress, with mountains and forests in the distance; the second image depicts the castle's inner courtyard, where knights are training under the warm sunlight on stone-paved grounds; the third image shows the castle's high tower, with a flag billowing in the wind, and the distant outline of a village visible on the horizon. 
\ No newline at end of file diff --git a/dataset/style_group_generation_woodcut_0004/meta.json b/dataset/style_group_generation_woodcut_0004/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..c5c5621182b58fdf5ce9a0a2a3dbc30a1a5f7bae --- /dev/null +++ b/dataset/style_group_generation_woodcut_0004/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "group woodcut style images generation", + "num_of_cases": 4, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0010", + "output_image_count": 3, + "case_id": "0004" +} \ No newline at end of file diff --git a/dataset/text_editing_text_insertion_0002/eval.json b/dataset/text_editing_text_insertion_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..bc3607028ea0b80d97b179f6fc12d377760edb63 --- /dev/null +++ b/dataset/text_editing_text_insertion_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Is the inserted text positioned at the specified location in the text description?", + "0_point_standard": "The text insertion location differs from the specified location in the text description.", + "1_point_standard": "The text is precisely inserted at the specified location in the text description." + }, + { + "question": "Aside from the specified text insertion, does the rest of the image remain unchanged and consistent with the original image?", + "0_point_standard": "There are noticeable changes or distortions in parts of the image that were not intended to be modified.", + "1_point_standard": "The rest of the image remains unchanged and consistent with the original image." + }, + { + "question": "Does the inserted text match the specified font, size, and style in the text description?", + "0_point_standard": "The inserted text does not match the specified font, size, or style.", + "1_point_standard": "The inserted text perfectly matches the specified font, size, and style in the description." 
+ }, + { + "question": "Is the content of the inserted text accurate and fully consistent with the given text description?", + "0_point_standard": "The text content is incorrect with errors or omissions.", + "1_point_standard": "The text content is accurate and fully consistent with the given text description." + }, + { + "question": "Does the inserted text blend seamlessly into the image, appearing natural and maintaining the overall aesthetic of the image?", + "0_point_standard": "The text insertion looks fake or disjointed, disrupting the aesthetic of the image.", + "1_point_standard": "The text blends seamlessly, maintaining the overall aesthetic and visual harmony of the image." + }, + { + "question": "Does the final image (including text insertion) show high-quality editing without any visible flaws or errors?", + "0_point_standard": "The image shows visible editing errors or flaws, reducing its quality.", + "1_point_standard": "The image displays high-quality editing without any visible flaws or errors, maintaining a professional finish." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/text_editing_text_insertion_0002/images.txt b/dataset/text_editing_text_insertion_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..f78a0a487889440194c012bb52da0aea86ca2431 --- /dev/null +++ b/dataset/text_editing_text_insertion_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN0195sbob1KK07hO1A1z_!!6000000001144-0-tps-6372-3584.jpg diff --git a/dataset/text_editing_text_insertion_0002/instruction.txt b/dataset/text_editing_text_insertion_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..d25583941cda44ac53b2773d2d7277bdee3942bc --- /dev/null +++ b/dataset/text_editing_text_insertion_0002/instruction.txt @@ -0,0 +1 @@ +Please generate an image where the clouds naturally form the following text: “PEACEFUL DAY.” The arrangement of the text should blend seamlessly with the shape and distribution of the clouds, ensuring that the overall realism of the clouds is maintained. All other elements in the image, including the ground, sheep, trees, and river, should remain unchanged, and the color and lighting of the clouds should be consistent. 
\ No newline at end of file diff --git a/dataset/text_editing_text_insertion_0002/meta.json b/dataset/text_editing_text_insertion_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..d5336e335576433c787963e22ecdc8d39a8277b6 --- /dev/null +++ b/dataset/text_editing_text_insertion_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "text insertion", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0084", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/text_editing_text_removal_0001/eval.json b/dataset/text_editing_text_removal_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c3251f5ed64689d5f73424401cbbe63a0f1c837e --- /dev/null +++ b/dataset/text_editing_text_removal_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Has the text removal process effectively eliminated the specified text from the image?", + "0_point_standard": "The specified text is still partially or fully visible after the removal process.", + "1_point_standard": "The specified text has been completely removed from the image, with no remaining traces or visible marks." + }, + { + "question": "Does the area where the text was removed blend seamlessly with the rest of the image, without noticeable artifacts or distortions?", + "0_point_standard": "The area where the text was removed shows noticeable artifacts, color mismatches, or distortions, disrupting the uniformity of the image.", + "1_point_standard": "The area looks natural and blends seamlessly with the surrounding parts of the image, without any visible artifacts or distortions." 
+ }, + { + "question": "Has the text removal process preserved the original content, style, and features of the rest of the image?", + "0_point_standard": "Apart from the text removal, there are noticeable changes or alterations in the original content, style, or features of the image.", + "1_point_standard": "The rest of the image retains its original content, style, and features without any unexpected changes." + }, + { + "question": "Does the text removal meet the specific requirements outlined in the text description, such as retaining certain elements or formats?", + "0_point_standard": "The text removal does not meet the specific requirements or conditions outlined in the text description.", + "1_point_standard": "The text removal meets all the specified requirements and conditions described in the text input." + }, + { + "question": "Has the quality of the image been maintained, with no loss of resolution or visual quality due to the text removal process?", + "0_point_standard": "There is a degradation in resolution or visual quality of the image due to text removal, such as blurring or pixelation.", + "1_point_standard": "The image maintains its original resolution and visual quality, with no quality degradation due to text removal." + }, + { + "question": "Does the edited image exhibit a high level of professional aesthetic quality, appearing natural and visually pleasing?", + "0_point_standard": "The edited image lacks aesthetic quality, appearing unprofessional or unnatural.", + "1_point_standard": "The edited image exhibits a high level of professional aesthetic quality, appearing natural and visually pleasing." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/text_editing_text_removal_0001/images.txt b/dataset/text_editing_text_removal_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..3feb318e00226d8e0ea2381c3ec1fc8e42658c69 --- /dev/null +++ b/dataset/text_editing_text_removal_0001/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i4/O1CN01RmmpR41gbKehxhCGp_!!6000000004160-0-tps-3691-5536.jpg diff --git a/dataset/text_editing_text_removal_0001/instruction.txt b/dataset/text_editing_text_removal_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..e20b6ba30d5bdf3dc2412df683084c81ff87a4c1 --- /dev/null +++ b/dataset/text_editing_text_removal_0001/instruction.txt @@ -0,0 +1 @@ +Please generate an image by removing the larger text “São João” and “Ipiranga” from the picture, while keeping the smaller text, signs, buildings, and all other visual elements unchanged. The generated image should naturally remove these words, ensuring that the overall design of the sign remains intact. 
\ No newline at end of file diff --git a/dataset/text_editing_text_removal_0001/meta.json b/dataset/text_editing_text_removal_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..107a36d682e49f6284c2db29fe516023d8f88f9d --- /dev/null +++ b/dataset/text_editing_text_removal_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "text removal", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0083", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/text_editing_text_style_editing_0001/eval.json b/dataset/text_editing_text_style_editing_0001/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..4496f1f50462ddc427341cba838051213101fbef --- /dev/null +++ b/dataset/text_editing_text_style_editing_0001/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the modified text style in image A match the style in image B, accurately reflecting the features of the reference image?", + "0_point_standard": "The text style does not match the reference style of image B, showing significant differences or missing style elements.", + "1_point_standard": "The text style in image A closely matches the reference style of image B, accurately capturing the expected style features." + }, + { + "question": "Were only the specified text styles modified, with all other elements in image A remaining unchanged?", + "0_point_standard": "Unexpected changes are present in other parts of image A, affecting areas that should have remained unchanged.", + "1_point_standard": "Only the specified text styles were modified, with all other elements in image A completely unchanged." 
+ }, + { + "question": "Is the modified text in image A logically coherent in its original context, ensuring the new style blends naturally with the surrounding content?", + "0_point_standard": "The modified text style appears out of place or inconsistent in the context of image A, disrupting visual harmony.", + "1_point_standard": "The modified text style blends naturally with the context of image A, enhancing coherence and integrating seamlessly with the surrounding content." + }, + { + "question": "Does the modified text fully retain the original content, only changing the style as per the reference?", + "0_point_standard": "The text content has been altered or is unclear after the style change, deviating from the original wording.", + "1_point_standard": "The text content is exactly the same as the original, with only the style modified according to the reference." + }, + { + "question": "Do the lighting, shadow, and texture of the modified text match the style of image B, contributing to a realistic and cohesive appearance?", + "0_point_standard": "The modified text shows inconsistencies in lighting, shadow, or texture, making it appear unrealistic or disconnected from the reference style.", + "1_point_standard": "The lighting, shadow, and texture of the modified text match the style of image B, creating a cohesive and realistic appearance." + }, + { + "question": "Does the final image maintain a high aesthetic quality, with the modified text style enhancing the overall visual appeal?", + "0_point_standard": "The final image lacks aesthetic cohesion or appeal, with the modified text style reducing its visual quality.", + "1_point_standard": "The final image exhibits high aesthetic quality, with the modified text style seamlessly enhancing its overall visual appeal." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/text_editing_text_style_editing_0001/images.txt b/dataset/text_editing_text_style_editing_0001/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..c0c8ea3dd4bd485ae58353265046fd89e8078b3b --- /dev/null +++ b/dataset/text_editing_text_style_editing_0001/images.txt @@ -0,0 +1,2 @@ +https://img.alicdn.com/imgextra/i3/O1CN015aqpvP1PModTyc9m1_!!6000000001827-0-tps-564-797.jpg +https://img.alicdn.com/imgextra/i2/O1CN01oDwTbm1YjYqU7RQow_!!6000000003095-0-tps-564-875.jpg diff --git a/dataset/text_editing_text_style_editing_0001/instruction.txt b/dataset/text_editing_text_style_editing_0001/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..0d86605a951811287d44fa6fe070b4ac29271217 --- /dev/null +++ b/dataset/text_editing_text_style_editing_0001/instruction.txt @@ -0,0 +1 @@ +Please apply the text style from the second image to the text content of the first image, while keeping all other elements of the first image unchanged as much as possible. If needed, slight adjustments to the layout and size of the text are allowed, but the text content “JUST DO IT” must remain unchanged. The transformed text should adopt the bubble-like, 3D style similar to the second image, giving it a light, transparent, and glossy appearance with a three-dimensional effect. 
\ No newline at end of file diff --git a/dataset/text_editing_text_style_editing_0001/meta.json b/dataset/text_editing_text_style_editing_0001/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..95b1782c97f25edd584852df82b3884a7a773415 --- /dev/null +++ b/dataset/text_editing_text_style_editing_0001/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "text style modification", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": true, + "multi_image_output": false, + "uid": "0082", + "output_image_count": 1, + "case_id": "0001" +} \ No newline at end of file diff --git a/dataset/three-view_generation_0002/eval.json b/dataset/three-view_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..b94875f461962ee70301a4669a6e3e5d73cbf249 --- /dev/null +++ b/dataset/three-view_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the output include all three required views (front, side, and back) to provide a complete representation of the object?", + "0_point_standard": "One or more views are missing, leading to an incomplete representation.", + "1_point_standard": "All three views are present, providing a complete representation of the object from the front, side, and back." + }, + { + "question": "Is the object's structure consistent across all three views, accurately reflecting the same shape and features in each view?", + "0_point_standard": "The structure or features of the object differ between views, causing inconsistency and suggesting different interpretations of the object.", + "1_point_standard": "The structure of the object is consistent across all views, clearly representing the same shape and features." 
+ }, + { + "question": "Are the proportions of the object accurately represented in each view, maintaining correct dimensions and scale?", + "0_point_standard": "Proportions are distorted or inaccurate in one or more views, distorting the object's dimensions.", + "1_point_standard": "Proportions are accurately represented in each view, reflecting the correct dimensions and scale of the object." + }, + { + "question": "Are the key features of the object (such as edges, corners, and design details) correctly aligned across the three views?", + "0_point_standard": "Key features are misaligned or inconsistent between views, disrupting the visual coherence of the object's design.", + "1_point_standard": "Key features are correctly aligned across all views, maintaining a coherent and accurate representation." + }, + { + "question": "Is the level of detail in each view high enough to ensure that important features are clear and easily distinguishable?", + "0_point_standard": "In one or more views, details are unclear or poorly defined, making important features difficult to distinguish.", + "1_point_standard": "Each view is detailed and clear, with important features easily distinguishable and well-represented." + }, + { + "question": "Does the final set of three views exhibit a high level of aesthetics and professional quality, providing a refined and cohesive presentation?", + "0_point_standard": "The set of views lacks aesthetic cohesiveness or professional quality, with visual inconsistencies diminishing its presentation.", + "1_point_standard": "The set of three views is aesthetically pleasing and professionally rendered, with a cohesive and refined presentation." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/three-view_generation_0002/images.txt b/dataset/three-view_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..d2ab7b22aa9756213a3e9ad303266bbed812a84c --- /dev/null +++ b/dataset/three-view_generation_0002/images.txt @@ -0,0 +1 @@ +https://img.alicdn.com/imgextra/i1/O1CN01QesYlC1J3gObqNWId_!!6000000000973-0-tps-800-800.jpg diff --git a/dataset/three-view_generation_0002/instruction.txt b/dataset/three-view_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..afad5a072f29628d77f6528c7b2334e0857ba76e --- /dev/null +++ b/dataset/three-view_generation_0002/instruction.txt @@ -0,0 +1 @@ +Please generate a three-view illustration of the object, including the front, side and back, based on the image provided. Ensure that each view keeps the key features of the object consistent, such as shape, colour, texture and scale. The front view should clearly show the main facial or frontal features of the object, the side view shows the side profile and details of the object, and the back view should show the rear structure and appearance of the object. The generated three views should accurately represent all angles of the object and be consistent with the description. 
\ No newline at end of file diff --git a/dataset/three-view_generation_0002/meta.json b/dataset/three-view_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..d7e589618d4395ed049f21d679e4c7d014fb5b19 --- /dev/null +++ b/dataset/three-view_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "three-view generation", + "num_of_cases": 2, + "image_reference": true, + "multi_image_reference": false, + "multi_image_output": true, + "uid": "0043", + "output_image_count": 3, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/ticket_generation_0002/eval.json b/dataset/ticket_generation_0002/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..c94d543b113b444636393b6b13cc69ed5f89e9a9 --- /dev/null +++ b/dataset/ticket_generation_0002/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the ticket design match the text description and include all key information (such as date, location, event name)?", + "0_point_standard": "The ticket design does not match the description, and key information is missing or displayed incorrectly.", + "1_point_standard": "The ticket design matches the description and accurately displays all key information." + }, + { + "question": "Is the text on the ticket clear and easy to read, and do the font style and layout meet the design requirements?", + "0_point_standard": "The text is unclear, and the font style or layout does not meet the requirements, affecting overall readability.", + "1_point_standard": "The text is clear and easy to read, and the font style and layout meet the design requirements." 
+ }, + { + "question": "Does the overall color scheme and visual style of the ticket align with the style requirements in the text description (e.g., modern, vintage)?", + "0_point_standard": "The color scheme and visual style do not match the text description and fail to convey the expected style.", + "1_point_standard": "The color scheme and visual style match the text description and convey the expected style effect." + }, + { + "question": "Does the model accurately implement the special design requirements from the text (e.g., watermark, security marks)?", + "0_point_standard": "The special design requirements from the text are not accurately implemented, or the details are insufficient.", + "1_point_standard": "The special design requirements from the text are accurately implemented, with precise details." + }, + { + "question": "Is the ticket's layout clear and reasonable, is the information organized orderly, and is the visual hierarchy distinct?", + "0_point_standard": "The ticket layout is chaotic, the information organization is disorderly, and the visual effect is confusing.", + "1_point_standard": "The ticket layout is clear and reasonable, the information organization is orderly, and the visual hierarchy is distinct." + }, + { + "question": "Does the overall aesthetic and design appeal of the ticket meet professional standards and possess strong attractiveness?", + "0_point_standard": "The ticket lacks overall aesthetic appeal, the design quality is poor, and it has insufficient attractiveness.", + "1_point_standard": "The ticket has excellent aesthetic appeal, high design quality, and strong visual attractiveness." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/ticket_generation_0002/images.txt b/dataset/ticket_generation_0002/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/ticket_generation_0002/instruction.txt b/dataset/ticket_generation_0002/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..444f6d4a22f4b1e1725417a1413857096bda90d3 --- /dev/null +++ b/dataset/ticket_generation_0002/instruction.txt @@ -0,0 +1 @@ +This image shows two futuristic cinema tickets stacked vertically, each with a space exploration theme and vibrant gradient backgrounds in shades of blue, purple, and red. Both tickets have a similar layout, with the left side dedicated to admission details and the right side showing event information, separated by a wavy, layered design resembling paper cutouts in varying shades of blue and purple. On the left side of each ticket, the text “ADMIT ONE” is written vertically in uppercase letters, with “PRICE: 25.0$” in smaller text directly below it. At the bottom, the ticket ID “TICKET ID:1234567” is displayed in small font on both tickets. The right side of the upper ticket features an astronaut illustration in a white spacesuit standing against a backdrop of stars, planets, and a large, purple-tinted moon. The text “GALAXY CINEMA” appears at the top in uppercase white letters. Below it, the date and time details are given as “11 | 23rd” and “TIME: 5P.M.” in bold font. Further down, the location is noted as “Galaxycinema, NY, 1354.” The lower ticket has a similar design, with a red and white space shuttle replacing the astronaut, set against the same cosmic background of stars and planets. 
The same information appears on this ticket: “GALAXY CINEMA” at the top, followed by “11 | 23rd,” “TIME: 5P.M.,” and “Galaxycinema, NY, 1354.” The overall design of both tickets combines a modern space theme with vibrant gradients and minimalist typography, creating an exciting, immersive look suitable for a space-themed cinema event. \ No newline at end of file diff --git a/dataset/ticket_generation_0002/meta.json b/dataset/ticket_generation_0002/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..3430ff28840263d61993e97933ec2d638cadb7ee --- /dev/null +++ b/dataset/ticket_generation_0002/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ticket generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0031", + "output_image_count": 1, + "case_id": "0002" +} \ No newline at end of file diff --git a/dataset/ticket_generation_0003/eval.json b/dataset/ticket_generation_0003/eval.json new file mode 100644 index 0000000000000000000000000000000000000000..e000ca021be3b08e347a249b0d77c0e5ace98dfd --- /dev/null +++ b/dataset/ticket_generation_0003/eval.json @@ -0,0 +1,34 @@ +{ + "questions": [ + { + "question": "Does the ticket design match the text description and contain all key information (e.g., date, location, event name)?", + "0_point_standard": "The ticket design does not match the description, key information is missing or incorrectly displayed.", + "1_point_standard": "The ticket design matches the description and accurately displays all key information." + }, + { + "question": "Is the text on the ticket clear and easy to read, and do the font style and layout meet the design requirements?", + "0_point_standard": "The text is unclear, and the font style or layout does not meet the requirements, affecting overall readability.", + "1_point_standard": "The text is clear and easy to read, and the font style and layout meet design requirements." 
+ }, + { + "question": "Does the overall color scheme and visual style of the ticket match the style requirements described in the text (e.g., modern, vintage)?", + "0_point_standard": "The color scheme and visual style do not match the text description and fail to convey the expected style.", + "1_point_standard": "The color scheme and visual style match the text description and convey the expected style effect." + }, + { + "question": "Does the design accurately implement the special design requirements in the text (e.g., watermark, security mark)?", + "0_point_standard": "The special design requirements in the text are not accurately implemented or lack detail.", + "1_point_standard": "The special design requirements in the text are accurately implemented with precise details." + }, + { + "question": "Is the ticket layout clear and reasonable, is the information organized systematically, and is the visual hierarchy distinct?", + "0_point_standard": "The ticket layout is chaotic, the information organization is disorganized, and the visual effect is confusing.", + "1_point_standard": "The ticket layout is clear and reasonable, the information is organized systematically, and the visual hierarchy is distinct." + }, + { + "question": "Does the overall aesthetic appeal and design attractiveness of the ticket meet professional standards and have strong appeal?", + "0_point_standard": "The ticket lacks overall aesthetic appeal, has poor design quality, and insufficient attractiveness.", + "1_point_standard": "The ticket has excellent aesthetic appeal, high design quality, and strong visual attractiveness." 
+ } + ] +} \ No newline at end of file diff --git a/dataset/ticket_generation_0003/images.txt b/dataset/ticket_generation_0003/images.txt new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/dataset/ticket_generation_0003/instruction.txt b/dataset/ticket_generation_0003/instruction.txt new file mode 100644 index 0000000000000000000000000000000000000000..07fccd6af93c540595848745bb633c4e19c83e76 --- /dev/null +++ b/dataset/ticket_generation_0003/instruction.txt @@ -0,0 +1 @@ +This image features two vertically oriented tickets with a design inspired by Vincent van Gogh’s artwork, each showcasing a famous painting by the artist. The left ticket has a yellow background, displaying “Sunflowers,” while the right ticket has a white background, displaying “The Starry Night.” Both tickets have rounded corners and feature a similar layout with two circular cut-out holes, one near the top right and the other near the bottom center. At the top of each ticket, the date “08 | 11 | 2023” is displayed in black, with “08” and “11” separated by a line and “2023” aligned to the right. Below this, the name “VINCENT” is prominently displayed in a large serif font, with “WILLEM VAN GOGH” written in smaller uppercase letters beneath it. A dotted line separates the artwork section from the ticket information at the bottom. The bottom section contains a small black dot icon followed by the word “TICKET” in uppercase letters. Below this, there is a short description in smaller font: “Dutch Post-Impressionist painter who posthumously became one of the most famous and influential figures in Western art history. In a decade, he created about 2,100 artworks.” At the very bottom of each ticket, a barcode is displayed. 
The left ticket uses a yellow background to match the color of “Sunflowers,” while the right ticket has a white background complementing the darker tones of “The Starry Night.” The overall design of these tickets is elegant and art-themed, with a focus on Vincent van Gogh’s legacy and iconic works. \ No newline at end of file diff --git a/dataset/ticket_generation_0003/meta.json b/dataset/ticket_generation_0003/meta.json new file mode 100644 index 0000000000000000000000000000000000000000..47ddc9b93780e47e51efc0fec395dbb773ba6559 --- /dev/null +++ b/dataset/ticket_generation_0003/meta.json @@ -0,0 +1,10 @@ +{ + "task_name": "ticket generation", + "num_of_cases": 3, + "image_reference": false, + "multi_image_reference": false, + "multi_image_output": false, + "uid": "0031", + "output_image_count": 1, + "case_id": "0003" +} \ No newline at end of file