Spaces: markmagic / (Running on Zero)

layerdiffusion committed
Commit 9ab270d
Parent: fdeb859

Files changed (5)
  1. LICENSE +201 -0
  2. app.py +357 -8
  3. chat_interface.py +628 -0
  4. lib_omost/canvas.py +248 -0
  5. lib_omost/pipeline.py +435 -0
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
app.py CHANGED
@@ -1,14 +1,363 @@
1
- import gradio as gr
2
  import spaces
3
  import torch
4
 
5
- zero = torch.Tensor([0]).cuda()
6
- print(zero.device) # <-- 'cpu' 🤔
7
 
8
  @spaces.GPU
9
- def greet(n):
10
- print(zero.device) # <-- 'cuda:0' 🤗
11
- return f"Hello {zero + n} Tensor"
12
 
13
- demo = gr.Interface(fn=greet, inputs=gr.Number(), outputs=gr.Text())
14
- demo.launch()
 
1
+ # import gradio as gr
2
+ #
3
+ # import torch
4
+ #
5
+ # zero = torch.Tensor([0]).cuda()
6
+ # print(zero.device) # <-- 'cpu' 🤔
7
+ #
8
+ # @spaces.GPU
9
+ # def greet(n):
10
+ # print(zero.device) # <-- 'cuda:0' 🤗
11
+ # return f"Hello {zero + n} Tensor"
12
+ #
13
+ # demo = gr.Interface(fn=greet, inputs=gr.Number(), outputs=gr.Text())
14
+ # demo.launch()
15
+
16
+ import os
17
  import spaces
18
+
19
+ os.environ['HF_HOME'] = os.path.join(os.path.dirname(__file__), 'hf_download')
20
+ HF_TOKEN = os.environ['hf_token'] if 'hf_token' in os.environ else None
21
+
22
+ import uuid
23
  import torch
24
+ import numpy as np
25
+ import gradio as gr
26
+ import tempfile
27
+
28
+ gradio_temp_dir = os.path.join(tempfile.gettempdir(), 'gradio')
29
+ os.makedirs(gradio_temp_dir, exist_ok=True)
30
+
31
+ from threading import Thread
32
+
33
+ # Phi3 Hijack
34
+ from transformers.models.phi3.modeling_phi3 import Phi3PreTrainedModel
35
+
36
+ Phi3PreTrainedModel._supports_sdpa = True
37
+
38
+ from PIL import Image
39
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
40
+ from diffusers import AutoencoderKL, UNet2DConditionModel
41
+ from diffusers.models.attention_processor import AttnProcessor2_0
42
+ from transformers import CLIPTextModel, CLIPTokenizer
43
+ from lib_omost.pipeline import StableDiffusionXLOmostPipeline
44
+ from chat_interface import ChatInterface
45
+
46
+ import lib_omost.canvas as omost_canvas
47
+
48
+
49
+ # SDXL
50
+
51
+ sdxl_name = 'SG161222/RealVisXL_V4.0'
52
+ # sdxl_name = 'stabilityai/stable-diffusion-xl-base-1.0'
53
+
54
+ tokenizer = CLIPTokenizer.from_pretrained(
55
+ sdxl_name, subfolder="tokenizer")
56
+ tokenizer_2 = CLIPTokenizer.from_pretrained(
57
+ sdxl_name, subfolder="tokenizer_2")
58
+ text_encoder = CLIPTextModel.from_pretrained(
59
+ sdxl_name, subfolder="text_encoder", torch_dtype=torch.float16, variant="fp16", device_map="auto")
60
+ text_encoder_2 = CLIPTextModel.from_pretrained(
61
+ sdxl_name, subfolder="text_encoder_2", torch_dtype=torch.float16, variant="fp16", device_map="auto")
62
+ vae = AutoencoderKL.from_pretrained(
63
+ sdxl_name, subfolder="vae", torch_dtype=torch.bfloat16, variant="fp16", device_map="auto") # bfloat16 vae
64
+ unet = UNet2DConditionModel.from_pretrained(
65
+ sdxl_name, subfolder="unet", torch_dtype=torch.float16, variant="fp16", device_map="auto")
66
+
67
+ unet.set_attn_processor(AttnProcessor2_0())
68
+ vae.set_attn_processor(AttnProcessor2_0())
69
+
70
+ pipeline = StableDiffusionXLOmostPipeline(
71
+ vae=vae,
72
+ text_encoder=text_encoder,
73
+ tokenizer=tokenizer,
74
+ text_encoder_2=text_encoder_2,
75
+ tokenizer_2=tokenizer_2,
76
+ unet=unet,
77
+ scheduler=None,  # We bypass the diffusers sampling system entirely and use A1111-style sampling instead
78
+ )
79
+
80
+ # LLM
81
+
82
+ # model_name = 'lllyasviel/omost-phi-3-mini-128k-8bits'
83
+ llm_name = 'lllyasviel/omost-llama-3-8b-4bits'
84
+ # model_name = 'lllyasviel/omost-dolphin-2.9-llama3-8b-4bits'
85
+
86
+ llm_model = AutoModelForCausalLM.from_pretrained(
87
+ llm_name,
88
+ torch_dtype=torch.bfloat16,  # Computation dtype only, not the load/memory dtype; the quantization used for loading is baked into the model config.
89
+ token=HF_TOKEN,
90
+ device_map="auto"
91
+ )
92
+
93
+ llm_tokenizer = AutoTokenizer.from_pretrained(
94
+ llm_name,
95
+ token=HF_TOKEN
96
+ )
97
+
98
+
99
+ @torch.inference_mode()
100
+ def pytorch2numpy(imgs):
101
+ results = []
102
+ for x in imgs:
103
+ y = x.movedim(0, -1)
104
+ y = y * 127.5 + 127.5
105
+ y = y.detach().float().cpu().numpy().clip(0, 255).astype(np.uint8)
106
+ results.append(y)
107
+ return results
108
+
109
+
110
+ @torch.inference_mode()
111
+ def numpy2pytorch(imgs):
112
+ h = torch.from_numpy(np.stack(imgs, axis=0)).float() / 127.5 - 1.0
113
+ h = h.movedim(-1, 1)
114
+ return h
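These two helpers convert between the pipeline's float tensors in [-1, 1] with channels-first layout (NCHW) and uint8 arrays in [0, 255] with channels-last layout (HWC). A minimal, self-contained round-trip sketch of the same conventions (standalone illustration, not importing this file):

```python
import numpy as np
import torch

# A dummy uint8 HWC image (64x64 RGB), as PIL/numpy would provide it.
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# numpy -> pytorch: scale [0, 255] -> [-1, 1] and move channels first, as numpy2pytorch does.
t = torch.from_numpy(np.stack([img], axis=0)).float() / 127.5 - 1.0
t = t.movedim(-1, 1)  # shape (1, 3, 64, 64)

# pytorch -> numpy: undo the scaling and move channels last again, as pytorch2numpy does.
back = (t.movedim(1, -1) * 127.5 + 127.5).clamp(0, 255).byte().numpy()[0]

assert back.shape == img.shape
# The round trip is lossless up to float rounding (at most off by one level).
assert np.abs(back.astype(np.int16) - img.astype(np.int16)).max() <= 1
```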
115
+
116
+
117
+ def resize_without_crop(image, target_width, target_height):
118
+ pil_image = Image.fromarray(image)
119
+ resized_image = pil_image.resize((target_width, target_height), Image.LANCZOS)
120
+ return np.array(resized_image)
121
 
 
 
123
  @spaces.GPU
124
+ @torch.inference_mode()
125
+ def chat_fn(message: str, history: list, seed:int, temperature: float, top_p: float, max_new_tokens: int) -> str:
126
+ np.random.seed(int(seed))
127
+ torch.manual_seed(int(seed))
128
+
129
+ conversation = [{"role": "system", "content": omost_canvas.system_prompt}]
130
+
131
+ for user, assistant in history:
132
+ if user is None or assistant is None:
133
+ continue
134
+ conversation.extend([{"role": "user", "content": user}, {"role": "assistant", "content": assistant}])
135
+
136
+ conversation.append({"role": "user", "content": message})
137
+
138
+ input_ids = llm_tokenizer.apply_chat_template(
139
+ conversation, return_tensors="pt", add_generation_prompt=True).to(llm_model.device)
140
+
141
+ streamer = TextIteratorStreamer(llm_tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True)
142
+
143
+ generate_kwargs = dict(
144
+ input_ids=input_ids,
145
+ streamer=streamer,
146
+ max_new_tokens=max_new_tokens,
147
+ do_sample=True,
148
+ temperature=temperature,
149
+ top_p=top_p,
150
+ )
151
+
152
+ if temperature == 0:
153
+ generate_kwargs['do_sample'] = False
154
+
155
+ Thread(target=llm_model.generate, kwargs=generate_kwargs).start()
156
+
157
+ outputs = []
158
+ for text in streamer:
159
+ outputs.append(text)
160
+ # print(outputs)
161
+ yield "".join(outputs)
162
+
163
+ return
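`chat_fn` uses the standard transformers streaming pattern: `generate` runs on a worker thread while `TextIteratorStreamer` yields decoded text as it is produced, and the UI re-renders the growing string. A minimal standalone illustration of that pattern; `gpt2` is only a placeholder model for the sketch, not the model this Space loads:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# Placeholder model for illustration only; the Space loads an Omost LLM instead.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("A canvas describing a quiet harbor at dawn:", return_tensors="pt")
streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so it runs in a background thread while we consume the stream.
Thread(target=model.generate, kwargs=dict(**inputs, streamer=streamer, max_new_tokens=32)).start()

partial = []
for piece in streamer:
    partial.append(piece)
    print("".join(partial))  # the chat UI would yield this growing string
```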
164
+
165
+
166
+ @torch.inference_mode()
167
+ def post_chat(history):
168
+ history = [(user, assistant) for user, assistant in history if isinstance(user, str) and isinstance(assistant, str)]
169
+ last_assistant = history[-1][1]
170
+ canvas_outputs = None
171
+
172
+ try:
173
+ canvas = omost_canvas.Canvas.from_bot_response(last_assistant)
174
+ canvas_outputs = canvas.process()
175
+ except Exception as e:
176
+ print('Last assistant response is not a valid canvas:', e)
177
+
178
+ return canvas_outputs, gr.update(visible=canvas_outputs is not None)
179
+
180
+
181
+ @spaces.GPU
182
+ @torch.inference_mode()
183
+ def diffusion_fn(chatbot, canvas_outputs, num_samples, seed, image_width, image_height,
184
+ highres_scale, steps, cfg, highres_steps, highres_denoise, negative_prompt):
185
+
186
+ use_initial_latent = False
187
+ eps = 0.05
188
+
189
+ image_width, image_height = int(image_width // 64) * 64, int(image_height // 64) * 64
190
+
191
+ rng = torch.Generator(unet.device).manual_seed(seed)
192
+
193
+ positive_cond, positive_pooler, negative_cond, negative_pooler = pipeline.all_conds_from_canvas(canvas_outputs, negative_prompt)
194
+
195
+ if use_initial_latent:
196
+ initial_latent = torch.from_numpy(canvas_outputs['initial_latent'])[None].movedim(-1, 1) / 127.5 - 1.0
197
+ initial_latent_blur = 40
198
+ initial_latent = torch.nn.functional.avg_pool2d(
199
+ torch.nn.functional.pad(initial_latent, (initial_latent_blur,) * 4, mode='reflect'),
200
+ kernel_size=(initial_latent_blur * 2 + 1,) * 2, stride=(1, 1))
201
+ initial_latent = torch.nn.functional.interpolate(initial_latent, (image_height, image_width))
202
+ initial_latent = initial_latent.to(dtype=vae.dtype, device=vae.device)
203
+ initial_latent = vae.encode(initial_latent).latent_dist.mode() * vae.config.scaling_factor
204
+ else:
205
+ initial_latent = torch.zeros(size=(num_samples, 4, image_height // 8, image_width // 8), dtype=torch.float32)
206
+
207
+ initial_latent = initial_latent.to(dtype=unet.dtype, device=unet.device)
208
+
209
+ latents = pipeline(
210
+ initial_latent=initial_latent,
211
+ strength=1.0,
212
+ num_inference_steps=int(steps),
213
+ batch_size=num_samples,
214
+ prompt_embeds=positive_cond,
215
+ negative_prompt_embeds=negative_cond,
216
+ pooled_prompt_embeds=positive_pooler,
217
+ negative_pooled_prompt_embeds=negative_pooler,
218
+ generator=rng,
219
+ guidance_scale=float(cfg),
220
+ ).images
221
+
222
+ latents = latents.to(dtype=vae.dtype, device=vae.device) / vae.config.scaling_factor
223
+ pixels = vae.decode(latents).sample
224
+ B, C, H, W = pixels.shape
225
+ pixels = pytorch2numpy(pixels)
226
+
227
+ if highres_scale > 1.0 + eps:
228
+ pixels = [
229
+ resize_without_crop(
230
+ image=p,
231
+ target_width=int(round(W * highres_scale / 64.0) * 64),
232
+ target_height=int(round(H * highres_scale / 64.0) * 64)
233
+ ) for p in pixels
234
+ ]
235
+
236
+ pixels = numpy2pytorch(pixels).to(device=vae.device, dtype=vae.dtype)
237
+ latents = vae.encode(pixels).latent_dist.mode() * vae.config.scaling_factor
238
+
239
+ latents = latents.to(device=unet.device, dtype=unet.dtype)
240
+
241
+ latents = pipeline(
242
+ initial_latent=latents,
243
+ strength=highres_denoise,
244
+ num_inference_steps=highres_steps,
245
+ batch_size=num_samples,
246
+ prompt_embeds=positive_cond,
247
+ negative_prompt_embeds=negative_cond,
248
+ pooled_prompt_embeds=positive_pooler,
249
+ negative_pooled_prompt_embeds=negative_pooler,
250
+ generator=rng,
251
+ guidance_scale=float(cfg),
252
+ ).images
253
+
254
+ latents = latents.to(dtype=vae.dtype, device=vae.device) / vae.config.scaling_factor
255
+ pixels = vae.decode(latents).sample
256
+ pixels = pytorch2numpy(pixels)
257
+
258
+ for i in range(len(pixels)):
259
+ unique_hex = uuid.uuid4().hex
260
+ image_path = os.path.join(gradio_temp_dir, f"{unique_hex}_{i}.png")
261
+ image = Image.fromarray(pixels[i])
262
+ image.save(image_path)
263
+ chatbot = chatbot + [(None, (image_path, 'image'))]
264
+
265
+ return chatbot
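Requested sizes are snapped down to multiples of 64 before the first pass, and the HR-fix pass rounds the upscaled size to the nearest multiple of 64. A small worked example of that arithmetic (the helper names here are illustrative only, not part of app.py):

```python
def snap_down_64(x: int) -> int:
    # Matches `int(image_width // 64) * 64` in diffusion_fn: round *down* to a multiple of 64.
    return int(x // 64) * 64

def hr_target(x: int, scale: float) -> int:
    # Matches the HR-fix resize: scale, then round to the *nearest* multiple of 64.
    return int(round(x * scale / 64.0) * 64)

assert snap_down_64(896) == 896
assert snap_down_64(900) == 896      # 900 is snapped down to 14 * 64
assert hr_target(896, 1.5) == 1344   # 896 * 1.5 = 1344, already a multiple of 64
assert hr_target(896, 1.3) == 1152   # 1164.8 / 64 = 18.2, rounds to 18 * 64 = 1152
```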
266
+
267
+
268
+ css = '''
269
+ code {white-space: pre-wrap !important;}
270
+ .gradio-container {max-width: none !important;}
271
+ .outer_parent {flex: 1;}
272
+ .inner_parent {flex: 1;}
273
+ footer {display: none !important; visibility: hidden !important;}
274
+ .translucent {display: none !important; visibility: hidden !important;}
275
+ '''
276
+
277
+ with gr.Blocks(fill_height=True, css=css) as demo:
278
+ with gr.Row(elem_classes='outer_parent'):
279
+ with gr.Column(scale=25):
280
+ with gr.Row():
281
+ retry_btn = gr.Button("🔄 Retry", variant="secondary", size="sm", min_width=60)
282
+ undo_btn = gr.Button("↩️ Undo", variant="secondary", size="sm", min_width=60)
283
+ clear_btn = gr.Button("⭐️ New Chat", variant="secondary", size="sm", min_width=60)
284
+
285
+ seed = gr.Number(label="Random Seed", value=12345, precision=0)
286
+
287
+ with gr.Accordion(open=True, label='Language Model'):
288
+ with gr.Group():
289
+ with gr.Row():
290
+ temperature = gr.Slider(
291
+ minimum=0.0,
292
+ maximum=2.0,
293
+ step=0.01,
294
+ value=0.6,
295
+ label="Temperature")
296
+ top_p = gr.Slider(
297
+ minimum=0.0,
298
+ maximum=1.0,
299
+ step=0.01,
300
+ value=0.9,
301
+ label="Top P")
302
+ max_new_tokens = gr.Slider(
303
+ minimum=128,
304
+ maximum=4096,
305
+ step=1,
306
+ value=4096,
307
+ label="Max New Tokens")
308
+ with gr.Accordion(open=True, label='Image Diffusion Model'):
309
+ with gr.Group():
310
+ with gr.Row():
311
+ image_width = gr.Slider(label="Image Width", minimum=256, maximum=2048, value=896, step=64)
312
+ image_height = gr.Slider(label="Image Height", minimum=256, maximum=2048, value=1152, step=64)
313
+
314
+ with gr.Row():
315
+ num_samples = gr.Slider(label="Image Number", minimum=1, maximum=12, value=1, step=1)
316
+ steps = gr.Slider(label="Sampling Steps", minimum=1, maximum=100, value=25, step=1)
317
+
318
+ with gr.Accordion(open=False, label='Advanced'):
319
+ cfg = gr.Slider(label="CFG Scale", minimum=1.0, maximum=32.0, value=5.0, step=0.01)
320
+ highres_scale = gr.Slider(label="HR-fix Scale (\"1\" is disabled)", minimum=1.0, maximum=2.0, value=1.0, step=0.01)
321
+ highres_steps = gr.Slider(label="Highres Fix Steps", minimum=1, maximum=100, value=20, step=1)
322
+ highres_denoise = gr.Slider(label="Highres Fix Denoise", minimum=0.1, maximum=1.0, value=0.4, step=0.01)
323
+ n_prompt = gr.Textbox(label="Negative Prompt", value='lowres, bad anatomy, bad hands, cropped, worst quality')
324
+
325
+ render_button = gr.Button("Render the Image!", size='lg', variant="primary", visible=False)
326
+
327
+ examples = gr.Dataset(
328
+ samples=[
329
+ ['generate an image of the fierce battle of warriors and a dragon'],
330
+ ['change the dragon to a dinosaur']
331
+ ],
332
+ components=[gr.Textbox(visible=False)],
333
+ label='Quick Prompts'
334
+ )
335
+ with gr.Column(scale=75, elem_classes='inner_parent'):
336
+ canvas_state = gr.State(None)
337
+ chatbot = gr.Chatbot(label='Omost', scale=1, bubble_full_width=True, render=False)
338
+ chatInterface = ChatInterface(
339
+ fn=chat_fn,
340
+ post_fn=post_chat,
341
+ post_fn_kwargs=dict(inputs=[chatbot], outputs=[canvas_state, render_button]),
342
+ pre_fn=lambda: gr.update(visible=False),
343
+ pre_fn_kwargs=dict(outputs=[render_button]),
344
+ chatbot=chatbot,
345
+ retry_btn=retry_btn,
346
+ undo_btn=undo_btn,
347
+ clear_btn=clear_btn,
348
+ additional_inputs=[seed, temperature, top_p, max_new_tokens],
349
+ examples=examples
350
+ )
351
+
352
+ render_button.click(
353
+ fn=diffusion_fn, inputs=[
354
+ chatInterface.chatbot, canvas_state,
355
+ num_samples, seed, image_width, image_height, highres_scale,
356
+ steps, cfg, highres_steps, highres_denoise, n_prompt
357
+ ], outputs=[chatInterface.chatbot]).then(
358
+ fn=lambda x: x, inputs=[
359
+ chatInterface.chatbot
360
+ ], outputs=[chatInterface.chatbot_state])
361
362
+ if __name__ == "__main__":
363
+ demo.queue().launch(inbrowser=True, server_name='0.0.0.0')
chat_interface.py ADDED
@@ -0,0 +1,628 @@
1
+ """
2
+ This file defines a useful high-level abstraction to build Gradio chatbots: ChatInterface.
3
+ """
4
+
5
+ from __future__ import annotations
6
+
7
+ import inspect
8
+ from typing import AsyncGenerator, Callable, Literal, Union, cast
9
+
10
+ import anyio
11
+ from gradio_client.documentation import document
12
+
13
+ from gradio.blocks import Blocks
14
+ from gradio.components import (
15
+ Button,
16
+ Chatbot,
17
+ Component,
18
+ Markdown,
19
+ MultimodalTextbox,
20
+ State,
21
+ Textbox,
22
+ get_component_instance,
23
+ Dataset
24
+ )
25
+ from gradio.events import Dependency, on
26
+ from gradio.helpers import special_args
27
+ from gradio.layouts import Accordion, Group, Row
28
+ from gradio.routes import Request
29
+ from gradio.themes import ThemeClass as Theme
30
+ from gradio.utils import SyncToAsyncIterator, async_iteration, async_lambda
31
+
32
+
33
+ @document()
34
+ class ChatInterface(Blocks):
35
+ """
36
+ ChatInterface is Gradio's high-level abstraction for creating chatbot UIs, and allows you to create
37
+ a web-based demo around a chatbot model in a few lines of code. Only one parameter is required: fn, which
38
+ takes a function that governs the response of the chatbot based on the user input and chat history. Additional
39
+ parameters can be used to control the appearance and behavior of the demo.
40
+
41
+ Example:
42
+ import gradio as gr
43
+
44
+ def echo(message, history):
45
+ return message
46
+
47
+ demo = gr.ChatInterface(fn=echo, examples=["hello", "hola", "merhaba"], title="Echo Bot")
48
+ demo.launch()
49
+ Demos: chatinterface_multimodal, chatinterface_random_response, chatinterface_streaming_echo
50
+ Guides: creating-a-chatbot-fast, sharing-your-app
51
+ """
52
+
53
+ def __init__(
54
+ self,
55
+ fn: Callable,
56
+ post_fn: Callable,
57
+ pre_fn: Callable,
58
+ chatbot: Chatbot,
59
+ *,
60
+ post_fn_kwargs: dict = None,
61
+ pre_fn_kwargs: dict = None,
62
+ multimodal: bool = False,
63
+ textbox: Textbox | MultimodalTextbox | None = None,
64
+ additional_inputs: str | Component | list[str | Component] | None = None,
65
+ additional_inputs_accordion_name: str | None = None,
66
+ additional_inputs_accordion: str | Accordion | None = None,
67
+ examples: Dataset = None,
68
+ title: str | None = None,
69
+ description: str | None = None,
70
+ theme: Theme | str | None = None,
71
+ css: str | None = None,
72
+ js: str | None = None,
73
+ head: str | None = None,
74
+ analytics_enabled: bool | None = None,
75
+ submit_btn: str | None | Button = "Submit",
76
+ stop_btn: str | None | Button = "Stop",
77
+ retry_btn: str | None | Button = "🔄 Retry",
78
+ undo_btn: str | None | Button = "↩️ Undo",
79
+ clear_btn: str | None | Button = "🗑️ Clear",
80
+ autofocus: bool = True,
81
+ concurrency_limit: int | None | Literal["default"] = "default",
82
+ fill_height: bool = True,
83
+ delete_cache: tuple[int, int] | None = None,
84
+ ):
85
+ super().__init__(
86
+ analytics_enabled=analytics_enabled,
87
+ mode="chat_interface",
88
+ css=css,
89
+ title=title or "Gradio",
90
+ theme=theme,
91
+ js=js,
92
+ head=head,
93
+ fill_height=fill_height,
94
+ delete_cache=delete_cache,
95
+ )
96
+
97
+ if post_fn_kwargs is None:
98
+ post_fn_kwargs = []
99
+
100
+ self.post_fn = post_fn
101
+ self.post_fn_kwargs = post_fn_kwargs
102
+
103
+ self.pre_fn = pre_fn
104
+ self.pre_fn_kwargs = pre_fn_kwargs
105
+
106
+ self.multimodal = multimodal
107
+ self.concurrency_limit = concurrency_limit
108
+ self.fn = fn
109
+ self.is_async = inspect.iscoroutinefunction(
110
+ self.fn
111
+ ) or inspect.isasyncgenfunction(self.fn)
112
+ self.is_generator = inspect.isgeneratorfunction(
113
+ self.fn
114
+ ) or inspect.isasyncgenfunction(self.fn)
115
+
116
+ if additional_inputs:
117
+ if not isinstance(additional_inputs, list):
118
+ additional_inputs = [additional_inputs]
119
+ self.additional_inputs = [
120
+ get_component_instance(i)
121
+ for i in additional_inputs # type: ignore
122
+ ]
123
+ else:
124
+ self.additional_inputs = []
125
+ if additional_inputs_accordion_name is not None:
126
+ print(
127
+ "The `additional_inputs_accordion_name` parameter is deprecated and will be removed in a future version of Gradio. Use the `additional_inputs_accordion` parameter instead."
128
+ )
129
+ self.additional_inputs_accordion_params = {
130
+ "label": additional_inputs_accordion_name
131
+ }
132
+ if additional_inputs_accordion is None:
133
+ self.additional_inputs_accordion_params = {
134
+ "label": "Additional Inputs",
135
+ "open": False,
136
+ }
137
+ elif isinstance(additional_inputs_accordion, str):
138
+ self.additional_inputs_accordion_params = {
139
+ "label": additional_inputs_accordion
140
+ }
141
+ elif isinstance(additional_inputs_accordion, Accordion):
142
+ self.additional_inputs_accordion_params = (
143
+ additional_inputs_accordion.recover_kwargs(
144
+ additional_inputs_accordion.get_config()
145
+ )
146
+ )
147
+ else:
148
+ raise ValueError(
149
+ f"The `additional_inputs_accordion` parameter must be a string or gr.Accordion, not {type(additional_inputs_accordion)}"
150
+ )
151
+
152
+ with self:
153
+ if title:
154
+ Markdown(
155
+ f"<h1 style='text-align: center; margin-bottom: 1rem'>{self.title}</h1>"
156
+ )
157
+ if description:
158
+ Markdown(description)
159
+
160
+ self.chatbot = chatbot.render()
161
+
162
+ self.buttons = [retry_btn, undo_btn, clear_btn]
163
+
164
+ with Group():
165
+ with Row():
166
+ if textbox:
167
+ if self.multimodal:
168
+ submit_btn = None
169
+ else:
170
+ textbox.container = False
171
+ textbox.show_label = False
172
+ textbox_ = textbox.render()
173
+ if not isinstance(textbox_, (Textbox, MultimodalTextbox)):
174
+ raise TypeError(
175
+ f"Expected a gr.Textbox or gr.MultimodalTextbox component, but got {type(textbox_)}"
176
+ )
177
+ self.textbox = textbox_
178
+ elif self.multimodal:
179
+ submit_btn = None
180
+ self.textbox = MultimodalTextbox(
181
+ show_label=False,
182
+ label="Message",
183
+ placeholder="Type a message...",
184
+ scale=7,
185
+ autofocus=autofocus,
186
+ )
187
+ else:
188
+ self.textbox = Textbox(
189
+ container=False,
190
+ show_label=False,
191
+ label="Message",
192
+ placeholder="Type a message...",
193
+ scale=7,
194
+ autofocus=autofocus,
195
+ )
196
+ if submit_btn is not None and not multimodal:
197
+ if isinstance(submit_btn, Button):
198
+ submit_btn.render()
199
+ elif isinstance(submit_btn, str):
200
+ submit_btn = Button(
201
+ submit_btn,
202
+ variant="primary",
203
+ scale=1,
204
+ min_width=150,
205
+ )
206
+ else:
207
+ raise ValueError(
208
+ f"The submit_btn parameter must be a gr.Button, string, or None, not {type(submit_btn)}"
209
+ )
210
+ if stop_btn is not None:
211
+ if isinstance(stop_btn, Button):
212
+ stop_btn.visible = False
213
+ stop_btn.render()
214
+ elif isinstance(stop_btn, str):
215
+ stop_btn = Button(
216
+ stop_btn,
217
+ variant="stop",
218
+ visible=False,
219
+ scale=1,
220
+ min_width=150,
221
+ )
222
+ else:
223
+ raise ValueError(
224
+ f"The stop_btn parameter must be a gr.Button, string, or None, not {type(stop_btn)}"
225
+ )
226
+ self.buttons.extend([submit_btn, stop_btn]) # type: ignore
227
+
228
+ self.fake_api_btn = Button("Fake API", visible=False)
229
+ self.fake_response_textbox = Textbox(label="Response", visible=False)
230
+ (
231
+ self.retry_btn,
232
+ self.undo_btn,
233
+ self.clear_btn,
234
+ self.submit_btn,
235
+ self.stop_btn,
236
+ ) = self.buttons
237
+
238
+ any_unrendered_inputs = any(
239
+ not inp.is_rendered for inp in self.additional_inputs
240
+ )
241
+ if self.additional_inputs and any_unrendered_inputs:
242
+ with Accordion(**self.additional_inputs_accordion_params): # type: ignore
243
+ for input_component in self.additional_inputs:
244
+ if not input_component.is_rendered:
245
+ input_component.render()
246
+
247
+ self.saved_input = State()
248
+ self.chatbot_state = (
249
+ State(self.chatbot.value) if self.chatbot.value else State([])
250
+ )
251
+
252
+ self._setup_events()
253
+ self._setup_api()
254
+
255
+ if examples:
256
+ examples.click(lambda x: x[0], inputs=[examples], outputs=self.textbox, show_progress=False, queue=False)
257
+
258
+ def _setup_events(self) -> None:
259
+ submit_fn = self._stream_fn if self.is_generator else self._submit_fn
260
+ submit_triggers = (
261
+ [self.textbox.submit, self.submit_btn.click]
262
+ if self.submit_btn
263
+ else [self.textbox.submit]
264
+ )
265
+ submit_event = (
266
+ on(
267
+ submit_triggers,
268
+ self._clear_and_save_textbox,
269
+ [self.textbox],
270
+ [self.textbox, self.saved_input],
271
+ show_api=False,
272
+ queue=False,
273
+ )
274
+ .then(
275
+ self.pre_fn,
276
+ **self.pre_fn_kwargs,
277
+ show_api=False,
278
+ queue=False,
279
+ )
280
+ .then(
281
+ self._display_input,
282
+ [self.saved_input, self.chatbot_state],
283
+ [self.chatbot, self.chatbot_state],
284
+ show_api=False,
285
+ queue=False,
286
+ )
287
+ .then(
288
+ submit_fn,
289
+ [self.saved_input, self.chatbot_state] + self.additional_inputs,
290
+ [self.chatbot, self.chatbot_state],
291
+ show_api=False,
292
+ concurrency_limit=cast(
293
+ Union[int, Literal["default"], None], self.concurrency_limit
294
+ ),
295
+ ).then(
296
+ self.post_fn,
297
+ **self.post_fn_kwargs,
298
+ show_api=False,
299
+ concurrency_limit=cast(
300
+ Union[int, Literal["default"], None], self.concurrency_limit
301
+ ),
302
+ )
303
+ )
304
+ self._setup_stop_events(submit_triggers, submit_event)
305
+
306
+ if self.retry_btn:
307
+ retry_event = (
308
+ self.retry_btn.click(
309
+ self._delete_prev_fn,
310
+ [self.saved_input, self.chatbot_state],
311
+ [self.chatbot, self.saved_input, self.chatbot_state],
312
+ show_api=False,
313
+ queue=False,
314
+ )
315
+ .then(
316
+ self.pre_fn,
317
+ **self.pre_fn_kwargs,
318
+ show_api=False,
319
+ queue=False,
320
+ )
321
+ .then(
322
+ self._display_input,
323
+ [self.saved_input, self.chatbot_state],
324
+ [self.chatbot, self.chatbot_state],
325
+ show_api=False,
326
+ queue=False,
327
+ )
328
+ .then(
329
+ submit_fn,
330
+ [self.saved_input, self.chatbot_state] + self.additional_inputs,
331
+ [self.chatbot, self.chatbot_state],
332
+ show_api=False,
333
+ concurrency_limit=cast(
334
+ Union[int, Literal["default"], None], self.concurrency_limit
335
+ ),
336
+ ).then(
337
+ self.post_fn,
338
+ **self.post_fn_kwargs,
339
+ show_api=False,
340
+ concurrency_limit=cast(
341
+ Union[int, Literal["default"], None], self.concurrency_limit
342
+ ),
343
+ )
344
+ )
345
+ self._setup_stop_events([self.retry_btn.click], retry_event)
346
+
347
+ if self.undo_btn:
348
+ self.undo_btn.click(
349
+ self._delete_prev_fn,
350
+ [self.saved_input, self.chatbot_state],
351
+ [self.chatbot, self.saved_input, self.chatbot_state],
352
+ show_api=False,
353
+ queue=False,
354
+ ).then(
355
+ self.pre_fn,
356
+ **self.pre_fn_kwargs,
357
+ show_api=False,
358
+ queue=False,
359
+ ).then(
360
+ async_lambda(lambda x: x),
361
+ [self.saved_input],
362
+ [self.textbox],
363
+ show_api=False,
364
+ queue=False,
365
+ ).then(
366
+ self.post_fn,
367
+ **self.post_fn_kwargs,
368
+ show_api=False,
369
+ concurrency_limit=cast(
370
+ Union[int, Literal["default"], None], self.concurrency_limit
371
+ ),
372
+ )
373
+
374
+ if self.clear_btn:
375
+ self.clear_btn.click(
376
+ async_lambda(lambda: ([], [], None)),
377
+ None,
378
+ [self.chatbot, self.chatbot_state, self.saved_input],
379
+ queue=False,
380
+ show_api=False,
381
+ ).then(
382
+ self.pre_fn,
383
+ **self.pre_fn_kwargs,
384
+ show_api=False,
385
+ queue=False,
386
+ ).then(
387
+ self.post_fn,
388
+ **self.post_fn_kwargs,
389
+ show_api=False,
390
+ concurrency_limit=cast(
391
+ Union[int, Literal["default"], None], self.concurrency_limit
392
+ ),
393
+ )
394
+
395
+ def _setup_stop_events(
396
+ self, event_triggers: list[Callable], event_to_cancel: Dependency
397
+ ) -> None:
398
+ if self.stop_btn and self.is_generator:
399
+ if self.submit_btn:
400
+ for event_trigger in event_triggers:
401
+ event_trigger(
402
+ async_lambda(
403
+ lambda: (
404
+ Button(visible=False),
405
+ Button(visible=True),
406
+ )
407
+ ),
408
+ None,
409
+ [self.submit_btn, self.stop_btn],
410
+ show_api=False,
411
+ queue=False,
412
+ )
413
+ event_to_cancel.then(
414
+ async_lambda(lambda: (Button(visible=True), Button(visible=False))),
415
+ None,
416
+ [self.submit_btn, self.stop_btn],
417
+ show_api=False,
418
+ queue=False,
419
+ )
420
+ else:
421
+ for event_trigger in event_triggers:
422
+ event_trigger(
423
+ async_lambda(lambda: Button(visible=True)),
424
+ None,
425
+ [self.stop_btn],
426
+ show_api=False,
427
+ queue=False,
428
+ )
429
+ event_to_cancel.then(
430
+ async_lambda(lambda: Button(visible=False)),
431
+ None,
432
+ [self.stop_btn],
433
+ show_api=False,
434
+ queue=False,
435
+ )
436
+ self.stop_btn.click(
437
+ None,
438
+ None,
439
+ None,
440
+ cancels=event_to_cancel,
441
+ show_api=False,
442
+ )
443
+
444
+ def _setup_api(self) -> None:
445
+ api_fn = self._api_stream_fn if self.is_generator else self._api_submit_fn
446
+
447
+ self.fake_api_btn.click(
448
+ api_fn,
449
+ [self.textbox, self.chatbot_state] + self.additional_inputs,
450
+ [self.textbox, self.chatbot_state],
451
+ api_name="chat",
452
+ concurrency_limit=cast(
453
+ Union[int, Literal["default"], None], self.concurrency_limit
454
+ ),
455
+ )
456
+
457
+ def _clear_and_save_textbox(self, message: str) -> tuple[str | dict, str]:
458
+ if self.multimodal:
459
+ return {"text": "", "files": []}, message
460
+ else:
461
+ return "", message
462
+
463
+ def _append_multimodal_history(
464
+ self,
465
+ message: dict[str, list],
466
+ response: str | None,
467
+ history: list[list[str | tuple | None]],
468
+ ):
469
+ for x in message["files"]:
470
+ history.append([(x,), None])
471
+ if message["text"] is None or not isinstance(message["text"], str):
472
+ return
473
+ elif message["text"] == "" and message["files"] != []:
474
+ history.append([None, response])
475
+ else:
476
+ history.append([message["text"], response])
477
+
478
+ async def _display_input(
479
+ self, message: str | dict[str, list], history: list[list[str | tuple | None]]
480
+ ) -> tuple[list[list[str | tuple | None]], list[list[str | tuple | None]]]:
481
+ if self.multimodal and isinstance(message, dict):
482
+ self._append_multimodal_history(message, None, history)
483
+ elif isinstance(message, str):
484
+ history.append([message, None])
485
+ return history, history
486
+
487
+ async def _submit_fn(
488
+ self,
489
+ message: str | dict[str, list],
490
+ history_with_input: list[list[str | tuple | None]],
491
+ request: Request,
492
+ *args,
493
+ ) -> tuple[list[list[str | tuple | None]], list[list[str | tuple | None]]]:
494
+ if self.multimodal and isinstance(message, dict):
495
+ remove_input = (
496
+ len(message["files"]) + 1
497
+ if message["text"] is not None
498
+ else len(message["files"])
499
+ )
500
+ history = history_with_input[:-remove_input]
501
+ else:
502
+ history = history_with_input[:-1]
503
+ inputs, _, _ = special_args(
504
+ self.fn, inputs=[message, history, *args], request=request
505
+ )
506
+
507
+ if self.is_async:
508
+ response = await self.fn(*inputs)
509
+ else:
510
+ response = await anyio.to_thread.run_sync(
511
+ self.fn, *inputs, limiter=self.limiter
512
+ )
513
+
514
+ if self.multimodal and isinstance(message, dict):
515
+ self._append_multimodal_history(message, response, history)
516
+ elif isinstance(message, str):
517
+ history.append([message, response])
518
+ return history, history
519
+
520
+ async def _stream_fn(
521
+ self,
522
+ message: str | dict[str, list],
523
+ history_with_input: list[list[str | tuple | None]],
524
+ request: Request,
525
+ *args,
526
+ ) -> AsyncGenerator:
527
+ if self.multimodal and isinstance(message, dict):
528
+ remove_input = (
529
+ len(message["files"]) + 1
530
+ if message["text"] is not None
531
+ else len(message["files"])
532
+ )
533
+ history = history_with_input[:-remove_input]
534
+ else:
535
+ history = history_with_input[:-1]
536
+ inputs, _, _ = special_args(
537
+ self.fn, inputs=[message, history, *args], request=request
538
+ )
539
+
540
+ if self.is_async:
541
+ generator = self.fn(*inputs)
542
+ else:
543
+ generator = await anyio.to_thread.run_sync(
544
+ self.fn, *inputs, limiter=self.limiter
545
+ )
546
+ generator = SyncToAsyncIterator(generator, self.limiter)
547
+ try:
548
+ first_response = await async_iteration(generator)
549
+ if self.multimodal and isinstance(message, dict):
550
+ for x in message["files"]:
551
+ history.append([(x,), None])
552
+ update = history + [[message["text"], first_response]]
553
+ yield update, update
554
+ else:
555
+ update = history + [[message, first_response]]
556
+ yield update, update
557
+ except StopIteration:
558
+ if self.multimodal and isinstance(message, dict):
559
+ self._append_multimodal_history(message, None, history)
560
+ yield history, history
561
+ else:
562
+ update = history + [[message, None]]
563
+ yield update, update
564
+ async for response in generator:
565
+ if self.multimodal and isinstance(message, dict):
566
+ update = history + [[message["text"], response]]
567
+ yield update, update
568
+ else:
569
+ update = history + [[message, response]]
570
+ yield update, update
571
+
572
+ async def _api_submit_fn(
573
+ self, message: str, history: list[list[str | None]], request: Request, *args
574
+ ) -> tuple[str, list[list[str | None]]]:
575
+ inputs, _, _ = special_args(
576
+ self.fn, inputs=[message, history, *args], request=request
577
+ )
578
+
579
+ if self.is_async:
580
+ response = await self.fn(*inputs)
581
+ else:
582
+ response = await anyio.to_thread.run_sync(
583
+ self.fn, *inputs, limiter=self.limiter
584
+ )
585
+ history.append([message, response])
586
+ return response, history
587
+
588
+ async def _api_stream_fn(
589
+ self, message: str, history: list[list[str | None]], request: Request, *args
590
+ ) -> AsyncGenerator:
591
+ inputs, _, _ = special_args(
592
+ self.fn, inputs=[message, history, *args], request=request
593
+ )
594
+
595
+ if self.is_async:
596
+ generator = self.fn(*inputs)
597
+ else:
598
+ generator = await anyio.to_thread.run_sync(
599
+ self.fn, *inputs, limiter=self.limiter
600
+ )
601
+ generator = SyncToAsyncIterator(generator, self.limiter)
602
+ try:
603
+ first_response = await async_iteration(generator)
604
+ yield first_response, history + [[message, first_response]]
605
+ except StopIteration:
606
+ yield None, history + [[message, None]]
607
+ async for response in generator:
608
+ yield response, history + [[message, response]]
609
+
610
+ async def _delete_prev_fn(
611
+ self,
612
+ message: str | dict[str, list],
613
+ history: list[list[str | tuple | None]],
614
+ ) -> tuple[
615
+ list[list[str | tuple | None]],
616
+ str | dict[str, list],
617
+ list[list[str | tuple | None]],
618
+ ]:
619
+ if self.multimodal and isinstance(message, dict):
620
+ remove_input = (
621
+ len(message["files"]) + 1
622
+ if message["text"] is not None
623
+ else len(message["files"])
624
+ )
625
+ history = history[:-remove_input]
626
+ else:
627
+ history = history[:-1]
628
+ return history, message or "", history
lib_omost/canvas.py ADDED
@@ -0,0 +1,248 @@
1
+ import re
2
+ import difflib
3
+ import numpy as np
4
+
5
+ system_prompt = r'''You are a helpful AI assistant to compose images using the below python class `Canvas`:
6
+
7
+ ```python
8
+ class Canvas:
9
+ def set_global_description(self, description: str, detailed_descriptions: list[str], tags: str, HTML_web_color_name: str):
10
+ pass
11
+
12
+ def add_local_description(self, location: str, offset: str, area: str, distance_to_viewer: float, description: str, detailed_descriptions: list[str], tags: str, atmosphere: str, style: str, quality_meta: str, HTML_web_color_name: str):
13
+ assert location in ["in the center", "on the left", "on the right", "on the top", "on the bottom", "on the top-left", "on the top-right", "on the bottom-left", "on the bottom-right"]
14
+ assert offset in ["no offset", "slightly to the left", "slightly to the right", "slightly to the upper", "slightly to the lower", "slightly to the upper-left", "slightly to the upper-right", "slightly to the lower-left", "slightly to the lower-right"]
15
+ assert area in ["a small square area", "a small vertical area", "a small horizontal area", "a medium-sized square area", "a medium-sized vertical area", "a medium-sized horizontal area", "a large square area", "a large vertical area", "a large horizontal area"]
16
+ assert distance_to_viewer > 0
17
+ pass
18
+ ```'''
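The assistant is expected to answer with a fenced ```python block; `Canvas.from_bot_response` below extracts that block, executes it, and requires it to define a `canvas = Canvas()` variable. An illustrative example of a block that would parse (the scene mirrors the Space's quick-prompt example, but the exact values are made up):

```python
canvas = Canvas()
canvas.set_global_description(
    description='A fierce battle between warriors and a dragon.',
    detailed_descriptions=['Warriors charge across a scorched plain.',
                           'A dragon rears up against a stormy sky.'],
    tags='fantasy, battle, dragon, warriors, dramatic lighting',
    HTML_web_color_name='darkslategray',
)
canvas.add_local_description(
    location='on the left',
    offset='no offset',
    area='a large vertical area',
    distance_to_viewer=5.0,
    description='A group of armored warriors.',
    detailed_descriptions=['Swords and shields catch the firelight.'],
    tags='warriors, armor, swords',
    atmosphere='tense and heroic',
    style='detailed digital painting',
    quality_meta='high detail, sharp focus',
    HTML_web_color_name='slategray',
)
```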
19
+
20
+ valid_colors = { # r, g, b
21
+ 'aliceblue': (240, 248, 255), 'antiquewhite': (250, 235, 215), 'aqua': (0, 255, 255),
22
+ 'aquamarine': (127, 255, 212), 'azure': (240, 255, 255), 'beige': (245, 245, 220),
23
+ 'bisque': (255, 228, 196), 'black': (0, 0, 0), 'blanchedalmond': (255, 235, 205), 'blue': (0, 0, 255),
24
+ 'blueviolet': (138, 43, 226), 'brown': (165, 42, 42), 'burlywood': (222, 184, 135),
25
+ 'cadetblue': (95, 158, 160), 'chartreuse': (127, 255, 0), 'chocolate': (210, 105, 30),
26
+ 'coral': (255, 127, 80), 'cornflowerblue': (100, 149, 237), 'cornsilk': (255, 248, 220),
27
+ 'crimson': (220, 20, 60), 'cyan': (0, 255, 255), 'darkblue': (0, 0, 139), 'darkcyan': (0, 139, 139),
28
+ 'darkgoldenrod': (184, 134, 11), 'darkgray': (169, 169, 169), 'darkgrey': (169, 169, 169),
29
+ 'darkgreen': (0, 100, 0), 'darkkhaki': (189, 183, 107), 'darkmagenta': (139, 0, 139),
30
+ 'darkolivegreen': (85, 107, 47), 'darkorange': (255, 140, 0), 'darkorchid': (153, 50, 204),
31
+ 'darkred': (139, 0, 0), 'darksalmon': (233, 150, 122), 'darkseagreen': (143, 188, 143),
32
+ 'darkslateblue': (72, 61, 139), 'darkslategray': (47, 79, 79), 'darkslategrey': (47, 79, 79),
33
+ 'darkturquoise': (0, 206, 209), 'darkviolet': (148, 0, 211), 'deeppink': (255, 20, 147),
34
+ 'deepskyblue': (0, 191, 255), 'dimgray': (105, 105, 105), 'dimgrey': (105, 105, 105),
35
+ 'dodgerblue': (30, 144, 255), 'firebrick': (178, 34, 34), 'floralwhite': (255, 250, 240),
36
+ 'forestgreen': (34, 139, 34), 'fuchsia': (255, 0, 255), 'gainsboro': (220, 220, 220),
37
+ 'ghostwhite': (248, 248, 255), 'gold': (255, 215, 0), 'goldenrod': (218, 165, 32),
38
+ 'gray': (128, 128, 128), 'grey': (128, 128, 128), 'green': (0, 128, 0), 'greenyellow': (173, 255, 47),
39
+ 'honeydew': (240, 255, 240), 'hotpink': (255, 105, 180), 'indianred': (205, 92, 92),
40
+ 'indigo': (75, 0, 130), 'ivory': (255, 255, 240), 'khaki': (240, 230, 140), 'lavender': (230, 230, 250),
41
+ 'lavenderblush': (255, 240, 245), 'lawngreen': (124, 252, 0), 'lemonchiffon': (255, 250, 205),
42
+ 'lightblue': (173, 216, 230), 'lightcoral': (240, 128, 128), 'lightcyan': (224, 255, 255),
43
+ 'lightgoldenrodyellow': (250, 250, 210), 'lightgray': (211, 211, 211), 'lightgrey': (211, 211, 211),
44
+ 'lightgreen': (144, 238, 144), 'lightpink': (255, 182, 193), 'lightsalmon': (255, 160, 122),
45
+ 'lightseagreen': (32, 178, 170), 'lightskyblue': (135, 206, 250), 'lightslategray': (119, 136, 153),
46
+ 'lightslategrey': (119, 136, 153), 'lightsteelblue': (176, 196, 222), 'lightyellow': (255, 255, 224),
47
+ 'lime': (0, 255, 0), 'limegreen': (50, 205, 50), 'linen': (250, 240, 230), 'magenta': (255, 0, 255),
48
+ 'maroon': (128, 0, 0), 'mediumaquamarine': (102, 205, 170), 'mediumblue': (0, 0, 205),
49
+ 'mediumorchid': (186, 85, 211), 'mediumpurple': (147, 112, 219), 'mediumseagreen': (60, 179, 113),
50
+ 'mediumslateblue': (123, 104, 238), 'mediumspringgreen': (0, 250, 154),
51
+ 'mediumturquoise': (72, 209, 204), 'mediumvioletred': (199, 21, 133), 'midnightblue': (25, 25, 112),
52
+ 'mintcream': (245, 255, 250), 'mistyrose': (255, 228, 225), 'moccasin': (255, 228, 181),
53
+ 'navajowhite': (255, 222, 173), 'navy': (0, 0, 128), 'navyblue': (0, 0, 128),
54
+ 'oldlace': (253, 245, 230), 'olive': (128, 128, 0), 'olivedrab': (107, 142, 35),
55
+ 'orange': (255, 165, 0), 'orangered': (255, 69, 0), 'orchid': (218, 112, 214),
56
+ 'palegoldenrod': (238, 232, 170), 'palegreen': (152, 251, 152), 'paleturquoise': (175, 238, 238),
57
+ 'palevioletred': (219, 112, 147), 'papayawhip': (255, 239, 213), 'peachpuff': (255, 218, 185),
58
+ 'peru': (205, 133, 63), 'pink': (255, 192, 203), 'plum': (221, 160, 221), 'powderblue': (176, 224, 230),
59
+ 'purple': (128, 0, 128), 'rebeccapurple': (102, 51, 153), 'red': (255, 0, 0),
60
+ 'rosybrown': (188, 143, 143), 'royalblue': (65, 105, 225), 'saddlebrown': (139, 69, 19),
61
+ 'salmon': (250, 128, 114), 'sandybrown': (244, 164, 96), 'seagreen': (46, 139, 87),
62
+ 'seashell': (255, 245, 238), 'sienna': (160, 82, 45), 'silver': (192, 192, 192),
63
+ 'skyblue': (135, 206, 235), 'slateblue': (106, 90, 205), 'slategray': (112, 128, 144),
64
+ 'slategrey': (112, 128, 144), 'snow': (255, 250, 250), 'springgreen': (0, 255, 127),
65
+ 'steelblue': (70, 130, 180), 'tan': (210, 180, 140), 'teal': (0, 128, 128), 'thistle': (216, 191, 216),
66
+ 'tomato': (255, 99, 71), 'turquoise': (64, 224, 208), 'violet': (238, 130, 238),
67
+ 'wheat': (245, 222, 179), 'white': (255, 255, 255), 'whitesmoke': (245, 245, 245),
68
+ 'yellow': (255, 255, 0), 'yellowgreen': (154, 205, 50)
69
+ }
70
+
71
+ valid_locations = { # x, y in 90*90
72
+ 'in the center': (45, 45),
73
+ 'on the left': (15, 45),
74
+ 'on the right': (75, 45),
75
+ 'on the top': (45, 15),
76
+ 'on the bottom': (45, 75),
77
+ 'on the top-left': (15, 15),
78
+ 'on the top-right': (75, 15),
79
+ 'on the bottom-left': (15, 75),
80
+ 'on the bottom-right': (75, 75)
81
+ }
82
+
83
+ valid_offsets = { # x, y in 90*90
84
+ 'no offset': (0, 0),
85
+ 'slightly to the left': (-10, 0),
86
+ 'slightly to the right': (10, 0),
87
+ 'slightly to the upper': (0, -10),
88
+ 'slightly to the lower': (0, 10),
89
+ 'slightly to the upper-left': (-10, -10),
90
+ 'slightly to the upper-right': (10, -10),
91
+ 'slightly to the lower-left': (-10, 10),
92
+ 'slightly to the lower-right': (10, 10)}
93
+
94
+ valid_areas = { # w, h in 90*90
95
+ "a small square area": (50, 50),
96
+ "a small vertical area": (40, 60),
97
+ "a small horizontal area": (60, 40),
98
+ "a medium-sized square area": (60, 60),
99
+ "a medium-sized vertical area": (50, 80),
100
+ "a medium-sized horizontal area": (80, 50),
101
+ "a large square area": (70, 70),
102
+ "a large vertical area": (60, 90),
103
+ "a large horizontal area": (90, 60)
104
+ }
105
+
106
+
107
+ def closest_name(input_str, options):
108
+ input_str = input_str.lower()
109
+
110
+ closest_match = difflib.get_close_matches(input_str, list(options.keys()), n=1, cutoff=0.5)
111
+ assert isinstance(closest_match, list) and len(closest_match) > 0, f'The value [{input_str}] is not valid!'
112
+ result = closest_match[0]
113
+
114
+ if result != input_str:
115
+ print(f'Automatically corrected [{input_str}] -> [{result}].')
116
+
117
+ return result
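`closest_name` gives the model some slack: a near-miss key is snapped to the closest valid option via difflib, with a 0.5 similarity cutoff. For example:

```python
import difflib

options = ['on the left', 'on the right', 'in the center']
# 'on the leftt' is not an exact key, but it clears the 0.5 cutoff easily,
# so closest_name() would correct it and print a notice.
print(difflib.get_close_matches('on the leftt', options, n=1, cutoff=0.5))  # ['on the left']
```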
118
+
119
+
120
+ def safe_str(x):
121
+ return x.strip(',. ') + '.'
122
+
123
+
124
+ def binary_nonzero_positions(n, offset=0):
125
+ binary_str = bin(n)[2:]
126
+ positions = [i + offset for i, bit in enumerate(reversed(binary_str)) if bit == '1']
127
+ return positions
128
+
129
+
130
+ class Canvas:
131
+ @staticmethod
132
+ def from_bot_response(response: str):
133
+ matched = re.search(r'```python\n(.*?)\n```', response, re.DOTALL)
134
+ assert matched, 'Response does not contain code!'
135
+ code_content = matched.group(1)
136
+ assert 'canvas = Canvas()' in code_content, 'Code block must include valid canvas var!'
137
+ local_vars = {'Canvas': Canvas}
138
+ exec(code_content, {}, local_vars)
139
+ canvas = local_vars.get('canvas', None)
140
+ assert isinstance(canvas, Canvas), 'Code block must produce valid canvas var!'
141
+ return canvas
142
+
143
+ def __init__(self):
144
+ self.components = []
145
+ self.color = None
146
+ self.record_tags = True
147
+ self.prefixes = []
148
+ self.suffixes = []
149
+ return
150
+
151
+ def set_global_description(self, description: str, detailed_descriptions: list[str], tags: str,
152
+ HTML_web_color_name: str):
153
+ assert isinstance(description, str), 'Global description is not valid!'
154
+ assert isinstance(detailed_descriptions, list) and all(isinstance(item, str) for item in detailed_descriptions), \
155
+ 'Global detailed_descriptions is not valid!'
156
+ assert isinstance(tags, str), 'Global tags is not valid!'
157
+
158
+ HTML_web_color_name = closest_name(HTML_web_color_name, valid_colors)
159
+ self.color = np.array([[valid_colors[HTML_web_color_name]]], dtype=np.uint8)
160
+
161
+ self.prefixes = [description]
162
+ self.suffixes = detailed_descriptions
163
+
164
+ if self.record_tags:
165
+ self.suffixes = self.suffixes + [tags]
166
+
167
+ self.prefixes = [safe_str(x) for x in self.prefixes]
168
+ self.suffixes = [safe_str(x) for x in self.suffixes]
169
+
170
+ return
171
+
172
+ def add_local_description(self, location: str, offset: str, area: str, distance_to_viewer: float, description: str,
173
+ detailed_descriptions: list[str], tags: str, atmosphere: str, style: str,
174
+ quality_meta: str, HTML_web_color_name: str):
175
+ assert isinstance(description, str), 'Local description is wrong!'
176
+ assert isinstance(distance_to_viewer, (int, float)) and distance_to_viewer > 0, \
177
+ f'The distance_to_viewer for [{description}] is not a positive number!'
178
+ assert isinstance(detailed_descriptions, list) and all(isinstance(item, str) for item in detailed_descriptions), \
179
+ f'The detailed_descriptions for [{description}] is not valid!'
180
+ assert isinstance(tags, str), f'The tags for [{description}] is not valid!'
181
+ assert isinstance(atmosphere, str), f'The atmosphere for [{description}] is not valid!'
182
+ assert isinstance(style, str), f'The style for [{description}] is not valid!'
183
+ assert isinstance(quality_meta, str), f'The quality_meta for [{description}] is not valid!'
184
+
185
+ location = closest_name(location, valid_locations)
186
+ offset = closest_name(offset, valid_offsets)
187
+ area = closest_name(area, valid_areas)
188
+ HTML_web_color_name = closest_name(HTML_web_color_name, valid_colors)
189
+
190
+ xb, yb = valid_locations[location]
191
+ xo, yo = valid_offsets[offset]
192
+ w, h = valid_areas[area]
193
+ rect = (yb + yo - h // 2, yb + yo + h // 2, xb + xo - w // 2, xb + xo + w // 2)
194
+ rect = [max(0, min(90, i)) for i in rect]
195
+ color = np.array([[valid_colors[HTML_web_color_name]]], dtype=np.uint8)
196
+
197
+ prefixes = self.prefixes + [description]
198
+ suffixes = detailed_descriptions
199
+
200
+ if self.record_tags:
201
+ suffixes = suffixes + [tags, atmosphere, style, quality_meta]
202
+
203
+ prefixes = [safe_str(x) for x in prefixes]
204
+ suffixes = [safe_str(x) for x in suffixes]
205
+
206
+ self.components.append(dict(
207
+ rect=rect,
208
+ distance_to_viewer=distance_to_viewer,
209
+ color=color,
210
+ prefixes=prefixes,
211
+ suffixes=suffixes
212
+ ))
213
+
214
+ return
215
+
216
+ def process(self):
217
+ # sort components
218
+ self.components = sorted(self.components, key=lambda x: x['distance_to_viewer'], reverse=True)
219
+
220
+ # compute initial latent
221
+ initial_latent = np.zeros(shape=(90, 90, 3), dtype=np.float32) + self.color
222
+
223
+ for component in self.components:
224
+ a, b, c, d = component['rect']
225
+ initial_latent[a:b, c:d] = 0.7 * component['color'] + 0.3 * initial_latent[a:b, c:d]
226
+
227
+ initial_latent = initial_latent.clip(0, 255).astype(np.uint8)
228
+
229
+ # compute conditions
230
+
231
+ bag_of_conditions = [
232
+ dict(mask=np.ones(shape=(90, 90), dtype=np.float32), prefixes=self.prefixes, suffixes=self.suffixes)
233
+ ]
234
+
235
+ for i, component in enumerate(self.components):
236
+ a, b, c, d = component['rect']
237
+ m = np.zeros(shape=(90, 90), dtype=np.float32)
238
+ m[a:b, c:d] = 1.0
239
+ bag_of_conditions.append(dict(
240
+ mask=m,
241
+ prefixes=component['prefixes'],
242
+ suffixes=component['suffixes']
243
+ ))
244
+
245
+ return dict(
246
+ initial_latent=initial_latent,
247
+ bag_of_conditions=bag_of_conditions,
248
+ )
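Putting the pieces together, a hedged usage sketch of the Canvas API defined in this file; the descriptions, tags, and colour names below are purely illustrative stand-ins for what the Omost LLM would normally emit:

```python
from lib_omost.canvas import Canvas

canvas = Canvas()
canvas.set_global_description(
    description='A cat sitting on a wooden table.',
    detailed_descriptions=['warm afternoon light', 'shallow depth of field'],
    tags='cat, table, indoor',
    HTML_web_color_name='wheat',
)
canvas.add_local_description(
    location='on the left',
    offset='slightly to the upper',
    area='a medium-sized square area',
    distance_to_viewer=2.0,
    description='A ginger cat.',
    detailed_descriptions=['fluffy fur', 'green eyes'],
    tags='cat, ginger',
    atmosphere='calm and cozy',
    style='photorealistic',
    quality_meta='best quality',
    HTML_web_color_name='tomato',
)

outputs = canvas.process()
print(outputs['initial_latent'].shape)    # (90, 90, 3) uint8 colour sketch
print(len(outputs['bag_of_conditions']))  # 2: one global + one local condition
```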
lib_omost/pipeline.py ADDED
@@ -0,0 +1,435 @@
1
+ import numpy as np
2
+ import copy
3
+
4
+ from tqdm.auto import trange
5
+ from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl_img2img import *
6
+ from diffusers.models.transformers import Transformer2DModel
7
+
8
+
9
+ original_Transformer2DModel_forward = Transformer2DModel.forward
10
+
11
+
12
+ def hacked_Transformer2DModel_forward(
13
+ self,
14
+ hidden_states: torch.Tensor,
15
+ encoder_hidden_states: Optional[torch.Tensor] = None,
16
+ timestep: Optional[torch.LongTensor] = None,
17
+ added_cond_kwargs: Dict[str, torch.Tensor] = None,
18
+ class_labels: Optional[torch.LongTensor] = None,
19
+ cross_attention_kwargs: Dict[str, Any] = None,
20
+ attention_mask: Optional[torch.Tensor] = None,
21
+ encoder_attention_mask: Optional[torch.Tensor] = None,
22
+ return_dict: bool = True,
23
+ ):
24
+ cross_attention_kwargs = cross_attention_kwargs or {}
25
+ cross_attention_kwargs['hidden_states_original_shape'] = hidden_states.shape
26
+ return original_Transformer2DModel_forward(
27
+ self, hidden_states, encoder_hidden_states, timestep, added_cond_kwargs, class_labels, cross_attention_kwargs,
28
+ attention_mask, encoder_attention_mask, return_dict)
29
+
30
+
31
+ Transformer2DModel.forward = hacked_Transformer2DModel_forward
32
+
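The patch above threads the original (B, C, H, W) shape of the latent feature map into `cross_attention_kwargs`, so the attention processors further down can rebuild spatial masks after the features have been flattened to (B, H·W, C). A toy sketch of the same wrap-and-delegate monkey-patch pattern; the `Greeter` class is made up purely for illustration:

```python
class Greeter:
    def greet(self, name, extra=None):
        return f"hello {name} ({extra})"

_original_greet = Greeter.greet

def patched_greet(self, name, extra=None):
    extra = extra or {}
    extra['call_site'] = 'patched'            # inject side-channel info into kwargs
    return _original_greet(self, name, extra)  # then delegate to the original method

Greeter.greet = patched_greet
print(Greeter().greet("world"))               # hello world ({'call_site': 'patched'})
```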
33
+
34
+ @torch.no_grad()
35
+ def sample_dpmpp_2m(model, x, sigmas, extra_args=None, callback=None, disable=None):
36
+ """DPM-Solver++(2M)."""
37
+ extra_args = {} if extra_args is None else extra_args
38
+ s_in = x.new_ones([x.shape[0]])
39
+ sigma_fn = lambda t: t.neg().exp()
40
+ t_fn = lambda sigma: sigma.log().neg()
41
+ old_denoised = None
42
+
43
+ for i in trange(len(sigmas) - 1, disable=disable):
44
+ denoised = model(x, sigmas[i] * s_in, **extra_args)
45
+ if callback is not None:
46
+ callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
47
+ t, t_next = t_fn(sigmas[i]), t_fn(sigmas[i + 1])
48
+ h = t_next - t
49
+ if old_denoised is None or sigmas[i + 1] == 0:
50
+ x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised
51
+ else:
52
+ h_last = t - t_fn(sigmas[i - 1])
53
+ r = h_last / h
54
+ denoised_d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
55
+ x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised_d
56
+ old_denoised = denoised
57
+ return x
58
+
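For reference, the update implemented above is the DPM-Solver++(2M) multistep rule. Writing $t_i = -\log \sigma_i$, $h_i = t_{i+1} - t_i$, $r_i = h_{i-1} / h_i$, and letting $D_i$ denote the model's denoised prediction at step $i$:

$$
\tilde{D}_i = \Bigl(1 + \tfrac{1}{2 r_i}\Bigr) D_i - \tfrac{1}{2 r_i} D_{i-1},
\qquad
x_{i+1} = \frac{\sigma_{i+1}}{\sigma_i}\, x_i - \bigl(e^{-h_i} - 1\bigr)\, \tilde{D}_i ,
$$

with the first step, and the final step to $\sigma = 0$, falling back to $\tilde{D}_i = D_i$.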
59
+
60
+ class KModel:
61
+ def __init__(self, unet, timesteps=1000, linear_start=0.00085, linear_end=0.012):
62
+ betas = torch.linspace(linear_start ** 0.5, linear_end ** 0.5, timesteps, dtype=torch.float64) ** 2
63
+ alphas = 1. - betas
64
+ alphas_cumprod = torch.tensor(np.cumprod(alphas, axis=0), dtype=torch.float32)
65
+
66
+ self.sigmas = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5
67
+ self.log_sigmas = self.sigmas.log()
68
+ self.sigma_data = 1.0
69
+ self.unet = unet
70
+ return
71
+
72
+ @property
73
+ def sigma_min(self):
74
+ return self.sigmas[0]
75
+
76
+ @property
77
+ def sigma_max(self):
78
+ return self.sigmas[-1]
79
+
80
+ def timestep(self, sigma):
81
+ log_sigma = sigma.log()
82
+ dists = log_sigma.to(self.log_sigmas.device) - self.log_sigmas[:, None]
83
+ return dists.abs().argmin(dim=0).view(sigma.shape).to(sigma.device)
84
+
85
+ def get_sigmas_karras(self, n, rho=7.):
86
+ ramp = torch.linspace(0, 1, n)
87
+ min_inv_rho = self.sigma_min ** (1 / rho)
88
+ max_inv_rho = self.sigma_max ** (1 / rho)
89
+ sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
90
+ return torch.cat([sigmas, sigmas.new_zeros([1])])
91
+
92
+ def __call__(self, x, sigma, **extra_args):
93
+ x_ddim_space = x / (sigma[:, None, None, None] ** 2 + self.sigma_data ** 2) ** 0.5
94
+ t = self.timestep(sigma)
95
+ cfg_scale = extra_args['cfg_scale']
96
+ eps_positive = self.unet(x_ddim_space, t, return_dict=False, **extra_args['positive'])[0]
97
+ eps_negative = self.unet(x_ddim_space, t, return_dict=False, **extra_args['negative'])[0]
98
+ noise_pred = eps_negative + cfg_scale * (eps_positive - eps_negative)
99
+ return x - noise_pred * sigma[:, None, None, None]
100
+
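`KModel.__call__` applies classifier-free guidance directly in sigma space, `eps_negative + cfg_scale * (eps_positive - eps_negative)`, while `get_sigmas_karras` builds the noise schedule the sampler walks down. A standalone sketch of that schedule with illustrative `sigma_min` / `sigma_max` values (in the real model they are derived from `alphas_cumprod`):

```python
import torch

def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    ramp = torch.linspace(0, 1, n)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return torch.cat([sigmas, sigmas.new_zeros([1])])  # append the final sigma = 0

print(karras_sigmas(5))  # monotonically decreasing from sigma_max to sigma_min, then 0
```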
101
+
102
+ class OmostSelfAttnProcessor:
103
+ def __call__(self, attn, hidden_states, encoder_hidden_states, hidden_states_original_shape, *args, **kwargs):
104
+ batch_size, sequence_length, _ = hidden_states.shape
105
+
106
+ query = attn.to_q(hidden_states)
107
+ key = attn.to_k(hidden_states)
108
+ value = attn.to_v(hidden_states)
109
+
110
+ inner_dim = key.shape[-1]
111
+ head_dim = inner_dim // attn.heads
112
+
113
+ query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
114
+ key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
115
+ value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
116
+
117
+ hidden_states = torch.nn.functional.scaled_dot_product_attention(
118
+ query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False
119
+ )
120
+
121
+ hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
122
+ hidden_states = hidden_states.to(query.dtype)
123
+ hidden_states = attn.to_out[0](hidden_states)
124
+ hidden_states = attn.to_out[1](hidden_states)
125
+
126
+ return hidden_states
127
+
128
+
129
+ class OmostCrossAttnProcessor:
130
+ def __call__(self, attn, hidden_states, encoder_hidden_states, hidden_states_original_shape, *args, **kwargs):
131
+ B, C, H, W = hidden_states_original_shape
132
+
133
+ conds = []
134
+ masks = []
135
+
136
+ for m, c in encoder_hidden_states:
137
+ m = torch.nn.functional.interpolate(m[None, None, :, :], (H, W), mode='nearest-exact').flatten().unsqueeze(1).repeat(1, c.size(1))
138
+ conds.append(c)
139
+ masks.append(m)
140
+
141
+ conds = torch.cat(conds, dim=1)
142
+ masks = torch.cat(masks, dim=1)
143
+
144
+ mask_bool = masks > 0.5
145
+ mask_scale = (H * W) / torch.sum(masks, dim=0, keepdim=True)
146
+
147
+ batch_size, sequence_length, _ = conds.shape
148
+
149
+ query = attn.to_q(hidden_states)
150
+ key = attn.to_k(conds)
151
+ value = attn.to_v(conds)
152
+
153
+ inner_dim = key.shape[-1]
154
+ head_dim = inner_dim // attn.heads
155
+
156
+ query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
157
+ key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
158
+ value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
159
+
160
+ mask_bool = mask_bool[None, None, :, :].repeat(query.size(0), query.size(1), 1, 1)
161
+ mask_scale = mask_scale[None, None, :, :].repeat(query.size(0), query.size(1), 1, 1)
162
+
163
+ sim = query @ key.transpose(-2, -1) * attn.scale
164
+ sim = sim * mask_scale.to(sim)
165
+ sim.masked_fill_(mask_bool.logical_not(), float("-inf"))
166
+ sim = sim.softmax(dim=-1)
167
+
168
+ h = sim @ value
169
+ h = h.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
170
+
171
+ h = attn.to_out[0](h)
172
+ h = attn.to_out[1](h)
173
+ return h
174
+
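The cross-attention processor above is where region control happens: each sub-prompt's 90×90 mask is resized to the current feature resolution, tokens outside their region are masked to `-inf` before the softmax, and scores are rescaled by `(H * W) / mask_area` so small regions are not underweighted. A minimal sketch of that mask bookkeeping with illustrative shapes (one token per prompt, a 4×4 feature map):

```python
import torch

H = W = 4
global_mask = torch.ones(H, W)
region_mask = torch.zeros(H, W)
region_mask[:2, :2] = 1.0                                     # top-left quadrant

# one column per conditioning token; here each prompt contributes a single token
masks = torch.stack([global_mask.flatten(), region_mask.flatten()], dim=1)  # (H*W, 2)
mask_bool = masks > 0.5                       # True where a latent position may attend
mask_scale = (H * W) / masks.sum(dim=0, keepdim=True)

print(mask_bool.sum(dim=0))                   # tensor([16, 4]) positions per token
print(mask_scale)                             # tensor([[1., 4.]]) small region upweighted
```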
175
+
176
+ class StableDiffusionXLOmostPipeline(StableDiffusionXLImg2ImgPipeline):
177
+ def __init__(self, *args, **kwargs):
178
+ super().__init__(*args, **kwargs)
179
+ self.k_model = KModel(unet=self.unet)
180
+
181
+ attn_procs = {}
182
+ for name in self.unet.attn_processors.keys():
183
+ if name.endswith("attn2.processor"):
184
+ attn_procs[name] = OmostCrossAttnProcessor()
185
+ else:
186
+ attn_procs[name] = OmostSelfAttnProcessor()
187
+
188
+ self.unet.set_attn_processor(attn_procs)
189
+ return
190
+
191
+ @torch.inference_mode()
192
+ def encode_bag_of_subprompts_greedy(self, prefixes: list[str], suffixes: list[str]):
193
+ device = self.text_encoder.device
194
+
195
+ @torch.inference_mode()
196
+ def greedy_partition(items, max_sum):
197
+ bags = []
198
+ current_bag = []
199
+ current_sum = 0
200
+
201
+ for item in items:
202
+ num = item['length']
203
+ if current_sum + num > max_sum:
204
+ if current_bag:
205
+ bags.append(current_bag)
206
+ current_bag = [item]
207
+ current_sum = num
208
+ else:
209
+ current_bag.append(item)
210
+ current_sum += num
211
+
212
+ if current_bag:
213
+ bags.append(current_bag)
214
+
215
+ return bags
216
+
217
+ @torch.inference_mode()
218
+ def get_77_tokens_in_torch(subprompt_inds, tokenizer):
219
+ # Note that every subprompt is theoretically less than 75 tokens (without bos/eos)
220
+ result = [tokenizer.bos_token_id] + subprompt_inds[:75] + [tokenizer.eos_token_id] + [tokenizer.pad_token_id] * 75
221
+ result = result[:77]
222
+ result = torch.tensor([result]).to(device=device, dtype=torch.int64)
223
+ return result
224
+
225
+ @torch.inference_mode()
226
+ def merge_with_prefix(bag):
227
+ merged_ids_t1 = copy.deepcopy(prefix_ids_t1)
228
+ merged_ids_t2 = copy.deepcopy(prefix_ids_t2)
229
+
230
+ for item in bag:
231
+ merged_ids_t1.extend(item['ids_t1'])
232
+ merged_ids_t2.extend(item['ids_t2'])
233
+
234
+ return dict(
235
+ ids_t1=get_77_tokens_in_torch(merged_ids_t1, self.tokenizer),
236
+ ids_t2=get_77_tokens_in_torch(merged_ids_t2, self.tokenizer_2)
237
+ )
238
+
239
+ @torch.inference_mode()
240
+ def double_encode(pair_of_inds):
241
+ inds = [pair_of_inds['ids_t1'], pair_of_inds['ids_t2']]
242
+ text_encoders = [self.text_encoder, self.text_encoder_2]
243
+
244
+ pooled_prompt_embeds = None
245
+ prompt_embeds_list = []
246
+
247
+ for text_input_ids, text_encoder in zip(inds, text_encoders):
248
+ prompt_embeds = text_encoder(text_input_ids, output_hidden_states=True)
249
+
250
+ # Only last pooler_output is needed
251
+ pooled_prompt_embeds = prompt_embeds.pooler_output
252
+
253
+ # "2" because SDXL always indexes from the penultimate layer.
254
+ prompt_embeds = prompt_embeds.hidden_states[-2]
255
+ prompt_embeds_list.append(prompt_embeds)
256
+
257
+ prompt_embeds = torch.concat(prompt_embeds_list, dim=-1)
258
+ return prompt_embeds, pooled_prompt_embeds
259
+
260
+ # Begin with tokenizing prefixes
261
+
262
+ prefix_length = 0
263
+ prefix_ids_t1 = []
264
+ prefix_ids_t2 = []
265
+
266
+ for prefix in prefixes:
267
+ ids_t1 = self.tokenizer(prefix, truncation=False, add_special_tokens=False).input_ids
268
+ ids_t2 = self.tokenizer_2(prefix, truncation=False, add_special_tokens=False).input_ids
269
+ assert len(ids_t1) == len(ids_t2)
270
+ prefix_length += len(ids_t1)
271
+ prefix_ids_t1 += ids_t1
272
+ prefix_ids_t2 += ids_t2
273
+
274
+ # Then tokenizing suffixes
275
+
276
+ allowed_suffix_length = 75 - prefix_length
277
+ suffix_targets = []
278
+
279
+ for subprompt in suffixes:
280
+ # Note that every subprompt is theoretically less than 75 tokens (without bos/eos),
281
+ # so we can safely crop each one to 75
282
+ ids_t1 = self.tokenizer(subprompt, truncation=False, add_special_tokens=False).input_ids[:75]
283
+ ids_t2 = self.tokenizer_2(subprompt, truncation=False, add_special_tokens=False).input_ids[:75]
284
+ assert len(ids_t1) == len(ids_t2)
285
+ suffix_targets.append(dict(
286
+ length=len(ids_t1),
287
+ ids_t1=ids_t1,
288
+ ids_t2=ids_t2
289
+ ))
290
+
291
+ # Then merge prefix and suffix tokens
292
+
293
+ suffix_targets = greedy_partition(suffix_targets, max_sum=allowed_suffix_length)
294
+ targets = [merge_with_prefix(b) for b in suffix_targets]
295
+
296
+ # Encode!
297
+
298
+ conds, poolers = [], []
299
+
300
+ for target in targets:
301
+ cond, pooler = double_encode(target)
302
+ conds.append(cond)
303
+ poolers.append(pooler)
304
+
305
+ conds_merged = torch.concat(conds, dim=1)
306
+ poolers_merged = poolers[0]
307
+
308
+ return dict(cond=conds_merged, pooler=poolers_merged)
309
+
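The nested `greedy_partition` packs sub-prompt token runs into bags that, together with the shared prefix, still fit inside one 75-token CLIP window. A standalone copy of that logic on illustrative lengths (`max_sum` plays the role of `allowed_suffix_length` above):

```python
def greedy_partition(items, max_sum):
    bags, current_bag, current_sum = [], [], 0
    for item in items:
        num = item['length']
        if current_sum + num > max_sum:
            if current_bag:
                bags.append(current_bag)
            current_bag, current_sum = [item], num
        else:
            current_bag.append(item)
            current_sum += num
    if current_bag:
        bags.append(current_bag)
    return bags

items = [{'length': n} for n in (30, 30, 30, 20)]
print(greedy_partition(items, max_sum=60))
# -> [[{'length': 30}, {'length': 30}], [{'length': 30}, {'length': 20}]]
```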
310
+ @torch.inference_mode()
311
+ def all_conds_from_canvas(self, canvas_outputs, negative_prompt):
312
+ mask_all = torch.ones(size=(90, 90), dtype=torch.float32)
313
+ negative_cond, negative_pooler = self.encode_cropped_prompt_77tokens(negative_prompt)
314
+ negative_result = [(mask_all, negative_cond)]
315
+
316
+ positive_result = []
317
+ positive_pooler = None
318
+
319
+ for item in canvas_outputs['bag_of_conditions']:
320
+ current_mask = torch.from_numpy(item['mask']).to(torch.float32)
321
+ current_prefixes = item['prefixes']
322
+ current_suffixes = item['suffixes']
323
+
324
+ current_cond = self.encode_bag_of_subprompts_greedy(prefixes=current_prefixes, suffixes=current_suffixes)
325
+
326
+ if positive_pooler is None:
327
+ positive_pooler = current_cond['pooler']
328
+
329
+ positive_result.append((current_mask, current_cond['cond']))
330
+
331
+ return positive_result, positive_pooler, negative_result, negative_pooler
332
+
333
+ @torch.inference_mode()
334
+ def encode_cropped_prompt_77tokens(self, prompt: str):
335
+ device = self.text_encoder.device
336
+ tokenizers = [self.tokenizer, self.tokenizer_2]
337
+ text_encoders = [self.text_encoder, self.text_encoder_2]
338
+
339
+ pooled_prompt_embeds = None
340
+ prompt_embeds_list = []
341
+
342
+ for tokenizer, text_encoder in zip(tokenizers, text_encoders):
343
+ text_input_ids = tokenizer(
344
+ prompt,
345
+ padding="max_length",
346
+ max_length=tokenizer.model_max_length,
347
+ truncation=True,
348
+ return_tensors="pt",
349
+ ).input_ids
350
+
351
+ prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
352
+
353
+ # Only last pooler_output is needed
354
+ pooled_prompt_embeds = prompt_embeds.pooler_output
355
+
356
+ # "2" because SDXL always indexes from the penultimate layer.
357
+ prompt_embeds = prompt_embeds.hidden_states[-2]
358
+ prompt_embeds_list.append(prompt_embeds)
359
+
360
+ prompt_embeds = torch.concat(prompt_embeds_list, dim=-1)
361
+ prompt_embeds = prompt_embeds.to(dtype=self.unet.dtype, device=device)
362
+
363
+ return prompt_embeds, pooled_prompt_embeds
364
+
365
+ @torch.inference_mode()
366
+ def __call__(
367
+ self,
368
+ initial_latent: torch.FloatTensor = None,
369
+ strength: float = 1.0,
370
+ num_inference_steps: int = 25,
371
+ guidance_scale: float = 5.0,
372
+ batch_size: Optional[int] = 1,
373
+ generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
374
+ prompt_embeds: Optional[torch.FloatTensor] = None,
375
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
376
+ pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
377
+ negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
378
+ cross_attention_kwargs: Optional[dict] = None,
379
+ ):
380
+
381
+ device = self.unet.device
382
+ cross_attention_kwargs = cross_attention_kwargs or {}
383
+
384
+ # Sigmas
385
+
386
+ sigmas = self.k_model.get_sigmas_karras(int(num_inference_steps / strength))
387
+ sigmas = sigmas[-(num_inference_steps + 1):].to(device)
388
+
389
+ # Initial latents
390
+
391
+ _, C, H, W = initial_latent.shape
392
+ noise = randn_tensor((batch_size, C, H, W), generator=generator, device=device, dtype=self.unet.dtype)
393
+ latents = initial_latent.to(noise) + noise * sigmas[0].to(noise)
394
+
395
+ # Shape
396
+
397
+ height, width = latents.shape[-2:]
398
+ height = height * self.vae_scale_factor
399
+ width = width * self.vae_scale_factor
400
+
401
+ add_time_ids = list((height, width) + (0, 0) + (height, width))
402
+ add_time_ids = torch.tensor([add_time_ids], dtype=self.unet.dtype)
403
+ add_neg_time_ids = add_time_ids.clone()
404
+
405
+ # Batch
406
+
407
+ latents = latents.to(device)
408
+ add_time_ids = add_time_ids.repeat(batch_size, 1).to(device)
409
+ add_neg_time_ids = add_neg_time_ids.repeat(batch_size, 1).to(device)
410
+ prompt_embeds = [(k.to(device), v.repeat(batch_size, 1, 1).to(noise)) for k, v in prompt_embeds]
411
+ negative_prompt_embeds = [(k.to(device), v.repeat(batch_size, 1, 1).to(noise)) for k, v in negative_prompt_embeds]
412
+ pooled_prompt_embeds = pooled_prompt_embeds.repeat(batch_size, 1).to(noise)
413
+ negative_pooled_prompt_embeds = negative_pooled_prompt_embeds.repeat(batch_size, 1).to(noise)
414
+
415
+ # Feeds
416
+
417
+ sampler_kwargs = dict(
418
+ cfg_scale=guidance_scale,
419
+ positive=dict(
420
+ encoder_hidden_states=prompt_embeds,
421
+ added_cond_kwargs={"text_embeds": pooled_prompt_embeds, "time_ids": add_time_ids},
422
+ cross_attention_kwargs=cross_attention_kwargs
423
+ ),
424
+ negative=dict(
425
+ encoder_hidden_states=negative_prompt_embeds,
426
+ added_cond_kwargs={"text_embeds": negative_pooled_prompt_embeds, "time_ids": add_neg_time_ids},
427
+ cross_attention_kwargs=cross_attention_kwargs
428
+ )
429
+ )
430
+
431
+ # Sample
432
+
433
+ results = sample_dpmpp_2m(self.k_model, latents, sigmas, extra_args=sampler_kwargs, disable=False)
434
+
435
+ return StableDiffusionXLPipelineOutput(images=results)
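Finally, a hedged end-to-end sketch of how this pipeline is meant to be driven. The names `pipeline`, `response`, and `initial_latent` are assumed to exist already: the upstream `app.py` builds the pipeline from preloaded SDXL components, VAE-encodes the resized 90×90 colour canvas to obtain `initial_latent`, and later decodes the returned latents back to images; those steps are elided here.

```python
from lib_omost.canvas import Canvas

# Assumptions: `pipeline` is a constructed StableDiffusionXLOmostPipeline,
# `response` is the LLM reply containing a fenced canvas block, and
# `initial_latent` is a (1, 4, H/8, W/8) latent tensor encoded from the canvas sketch.
canvas_outputs = Canvas.from_bot_response(response).process()

positive, positive_pooler, negative, negative_pooler = pipeline.all_conds_from_canvas(
    canvas_outputs, negative_prompt='lowres, bad anatomy, worst quality')

latents = pipeline(
    initial_latent=initial_latent,
    strength=1.0,
    num_inference_steps=25,
    guidance_scale=5.0,
    prompt_embeds=positive,                        # list of (mask, cond) pairs
    negative_prompt_embeds=negative,               # list with a single all-ones mask
    pooled_prompt_embeds=positive_pooler,
    negative_pooled_prompt_embeds=negative_pooler,
).images                                           # still latents; decode with the VAE
```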