KevinQu7 committed on
Commit
09c3706
1 Parent(s): 641fe65

initial commit

Files changed (7)
  1. .gitignore +7 -0
  2. LICENSE.txt +177 -0
  3. app.py +639 -0
  4. marigold_iid_appearance.py +544 -0
  5. marigold_iid_residual.py +552 -0
  6. requirements.txt +126 -0
  7. requirements_min.txt +16 -0
.gitignore ADDED
@@ -0,0 +1,7 @@
1
+ .idea
2
+ .DS_Store
3
+ __pycache__
4
+ gradio_cached_examples
5
+ Marigold
6
+ *.sh
7
+ script/
LICENSE.txt ADDED
@@ -0,0 +1,177 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
app.py ADDED
@@ -0,0 +1,639 @@
1
+ # Copyright 2024 Anton Obukhov, ETH Zurich. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ # --------------------------------------------------------------------------
15
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
16
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
17
+ # More information about the method can be found at https://marigoldmonodepth.github.io
18
+ # --------------------------------------------------------------------------
19
+ from __future__ import annotations
20
+
21
+ import functools
22
+ import os
23
+ import tempfile
24
+ import warnings
25
+
26
+ import spaces
27
+ import gradio as gr
28
+ import numpy as np
29
+ import torch as torch
30
+ from PIL import Image
31
+ from diffusers import UNet2DConditionModel
32
+
33
+ from gradio_imageslider import ImageSlider
34
+ from huggingface_hub import login
35
+
36
+ from gradio_patches.examples import Examples
37
+ from gradio_patches.flagging import HuggingFaceDatasetSaver, FlagMethod
38
+ from marigold_iid_appearance import MarigoldIIDAppearancePipeline
39
+ from marigold_iid_residual import MarigoldIIDResidualPipeline
40
+
41
+ warnings.filterwarnings(
42
+ "ignore", message=".*LoginButton created outside of a Blocks context.*"
43
+ )
44
+
45
+ default_seed = 2024
46
+
47
+ default_image_denoise_steps = 4
48
+ default_image_ensemble_size = 1
49
+ default_image_processing_res = 768
50
+ default_image_reproducible = True
51
+ default_model_type = "appearance"
52
+
53
+ default_share_always_show_hf_logout_btn = True
54
+ default_share_always_show_accordion = False
55
+
56
+ loaded_pipelines = {} # Cache to store loaded pipelines
57
+ def process_with_loaded_pipeline(image_path, denoise_steps, ensemble_size, processing_res, model_type):
58
+
59
+ # Load and cache the pipeline based on the model type.
60
+ if model_type not in loaded_pipelines:
61
+ auth_token = os.environ.get("KEV_TOKEN")
62
+ if model_type == "appearance":
63
+ loaded_pipelines[model_type] = MarigoldIIDAppearancePipeline.from_pretrained(
64
+ "prs-eth/marigold-iid-appearance-v1-1", token=auth_token
65
+ )
66
+ elif model_type == "residual":
67
+ loaded_pipelines[model_type] = MarigoldIIDResidualPipeline.from_pretrained(
68
+ "prs-eth/marigold-iid-residual-v1-1", token=auth_token
69
+ )
70
+
71
+ # Move the pipeline to GPU if available
72
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
73
+ loaded_pipelines[model_type] = loaded_pipelines[model_type].to(device)
74
+
75
+ pipe = loaded_pipelines[model_type]
76
+
77
+ # Process the image using the preloaded pipeline.
78
+ return process_image(
79
+ pipe=pipe,
80
+ path_input=image_path,
81
+ denoise_steps=denoise_steps,
82
+ ensemble_size=ensemble_size,
83
+ processing_res=processing_res,
84
+ model_type=model_type,
85
+ )
86
+
87
+ def process_image_check(path_input):
88
+ if path_input is None:
89
+ raise gr.Error(
90
+ "Missing image in the first pane: upload a file or use one from the gallery below."
91
+ )
92
+
93
+ def process_image(
94
+ pipe,
95
+ path_input,
96
+ denoise_steps=default_image_denoise_steps,
97
+ ensemble_size=default_image_ensemble_size,
98
+ processing_res=default_image_processing_res,
99
+ model_type=default_model_type,
100
+ ):
101
+ name_base, name_ext = os.path.splitext(os.path.basename(path_input))
102
+ print(f"Processing image {name_base}{name_ext}")
103
+
104
+ path_output_dir = tempfile.mkdtemp()
105
+
106
+ input_image = Image.open(path_input)
107
+
108
+
109
+ pipe_out = pipe(
110
+ input_image,
111
+ denoising_steps=denoise_steps,
112
+ ensemble_size=ensemble_size,
113
+ processing_res=processing_res,
114
+ batch_size=1 if processing_res == 0 else 0, # TODO: do we abuse "batch size" notation here?
115
+ seed=default_seed,
116
+ show_progress_bar=True,
117
+ )
118
+
119
+ path_output_dir = os.path.splitext(path_input)[0] + "_output"
120
+ os.makedirs(path_output_dir, exist_ok=True)
121
+
122
+ path_albedo_out = os.path.join(path_output_dir, f"{name_base}_albedo_fp32.npy")
123
+ path_albedo_out_vis = os.path.join(path_output_dir, f"{name_base}_albedo.png")
124
+
125
+ albedo = pipe_out.albedo
126
+ albedo_colored = pipe_out.albedo_colored
127
+
128
+ np.save(path_albedo_out, albedo)
129
+ albedo_colored.save(path_albedo_out_vis)
130
+
131
+
132
+ if model_type == "appearance":
133
+ path_material_out = os.path.join(path_output_dir, f"{name_base}_material_fp32.npy")
134
+ path_material_out_vis = os.path.join(path_output_dir, f"{name_base}_material.png")
135
+
136
+ material = pipe_out.material
137
+ material_colored = pipe_out.material_colored
138
+
139
+ np.save(path_material_out, material)
140
+ material_colored.save(path_material_out_vis)
141
+
142
+ return (
143
+ [path_input, path_albedo_out_vis],
144
+ [path_input, path_material_out_vis],
145
+ None,
146
+ [path_albedo_out_vis, path_material_out_vis, path_albedo_out, path_material_out],
147
+ )
148
+
149
+ elif model_type == "residual":
150
+ path_shading_out = os.path.join(path_output_dir, f"{name_base}_shading_fp32.npy")
151
+ path_shading_out_vis = os.path.join(path_output_dir, f"{name_base}_shading.png")
152
+ path_residual_out = os.path.join(path_output_dir, f"{name_base}_residual_fp32.npy")
153
+ path_residual_out_vis = os.path.join(path_output_dir, f"{name_base}_residual.png")
154
+
155
+ shading = pipe_out.shading
156
+ shading_colored = pipe_out.shading_colored
157
+ residual = pipe_out.residual
158
+ residual_colored = pipe_out.residual_colored
159
+
160
+ np.save(path_shading_out, shading)
161
+ shading_colored.save(path_shading_out_vis)
162
+ np.save(path_residual_out, residual)
163
+ residual_colored.save(path_residual_out_vis)
164
+
165
+ return (
166
+ [path_input, path_albedo_out_vis],
167
+ [path_input, path_shading_out_vis],
168
+ [path_input, path_residual_out_vis],
169
+ [path_albedo_out_vis, path_shading_out_vis, path_residual_out_vis, path_albedo_out, path_shading_out, path_residual_out],
170
+ )
171
+
172
+
173
+ def run_demo_server(hf_writer=None):
174
+ process_pipe_image = spaces.GPU(functools.partial(process_with_loaded_pipeline), duration=120)
175
+ gradio_theme = gr.themes.Default()
176
+
177
+ with gr.Blocks(
178
+ theme=gradio_theme,
179
+ title="Marigold Intrinsic Image Decomposition (Marigold-IID)",
180
+ css="""
181
+ #download {
182
+ height: 118px;
183
+ }
184
+ .slider .inner {
185
+ width: 5px;
186
+ background: #FFF;
187
+ }
188
+ .viewport {
189
+ aspect-ratio: 4/3;
190
+ }
191
+ .tabs button.selected {
192
+ font-size: 20px !important;
193
+ color: crimson !important;
194
+ }
195
+ h1 {
196
+ text-align: center;
197
+ display: block;
198
+ }
199
+ h2 {
200
+ text-align: center;
201
+ display: block;
202
+ }
203
+ h3 {
204
+ text-align: center;
205
+ display: block;
206
+ }
207
+ .md_feedback li {
208
+ margin-bottom: 0px !important;
209
+ }
210
+ """,
211
+ head="""
212
+ <script async src="https://www.googletagmanager.com/gtag/js?id=G-1FWSVCGZTG"></script>
213
+ <script>
214
+ window.dataLayer = window.dataLayer || [];
215
+ function gtag() {dataLayer.push(arguments);}
216
+ gtag('js', new Date());
217
+ gtag('config', 'G-1FWSVCGZTG');
218
+ </script>
219
+ """,
220
+ ) as demo:
221
+ if hf_writer is not None:
222
+ print("Creating login button")
223
+ share_login_btn = gr.LoginButton(size="sm", scale=1, render=False)
224
+ print("Created login button")
225
+ share_login_btn.activate()
226
+ print("Activated login button")
227
+
228
+ gr.Markdown(
229
+ """
230
+ # Marigold Intrinsic Image Decomposition (Marigold-IID)
231
+
232
+ <p align="center">
233
+ <a title="Website" href="https://marigoldcomputervision.github.io/" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
234
+ <img src="https://www.obukhov.ai/img/badges/badge-website.svg">
235
+ </a>
236
+ <a title="arXiv" href="https://arxiv.org/abs/2312.02145" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
237
+ <img src="https://www.obukhov.ai/img/badges/badge-pdf.svg">
238
+ </a>
239
+ <a title="Github" href="https://github.com/prs-eth/marigold" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
240
+ <img src="https://img.shields.io/github/stars/prs-eth/marigold?label=GitHub%20%E2%98%85&logo=github&color=C8C" alt="badge-github-stars">
241
+ </a>
242
+ <a title="Social" href="https://twitter.com/antonobukhov1" target="_blank" rel="noopener noreferrer" style="display: inline-block;">
243
+ <img src="https://www.obukhov.ai/img/badges/badge-social.svg" alt="social">
244
+ </a>
245
+ </p>
246
+ """
247
+ )
248
+
249
+ def get_share_instructions(is_full):
250
+ out = (
251
+ "### Help us improve Marigold! If the output is not what you expected, "
252
+ "you can help us by sharing it with us privately.\n"
253
+ )
254
+ if is_full:
255
+ out += (
256
+ "1. Sign into your Hugging Face account using the button below.\n"
257
+ "1. Signing in may reset the demo and results; in that case, process the image again.\n"
258
+ )
259
+ out += "1. Review and agree to the terms of usage and enter an optional message to us.\n"
260
+ out += "1. Click the 'Share' button to submit the image to us privately.\n"
261
+ return out
262
+
263
+ def get_share_conditioned_on_login(profile: gr.OAuthProfile | None):
264
+ state_logged_out = profile is None
265
+ return get_share_instructions(is_full=state_logged_out), gr.Button(
266
+ visible=(state_logged_out or default_share_always_show_hf_logout_btn)
267
+ )
268
+
269
+ with gr.Row():
270
+ with gr.Column():
271
+ image_input = gr.Image(
272
+ label="Input Image",
273
+ type="filepath",
274
+ )
275
+ model_type = gr.Radio(
276
+ [
277
+ ("Appearance (Albedo & Material)", "appearance"),
278
+ ("Residual (Albedo, Shading & Residual)", "residual"),
279
+ ],
280
+ label="Model Type",
281
+ value=default_model_type,
282
+ )
283
+
284
+ with gr.Accordion("Advanced options", open=True):
285
+ image_ensemble_size = gr.Slider(
286
+ label="Ensemble size",
287
+ minimum=1,
288
+ maximum=10,
289
+ step=1,
290
+ value=default_image_ensemble_size,
291
+ )
292
+ image_denoise_steps = gr.Slider(
293
+ label="Number of denoising steps",
294
+ minimum=1,
295
+ maximum=20,
296
+ step=1,
297
+ value=default_image_denoise_steps,
298
+ )
299
+ image_processing_res = gr.Radio(
300
+ [
301
+ ("Native", 0),
302
+ ("Recommended", 768),
303
+ ],
304
+ label="Processing resolution",
305
+ value=default_image_processing_res,
306
+ )
307
+ with gr.Row():
308
+ image_submit_btn = gr.Button(value="Compute Decomposition", variant="primary")
309
+ image_reset_btn = gr.Button(value="Reset")
310
+ with gr.Column():
311
+ image_output_slider1 = ImageSlider(
312
+ label="Predicted Albedo",
313
+ type="filepath",
314
+ show_download_button=True,
315
+ show_share_button=True,
316
+ interactive=False,
317
+ elem_classes="slider",
318
+ position=0.25,
319
+ visible=True
320
+ )
321
+ image_output_slider2 = ImageSlider(
322
+ label="Predicted Material",
323
+ type="filepath",
324
+ show_download_button=True,
325
+ show_share_button=True,
326
+ interactive=False,
327
+ elem_classes="slider",
328
+ position=0.25,
329
+ visible=True
330
+ )
331
+ image_output_slider3 = ImageSlider(
332
+ label="Predicted Residual",
333
+ type="filepath",
334
+ show_download_button=True,
335
+ show_share_button=True,
336
+ interactive=False,
337
+ elem_classes="slider",
338
+ position=0.25,
339
+ visible=False
340
+ )
341
+ image_output_files = gr.Files(
342
+ label="Output files",
343
+ elem_id="download",
344
+ interactive=False,
345
+ )
346
+
347
+ if hf_writer is not None:
348
+ with gr.Accordion(
349
+ "Feedback",
350
+ open=False,
351
+ visible=default_share_always_show_accordion,
352
+ ) as share_box:
353
+ share_instructions = gr.Markdown(
354
+ get_share_instructions(is_full=True),
355
+ elem_classes="md_feedback",
356
+ )
357
+ share_transfer_of_rights = gr.Checkbox(
358
+ label="(Optional) I own or hold necessary rights to the submitted image. By "
359
+ "checking this box, I grant an irrevocable, non-exclusive, transferable, "
360
+ "royalty-free, worldwide license to use the uploaded image, including for "
361
+ "publishing, reproducing, and model training. [transfer_of_rights]",
362
+ scale=1,
363
+ )
364
+ share_content_is_legal = gr.Checkbox(
365
+ label="By checking this box, I acknowledge that my uploaded content is legal and "
366
+ "safe, and that I am solely responsible for ensuring it complies with all "
367
+ "applicable laws and regulations. Additionally, I am aware that my Hugging Face "
368
+ "username is collected. [content_is_legal]",
369
+ scale=1,
370
+ )
371
+ share_reason = gr.Textbox(
372
+ label="(Optional) Reason for feedback",
373
+ max_lines=1,
374
+ interactive=True,
375
+ )
376
+ with gr.Row():
377
+ share_login_btn.render()
378
+ share_share_btn = gr.Button(
379
+ "Share", variant="stop", scale=1
380
+ )
381
+
382
+ # Function to toggle visibility and set dynamic labels
383
+ def toggle_sliders_and_labels(model_type):
384
+ if model_type == "appearance":
385
+ return (
386
+ gr.update(visible=True, label="Predicted Albedo"),
387
+ gr.update(visible=True, label="Predicted Material"),
388
+ gr.update(visible=False), # Hide third slider
389
+ )
390
+ elif model_type == "residual":
391
+ return (
392
+ gr.update(visible=True, label="Predicted Albedo"),
393
+ gr.update(visible=True, label="Predicted Shading"),
394
+ gr.update(visible=True, label="Predicted Residual"),
395
+ )
396
+
397
+ # Attach the change event to update sliders
398
+ model_type.change(
399
+ fn=toggle_sliders_and_labels,
400
+ inputs=[model_type],
401
+ outputs=[image_output_slider1, image_output_slider2, image_output_slider3],
402
+ show_progress=False,
403
+ )
404
+
405
+ Examples(
406
+ fn=process_pipe_image,
407
+ examples=[
408
+ os.path.join("files", "image", name)
409
+ for name in [
410
+ "berries.jpeg",
411
+ "costumes.png",
412
+ "cat.jpg",
413
+ "einstein.jpg",
414
+ "food.jpeg",
415
+ "food_counter.png",
416
+ "puzzle.jpeg",
417
+ "rocket.png",
418
+ "scientists.jpg",
419
+ "cat2.png",
420
+ "screw.png",
421
+ "statues.png",
422
+ "swings.jpg"
423
+ ]
424
+ ],
425
+ inputs=[image_input],
426
+ outputs= [
427
+ image_output_slider1,
428
+ image_output_slider2,
429
+ image_output_slider3,
430
+ image_output_files
431
+ ],
432
+ cache_examples=False, # TODO: toggle later
433
+ directory_name="examples_image",
434
+ )
435
+
436
+ ### Image tab
437
+
438
+ if hf_writer is not None:
439
+ image_submit_btn.click(
440
+ fn=process_image_check,
441
+ inputs=image_input,
442
+ outputs=None,
443
+ preprocess=False,
444
+ queue=False,
445
+ ).success(
446
+ get_share_conditioned_on_login,
447
+ None,
448
+ [share_instructions, share_login_btn],
449
+ queue=False,
450
+ ).then(
451
+ lambda: (
452
+ gr.Button(value="Share", interactive=True),
453
+ gr.Accordion(visible=True),
454
+ False,
455
+ False,
456
+ "",
457
+ ),
458
+ None,
459
+ [
460
+ share_share_btn,
461
+ share_box,
462
+ share_transfer_of_rights,
463
+ share_content_is_legal,
464
+ share_reason,
465
+ ],
466
+ queue=False,
467
+ ).then(
468
+ fn=process_pipe_image,
469
+ inputs=[
470
+ image_input,
471
+ image_denoise_steps,
472
+ image_ensemble_size,
473
+ image_processing_res,
474
+ model_type
475
+ ],
476
+ outputs= [
477
+ image_output_slider1,
478
+ image_output_slider2,
479
+ image_output_slider3,
480
+ image_output_files
481
+ ],
482
+ concurrency_limit=1,
483
+ )
484
+ else:
485
+ image_submit_btn.click(
486
+ fn=process_image_check,
487
+ inputs=image_input,
488
+ outputs=None,
489
+ preprocess=False,
490
+ queue=False,
491
+ ).success(
492
+ fn=process_pipe_image,
493
+ inputs=[
494
+ image_input,
495
+ image_denoise_steps,
496
+ image_ensemble_size,
497
+ image_processing_res,
498
+ model_type
499
+ ],
500
+ outputs= [
501
+ image_output_slider1,
502
+ image_output_slider2,
503
+ image_output_slider3,
504
+ image_output_files
505
+ ],
506
+ concurrency_limit=1,
507
+ )
508
+
509
+ image_reset_btn.click(
510
+ fn=lambda: (
511
+ None,
512
+ None,
513
+ None,
514
+ default_image_ensemble_size,
515
+ default_image_denoise_steps,
516
+ default_image_processing_res,
517
+ ),
518
+ inputs=[],
519
+ outputs=[
520
+ image_input,
521
+ image_output_slider1,
522
+ image_output_slider2,
523
+ image_output_slider3,
524
+ image_output_files,
525
+ image_ensemble_size,
526
+ image_denoise_steps,
527
+ image_processing_res,
528
+ ],
529
+ queue=False,
530
+ )
531
+
532
+ if hf_writer is not None:
533
+ image_reset_btn.click(
534
+ fn=lambda: (
535
+ gr.Button(value="Share", interactive=True),
536
+ gr.Accordion(visible=default_share_always_show_accordion),
537
+ ),
538
+ inputs=[],
539
+ outputs=[
540
+ share_share_btn,
541
+ share_box,
542
+ ],
543
+ queue=False,
544
+ )
545
+
546
+ ### Share functionality
547
+
548
+ if hf_writer is not None:
549
+ share_components = [
550
+ image_input,
551
+ image_denoise_steps,
552
+ image_ensemble_size,
553
+ image_processing_res,
554
+ image_output_slider1,
555
+ image_output_slider2,
556
+ image_output_slider3,
557
+ share_content_is_legal,
558
+ share_transfer_of_rights,
559
+ share_reason,
560
+ ]
561
+
562
+ hf_writer.setup(share_components, "shared_data")
563
+ share_callback = FlagMethod(hf_writer, "Share", "", visual_feedback=True)
564
+
565
+ def share_precheck(
566
+ hf_content_is_legal,
567
+ image_output_slider,
568
+ profile: gr.OAuthProfile | None,
569
+ ):
570
+ if profile is None:
571
+ raise gr.Error(
572
+ "Log into the Space with your Hugging Face account first."
573
+ )
574
+ if image_output_slider is None or image_output_slider[0] is None:
575
+ raise gr.Error("No output detected; process the image first.")
576
+ if not hf_content_is_legal:
577
+ raise gr.Error(
578
+ "You must consent that the uploaded content is legal."
579
+ )
580
+ return gr.Button(value="Sharing in progress", interactive=False)
581
+
582
+ share_share_btn.click(
583
+ share_precheck,
584
+ [share_content_is_legal, image_output_slider1],
585
+ share_share_btn,
586
+ preprocess=False,
587
+ queue=False,
588
+ ).success(
589
+ share_callback,
590
+ inputs=share_components,
591
+ outputs=share_share_btn,
592
+ preprocess=False,
593
+ queue=False,
594
+ )
595
+
596
+ demo.queue(
597
+ api_open=False,
598
+ ).launch(
599
+ server_name="0.0.0.0",
600
+ server_port=7860,
601
+ )
602
+
603
+
604
+ def main():
605
+ CHECKPOINT = "prs-eth/marigold-iid-appearance-v1-1"
606
+ CROWD_DATA = "crowddata-marigold-iid-appearance-v1-1-space-v1-1"
607
+
608
+ os.system("pip freeze")
609
+
610
+ if "HF_TOKEN_LOGIN" in os.environ:
611
+ login(token=os.environ["HF_TOKEN_LOGIN"])
612
+
613
+ auth_token = os.environ.get("KEV_TOKEN")
614
+ pipe = MarigoldIIDAppearancePipeline.from_pretrained(CHECKPOINT, token=auth_token)
615
+ try:
616
+ import xformers
617
+
618
+ pipe.enable_xformers_memory_efficient_attention()
619
+ except Exception:
620
+ pass # run without xformers
621
+
622
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
623
+ pipe = pipe.to(device)
624
+
625
+ hf_writer = None
626
+ if "HF_TOKEN_LOGIN_WRITE_CROWD" in os.environ:
627
+ hf_writer = HuggingFaceDatasetSaver(
628
+ os.getenv("HF_TOKEN_LOGIN_WRITE_CROWD"),
629
+ CROWD_DATA,
630
+ private=True,
631
+ info_filename="dataset_info.json",
632
+ separate_dirs=True,
633
+ )
634
+
635
+ run_demo_server(hf_writer)
636
+
637
+
638
+ if __name__ == "__main__":
639
+ main()
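
For reference, a minimal local-launch sketch (not part of the commit). It assumes the dependencies from requirements.txt are installed; the environment variable names below are the ones read by app.py, and the token values are placeholders.

import os

os.environ["KEV_TOKEN"] = "<hf_token_with_checkpoint_access>"   # placeholder; read by the from_pretrained calls in app.py
# os.environ["HF_TOKEN_LOGIN"] = "<hf_token>"                   # optional: logs into the Hub at startup
# os.environ["HF_TOKEN_LOGIN_WRITE_CROWD"] = "<hf_write_token>" # optional: enables the feedback dataset writer

from app import main

main()  # starts the Gradio demo on 0.0.0.0:7860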
marigold_iid_appearance.py ADDED
@@ -0,0 +1,544 @@
1
+ # Copyright 2024 Anton Obukhov, Bingxin Ke, Bo Li & Kevin Qu, ETH Zurich and The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ # --------------------------------------------------------------------------
15
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
16
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
17
+ # More information about the method can be found at https://marigoldcomputervision.github.io
18
+ # --------------------------------------------------------------------------
19
+ import logging
20
+ import math
21
+ from typing import Optional, Tuple, Union
22
+
23
+ import numpy as np
24
+ import torch
25
+ from diffusers import (
26
+ AutoencoderKL,
27
+ DDIMScheduler,
28
+ DiffusionPipeline,
29
+ UNet2DConditionModel,
30
+ )
31
+ from diffusers.utils import BaseOutput, check_min_version
32
+ from PIL import Image
33
+ from PIL.Image import Resampling
34
+ from torch.utils.data import DataLoader, TensorDataset
35
+ from tqdm.auto import tqdm
36
+ from transformers import CLIPTextModel, CLIPTokenizer
37
+
38
+ # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
39
+ check_min_version("0.27.0.dev0")
40
+
41
+ class MarigoldIIDAppearanceOutput(BaseOutput):
42
+ """
43
+ Output class for Marigold IID Appearance pipeline.
44
+
45
+ Args:
46
+ albedo (`np.ndarray`):
47
+ Predicted albedo map with the shape of [3, H, W] and values in the range of [0, 1].
48
+ albedo_colored (`PIL.Image.Image`):
49
+ Colorized albedo map with the shape of [H, W, 3].
50
+ material (`np.ndarray`):
51
+ Predicted material map with the shape of [3, H, W] and values in [0, 1].
52
+ 1st channel (Red) is roughness
53
+ 2nd channel (Green) is metallicity
54
+ 3rd channel (Blue) is empty (zero)
55
+ material_colored (`PIL.Image.Image`):
56
+ Colorized material map with the shape of [H, W, 3].
57
+ 1st channel (Red) is roughness
58
+ 2nd channel (Green) is metallicity
59
+ 3rd channel (Blue) is empty (zero)
60
+ """
61
+
62
+ albedo: np.ndarray
63
+ albedo_colored: Image.Image
64
+ material: np.ndarray
65
+ material_colored: Image.Image
66
+
67
+ class MarigoldIIDAppearancePipeline(DiffusionPipeline):
68
+ """
69
+ Pipeline for Intrinsic Image Decomposition (Albedo and Material) using Marigold: https://marigoldcomputervision.github.io.
70
+
71
+ This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
72
+ library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
73
+
74
+ Args:
75
+ unet (`UNet2DConditionModel`):
76
+ Conditional U-Net to denoise the target latents, conditioned on the image latent.
77
+ vae (`AutoencoderKL`):
78
+ Variational Auto-Encoder (VAE) Model to encode and decode images and target maps
79
+ to and from latent representations.
80
+ scheduler (`DDIMScheduler`):
81
+ A scheduler to be used in combination with `unet` to denoise the encoded image latents.
82
+ text_encoder (`CLIPTextModel`):
83
+ Text-encoder, for empty text embedding.
84
+ tokenizer (`CLIPTokenizer`):
85
+ CLIP tokenizer.
86
+ """
87
+
88
+ latent_scale_factor = 0.18215
89
+
90
+ def __init__(
91
+ self,
92
+ unet: UNet2DConditionModel,
93
+ vae: AutoencoderKL,
94
+ scheduler: DDIMScheduler,
95
+ text_encoder: CLIPTextModel,
96
+ tokenizer: CLIPTokenizer,
97
+ ):
98
+ super().__init__()
99
+
100
+ self.register_modules(
101
+ unet=unet,
102
+ vae=vae,
103
+ scheduler=scheduler,
104
+ text_encoder=text_encoder,
105
+ tokenizer=tokenizer,
106
+ )
107
+
108
+ self.empty_text_embed = None
109
+
110
+ self.n_targets = 2 # Albedo and material
111
+
112
+ @torch.no_grad()
113
+ def __call__(
114
+ self,
115
+ input_image: Image,
116
+ denoising_steps: int = 4,
117
+ ensemble_size: int = 10,
118
+ processing_res: int = 768,
119
+ match_input_res: bool = True,
120
+ resample_method: str = "bilinear",
121
+ batch_size: int = 0,
122
+ save_memory: bool = False,
123
+ seed: Union[int, None] = None,
124
+ color_map: str = "Spectral", # TODO change colorization api based on modality
125
+ show_progress_bar: bool = True,
126
+ **kwargs,
127
+ ) -> MarigoldIIDAppearanceOutput:
128
+ """
129
+ Function invoked when calling the pipeline.
130
+
131
+ Args:
132
+ input_image (`Image`):
133
+ Input RGB (or gray-scale) image.
134
+ denoising_steps (`int`, *optional*, defaults to `4`):
135
+ Number of diffusion denoising steps (DDIM) during inference.
136
+ ensemble_size (`int`, *optional*, defaults to `10`):
137
+ Number of predictions to be ensembled.
138
+ processing_res (`int`, *optional*, defaults to `768`):
139
+ Maximum resolution of processing.
140
+ If set to 0: will not resize at all.
141
+ match_input_res (`bool`, *optional*, defaults to `True`):
142
+ Resize the prediction to match the input resolution.
143
+ Only has an effect when `processing_res` is not 0.
144
+ resample_method: (`str`, *optional*, defaults to `bilinear`):
145
+ Resampling method used to resize images and depth predictions. This can be one of `bilinear`, `bicubic` or `nearest`, defaults to: `bilinear`.
146
+ batch_size (`int`, *optional*, defaults to `0`):
147
+ Inference batch size, no bigger than `num_ensemble`.
148
+ If set to 0, the script will automatically decide the proper batch size.
149
+ save_memory (`bool`, defaults to `False`):
150
+ Extra steps to save memory at the cost of performance.
151
+ seed (`int`, *optional*, defaults to `None`)
152
+ Reproducibility seed.
153
+ color_map (`str`, *optional*, defaults to `"Spectral"`, pass `None` to skip colorized output generation):
154
+ Colormap used to colorize the output maps.
155
+ show_progress_bar (`bool`, *optional*, defaults to `True`):
156
+ Display a progress bar of diffusion denoising.
157
+ Returns:
158
+ `MarigoldIIDAppearanceOutput`: Output class for Marigold monocular intrinsic image decomposition (appearance) prediction pipeline, including:
159
+ - **albedo** (`np.ndarray`) Predicted albedo map with the shape of [3, H, W] and values in the range of [0, 1]
160
+ - **albedo_colored** (`PIL.Image.Image`) Colorized albedo map with the shape of [H, W, 3] and values in the range of [0, 1]
161
+ - **material** (`np.ndarray`) Predicted material map with the shape of [3, H, W] and values in [0, 1]
162
+ - **material_colored** (`PIL.Image.Image`) Colorized material map with the shape of [H, W, 3] and values in [0, 1]
163
+ """
164
+
165
+ if not match_input_res:
166
+ assert processing_res is not None
167
+ assert processing_res >= 0
168
+ assert denoising_steps >= 1
169
+ assert ensemble_size >= 1
170
+
171
+ # Check if denoising step is reasonable
172
+ self.check_inference_step(denoising_steps)
173
+
174
+ resample_method: Resampling = self.get_pil_resample_method(resample_method)
175
+
176
+ W, H = input_image.size
177
+
178
+ if processing_res > 0:
179
+ input_image = self.resize_max_res(
180
+ input_image, max_edge_resolution=processing_res, resample_method=resample_method,
181
+ )
182
+ input_image = input_image.convert("RGB")
183
+ image = np.asarray(input_image)
184
+
185
+ rgb = np.transpose(image, (2, 0, 1)) # [H, W, rgb] -> [rgb, H, W]
186
+ rgb_norm = rgb / 255.0 * 2.0 - 1.0 # [0, 255] -> [-1, 1]
187
+ rgb_norm = torch.from_numpy(rgb_norm).to(self.dtype)
188
+ rgb_norm = rgb_norm.to(self.device)
189
+ assert rgb_norm.min() >= -1.0 and rgb_norm.max() <= 1.0 # TODO remove this
190
+
191
+ def ensemble(
192
+ targets: torch.Tensor, return_uncertainty: bool = False, reduction = "median",
193
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
194
+ uncertainty = None
195
+ if reduction == "mean":
196
+ prediction = torch.mean(targets, dim=0, keepdim=True)
197
+ if return_uncertainty:
198
+ uncertainty = torch.std(targets, dim=0, keepdim=True)
199
+ elif reduction == "median":
200
+ prediction = torch.median(targets, dim=0, keepdim=True).values
201
+ if return_uncertainty:
202
+ uncertainty = torch.median(
203
+ torch.abs(targets - prediction), dim=0, keepdim=True
204
+ ).values
205
+ else:
206
+ raise ValueError(f"Unrecognized reduction method: {reduction}.")
207
+ return prediction, uncertainty
208
+
209
+ duplicated_rgb = torch.stack([rgb_norm] * ensemble_size)
210
+ single_rgb_dataset = TensorDataset(duplicated_rgb)
211
+
212
+ if batch_size <= 0:
213
+ batch_size = self.find_batch_size(
214
+ ensemble_size=ensemble_size,
215
+ input_res=max(rgb_norm.shape[1:]),
216
+ dtype=self.dtype,
217
+ )
218
+
219
+ single_rgb_loader = DataLoader(
220
+ single_rgb_dataset, batch_size=batch_size, shuffle=False
221
+ )
222
+
223
+ target_pred_ls = []
224
+ iterable = single_rgb_loader
225
+ if show_progress_bar:
226
+ iterable = tqdm(
227
+ single_rgb_loader, desc=" " * 2 + "Inference batches", leave=False
228
+ )
229
+
230
+ for batch in iterable:
231
+ (batched_img,) = batch
232
+ target_pred = self.single_infer(
233
+ rgb_in=batched_img,
234
+ num_inference_steps=denoising_steps,
235
+ seed=seed,
236
+ show_pbar=show_progress_bar,
237
+ )
238
+ target_pred = target_pred.detach()
239
+ if save_memory:
240
+ target_pred = target_pred.cpu()
241
+ target_pred_ls.append(target_pred.detach())
242
+
243
+ target_preds = torch.concat(target_pred_ls, dim=0)
244
+ pred_uncert = None
245
+
246
+ if save_memory:
247
+ torch.cuda.empty_cache()
248
+
249
+ if ensemble_size > 1:
250
+ final_pred, pred_uncert = ensemble(
251
+ target_preds,
252
+ reduction = "median",
253
+ return_uncertainty=False
254
+ )
255
+ else:
256
+ final_pred = target_preds
257
+ pred_uncert = None
258
+
259
+ if match_input_res:
260
+ final_pred = torch.nn.functional.interpolate(
261
+ final_pred, (H, W), mode="bilinear" # TODO: parameterize this method
262
+ ) # [1,3,H,W]
263
+
264
+ if pred_uncert is not None:
265
+ pred_uncert = torch.nn.functional.interpolate(
266
+ pred_uncert.unsqueeze(1), (H, W), mode="bilinear"
267
+ ).squeeze(
268
+ 1
269
+ ) # [1,H,W]
270
+
271
+ # Convert to numpy
272
+ final_pred = final_pred.squeeze()
273
+ final_pred = final_pred.cpu().numpy()
274
+
275
+ albedo = final_pred[0:3, :, :]
276
+ material = np.stack(
277
+ (final_pred[3, :, :], final_pred[4, :, :], final_pred[5, :, :]), axis=0
278
+ )
279
+
280
+ albedo_colored = (albedo + 1.0) * 0.5
281
+ albedo_colored = (albedo_colored * 255).astype(np.uint8)
282
+ albedo_colored = self.chw2hwc(albedo_colored)
283
+ albedo_colored_img = Image.fromarray(albedo_colored)
284
+
285
+ material_colored = (material + 1.0) * 0.5
286
+ material_colored = (material_colored * 255).astype(np.uint8)
287
+ material_colored = self.chw2hwc(material_colored)
288
+ material_colored_img = Image.fromarray(material_colored)
289
+
290
+ out = MarigoldIIDAppearanceOutput(
291
+ albedo=albedo,
292
+ albedo_colored=albedo_colored_img,
293
+ material=material,
294
+ material_colored=material_colored_img
295
+ )
296
+
297
+ return out
298
+
299
+ def check_inference_step(self, n_step: int):
300
+ """
301
+ Check if denoising step is reasonable
302
+ Args:
303
+ n_step (`int`): denoising steps
304
+ """
305
+ assert n_step >= 1
306
+
307
+ if isinstance(self.scheduler, DDIMScheduler):
308
+ pass
309
+ else:
310
+ raise RuntimeError(f"Unsupported scheduler type: {type(self.scheduler)}")
311
+
312
+ def encode_empty_text(self):
313
+ """
314
+ Encode text embedding for empty prompt.
315
+ """
316
+ prompt = ""
317
+ text_inputs = self.tokenizer(
318
+ prompt,
319
+ padding="do_not_pad",
320
+ max_length=self.tokenizer.model_max_length,
321
+ truncation=True,
322
+ return_tensors="pt",
323
+ )
324
+ text_input_ids = text_inputs.input_ids.to(self.text_encoder.device)
325
+ self.empty_text_embed = self.text_encoder(text_input_ids)[0].to(self.dtype)
326
+
327
+ @torch.no_grad()
328
+ def single_infer(
329
+ self,
330
+ rgb_in: torch.Tensor,
331
+ num_inference_steps: int,
332
+ seed: Union[int, None],
333
+ show_pbar: bool,
334
+ ) -> torch.Tensor:
335
+ """
336
+ Perform an individual iid prediction without ensembling.
337
+ """
338
+ device = rgb_in.device
339
+
340
+ # Set timesteps
341
+ self.scheduler.set_timesteps(num_inference_steps, device=device)
342
+ timesteps = self.scheduler.timesteps # [T]
343
+
344
+ # Encode image
345
+ rgb_latent = self.encode_rgb(rgb_in)
346
+
347
+ target_latent_shape = list(rgb_latent.shape)
348
+ target_latent_shape[1] *= (
349
+ 2 # TODO: no hardcoding # self.n_targets # (B, 4*n_targets, h, w)
350
+ )
351
+
352
+ # Initialize prediction latent with noise
353
+ if seed is None:
354
+ rand_num_generator = None
355
+ else:
356
+ rand_num_generator = torch.Generator(device=device)
357
+ rand_num_generator.manual_seed(seed)
358
+ target_latents = torch.randn(
359
+ target_latent_shape,
360
+ device=device,
361
+ dtype=self.dtype,
362
+ generator=rand_num_generator,
363
+ ) # [B, 4*n_targets, h, w]
364
+
365
+ # Batched empty text embedding
366
+ if self.empty_text_embed is None:
367
+ self.encode_empty_text()
368
+ batch_empty_text_embed = self.empty_text_embed.repeat(
369
+ (rgb_latent.shape[0], 1, 1)
370
+ ) # [B, 2, 1024]
371
+
372
+ # Denoising loop
373
+ if show_pbar:
374
+ iterable = tqdm(
375
+ enumerate(timesteps),
376
+ total=len(timesteps),
377
+ leave=False,
378
+ desc=" " * 4 + "Diffusion denoising",
379
+ )
380
+ else:
381
+ iterable = enumerate(timesteps)
382
+
383
+ for i, t in iterable:
384
+ unet_input = torch.cat(
385
+ [rgb_latent, target_latents], dim=1
386
+ ) # this order is important
387
+
388
+ # predict the noise residual
389
+ noise_pred = self.unet(
390
+ unet_input, t, encoder_hidden_states=batch_empty_text_embed
391
+ ).sample # [B, 4, h, w]
392
+
393
+ # compute the previous noisy sample x_t -> x_t-1
394
+ target_latents = self.scheduler.step(
395
+ noise_pred, t, target_latents, generator=rand_num_generator
396
+ ).prev_sample
397
+
398
+ # torch.cuda.empty_cache() # TODO is it really needed here, even if memory saving?
399
+
400
+ targets = self.decode_targets(target_latents) # [B, 3*n_targets, H, W]
401
+ targets = torch.clip(targets, -1.0, 1.0)
402
+
403
+ return targets
404
+
405
+ def encode_rgb(self, rgb_in: torch.Tensor) -> torch.Tensor:
406
+ """
407
+ Encode RGB image into latent.
408
+
409
+ Args:
410
+ rgb_in (`torch.Tensor`):
411
+ Input RGB image to be encoded.
412
+
413
+ Returns:
414
+ `torch.Tensor`: Image latent.
415
+ """
416
+ # encode
417
+ h = self.vae.encoder(rgb_in)
418
+ moments = self.vae.quant_conv(h)
419
+ mean, logvar = torch.chunk(moments, 2, dim=1)
420
+ # scale latent
421
+ rgb_latent = mean * self.latent_scale_factor
422
+ return rgb_latent
423
+
424
+ def decode_targets(self, target_latents: torch.Tensor) -> torch.Tensor:
425
+ """
426
+ Decode target latent into target map.
427
+
428
+ Args:
429
+ target_latents (`torch.Tensor`):
430
+ Target latent to be decoded.
431
+
432
+ Returns:
433
+ `torch.Tensor`: Decoded target map.
434
+ """
435
+
436
+ assert target_latents.shape[1] == 8 # self.n_targets * 4
437
+
438
+ # scale latent
439
+ target_latents = target_latents / self.latent_scale_factor
440
+ # decode
441
+ targets = []
442
+ for i in range(self.n_targets):
443
+ latent = target_latents[:, i * 4 : (i + 1) * 4, :, :]
444
+ z = self.vae.post_quant_conv(latent)
445
+ stacked = self.vae.decoder(z)
446
+
447
+ targets.append(stacked)
448
+
449
+ return torch.cat(targets, dim=1)
450
+
451
+ @staticmethod
452
+ def get_pil_resample_method(method_str: str) -> Resampling:
453
+ resample_method_dic = {
454
+ "bilinear": Resampling.BILINEAR,
455
+ "bicubic": Resampling.BICUBIC,
456
+ "nearest": Resampling.NEAREST,
457
+ }
458
+ resample_method = resample_method_dic.get(method_str, None)
459
+ if resample_method is None:
460
+ raise ValueError(f"Unknown resampling method: {method_str}")
461
+ else:
462
+ return resample_method
463
+
464
+ @staticmethod
465
+ def resize_max_res(img: Image.Image, max_edge_resolution: int, resample_method=Resampling.BILINEAR) -> Image.Image:
466
+ """
467
+ Resize image to limit maximum edge length while keeping aspect ratio.
468
+ """
469
+ original_width, original_height = img.size
470
+ downscale_factor = min(max_edge_resolution / original_width, max_edge_resolution / original_height)
471
+
472
+ new_width = int(original_width * downscale_factor)
473
+ new_height = int(original_height * downscale_factor)
474
+
475
+ resized_img = img.resize((new_width, new_height), resample=resample_method)
476
+ return resized_img
477
+
478
+ @staticmethod
479
+ def chw2hwc(chw):
480
+ assert 3 == len(chw.shape)
481
+ if isinstance(chw, torch.Tensor):
482
+ hwc = torch.permute(chw, (1, 2, 0))
483
+ elif isinstance(chw, np.ndarray):
484
+ hwc = np.moveaxis(chw, 0, -1)
485
+ return hwc
486
+
487
+ @staticmethod
488
+ def find_batch_size(ensemble_size: int, input_res: int, dtype: torch.dtype) -> int:
489
+ """
490
+ Automatically search for suitable operating batch size.
491
+
492
+ Args:
493
+ ensemble_size (`int`):
494
+ Number of predictions to be ensembled.
495
+ input_res (`int`):
496
+ Operating resolution of the input image.
497
+
498
+ Returns:
499
+ `int`: Operating batch size.
500
+ """
501
+ # Search table for suggested max. inference batch size
502
+ bs_search_table = [
503
+ # tested on A100-PCIE-80GB
504
+ {"res": 768, "total_vram": 79, "bs": 35, "dtype": torch.float32},
505
+ {"res": 1024, "total_vram": 79, "bs": 20, "dtype": torch.float32},
506
+ # tested on A100-PCIE-40GB
507
+ {"res": 768, "total_vram": 39, "bs": 15, "dtype": torch.float32},
508
+ {"res": 1024, "total_vram": 39, "bs": 8, "dtype": torch.float32},
509
+ {"res": 768, "total_vram": 39, "bs": 30, "dtype": torch.float16},
510
+ {"res": 1024, "total_vram": 39, "bs": 15, "dtype": torch.float16},
511
+ # tested on RTX3090, RTX4090
512
+ {"res": 512, "total_vram": 23, "bs": 20, "dtype": torch.float32},
513
+ {"res": 768, "total_vram": 23, "bs": 7, "dtype": torch.float32},
514
+ {"res": 1024, "total_vram": 23, "bs": 3, "dtype": torch.float32},
515
+ {"res": 512, "total_vram": 23, "bs": 40, "dtype": torch.float16},
516
+ {"res": 768, "total_vram": 23, "bs": 18, "dtype": torch.float16},
517
+ {"res": 1024, "total_vram": 23, "bs": 10, "dtype": torch.float16},
518
+ # tested on GTX1080Ti
519
+ {"res": 512, "total_vram": 10, "bs": 5, "dtype": torch.float32},
520
+ {"res": 768, "total_vram": 10, "bs": 2, "dtype": torch.float32},
521
+ {"res": 512, "total_vram": 10, "bs": 10, "dtype": torch.float16},
522
+ {"res": 768, "total_vram": 10, "bs": 5, "dtype": torch.float16},
523
+ {"res": 1024, "total_vram": 10, "bs": 3, "dtype": torch.float16},
524
+ ]
525
+
526
+ if not torch.cuda.is_available():
527
+ return 1
528
+
529
+ total_vram = torch.cuda.mem_get_info()[1] / 1024.0**3
530
+ filtered_bs_search_table = [s for s in bs_search_table if s["dtype"] == dtype]
531
+ for settings in sorted(
532
+ filtered_bs_search_table,
533
+ key=lambda k: (k["res"], -k["total_vram"]),
534
+ ):
535
+ if input_res <= settings["res"] and total_vram >= settings["total_vram"]:
536
+ bs = settings["bs"]
537
+ if bs > ensemble_size:
538
+ bs = ensemble_size
539
+ elif bs > math.ceil(ensemble_size / 2) and bs < ensemble_size:
540
+ bs = math.ceil(ensemble_size / 2)
541
+ return bs
542
+
543
+ return 1
544
+
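
For reference, a minimal standalone usage sketch of this pipeline (not part of the commit). It mirrors the call made in app.py; the checkpoint name and arguments are taken from app.py, while the token and image path are placeholders and access to the checkpoint is assumed.

import torch
from PIL import Image
from marigold_iid_appearance import MarigoldIIDAppearancePipeline

pipe = MarigoldIIDAppearancePipeline.from_pretrained(
    "prs-eth/marigold-iid-appearance-v1-1", token="<hf_token>"  # placeholder token
)
pipe = pipe.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

out = pipe(
    Image.open("files/image/berries.jpeg"),  # example path from the demo gallery
    denoising_steps=4,
    ensemble_size=1,
    processing_res=768,
    seed=2024,
    show_progress_bar=True,
)
out.albedo_colored.save("albedo.png")      # colorized albedo, [H, W, 3]
out.material_colored.save("material.png")  # roughness in R, metallicity in G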
marigold_iid_residual.py ADDED
@@ -0,0 +1,552 @@
1
+ # Copyright 2024 Anton Obukhov, Bingxin Ke & Kevin Qu, ETH Zurich and The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ # --------------------------------------------------------------------------
15
+ # If you find this code useful, we kindly ask you to cite our paper in your work.
16
+ # Please find bibtex at: https://github.com/prs-eth/Marigold#-citation
17
+ # More information about the method can be found at https://marigoldcomputervision.github.io
18
+ # --------------------------------------------------------------------------
19
+ import logging
20
+ import math
21
+ from typing import Optional, Tuple, Union
22
+
23
+ import numpy as np
24
+ import torch
25
+ from diffusers import (
26
+ AutoencoderKL,
27
+ DDIMScheduler,
28
+ DiffusionPipeline,
29
+ UNet2DConditionModel,
30
+ )
31
+ from diffusers.utils import BaseOutput, check_min_version
32
+ from PIL import Image
33
+ from PIL.Image import Resampling
34
+ from torch.utils.data import DataLoader, TensorDataset
35
+ from tqdm.auto import tqdm
36
+ from transformers import CLIPTextModel, CLIPTokenizer
37
+
38
+ # Will error if the minimal version of diffusers is not installed. Remove at your own risks.
39
+ check_min_version("0.27.0.dev0")
40
+
41
+ class MarigoldIIDResidualOutput(BaseOutput):
42
+ """
43
+ Output class for Marigold IID Residual pipeline.
44
+
45
+ Args:
46
+ albedo (`np.ndarray`):
47
+ Predicted albedo map with the shape of [3, H, W] and values in the range of [0, 1].
48
+ albedo_colored (`PIL.Image.Image`):
49
+ Colorized albedo map with the shape of [H, W, 3].
50
+ shading (`np.ndarray`):
51
+ Predicted diffuse shading map with the shape of [3, H, W] and values in the range of [0, 1].
52
+ shading_colored (`PIL.Image.Image`):
53
+ Colorized diffuse shading map with the shape of [H, W, 3].
54
+ residual (`np.ndarray`):
55
+ Predicted non-diffuse residual map with the shape of [3, H, W] and values in the range of [0, 1].
56
+ residual_colored (`PIL.Image.Image`):
57
+ Colorized non-diffuse residual map with the shape of [H, W, 3].
58
+
59
+ """
60
+
61
+ albedo: np.ndarray
62
+ albedo_colored: Image.Image
63
+ shading: np.ndarray
64
+ shading_colored: Image.Image
65
+ residual: np.ndarray
66
+ residual_colored: Image.Image
67
+
68
+ class MarigoldIIDResidualPipeline(DiffusionPipeline):
69
+ """
70
+ Pipeline for Intrinsic Image Decomposition (Albedo, diffuse shading and non-diffuse residual) using Marigold: https://marigoldcomputervision.github.io.
71
+
72
+ This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
73
+ library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
74
+
75
+ Args:
76
+ unet (`UNet2DConditionModel`):
77
+ Conditional U-Net to denoise the target latents (albedo, diffuse shading, non-diffuse residual), conditioned on the image latent.
78
+ vae (`AutoencoderKL`):
79
+ Variational Auto-Encoder (VAE) Model to encode and decode images and prediction maps
80
+ to and from latent representations.
81
+ scheduler (`DDIMScheduler`):
82
+ A scheduler to be used in combination with `unet` to denoise the encoded image latents.
83
+ text_encoder (`CLIPTextModel`):
84
+ Text-encoder, for empty text embedding.
85
+ tokenizer (`CLIPTokenizer`):
86
+ CLIP tokenizer.
87
+ """
88
+
89
+ latent_scale_factor = 0.18215
90
+
91
+ def __init__(
92
+ self,
93
+ unet: UNet2DConditionModel,
94
+ vae: AutoencoderKL,
95
+ scheduler: DDIMScheduler,
96
+ text_encoder: CLIPTextModel,
97
+ tokenizer: CLIPTokenizer,
98
+ ):
99
+ super().__init__()
100
+
101
+ self.register_modules(
102
+ unet=unet,
103
+ vae=vae,
104
+ scheduler=scheduler,
105
+ text_encoder=text_encoder,
106
+ tokenizer=tokenizer,
107
+ )
108
+
109
+ self.empty_text_embed = None
110
+ self.n_targets = 3 # Albedo, shading, residual
111
+
112
+ @torch.no_grad()
113
+ def __call__(
114
+ self,
115
+ input_image: Image.Image,
116
+ denoising_steps: int = 4,
117
+ ensemble_size: int = 10,
118
+ processing_res: int = 768,
119
+ match_input_res: bool = True,
120
+ resample_method: str = "bilinear",
121
+ batch_size: int = 0,
122
+ save_memory: bool = False,
123
+ seed: Union[int, None] = None,
124
+ color_map: str = "Spectral", # TODO change colorization api based on modality
125
+ show_progress_bar: bool = True,
126
+ **kwargs,
127
+ ) -> MarigoldIIDResidualOutput:
128
+ """
129
+ Function invoked when calling the pipeline.
130
+
131
+ Args:
132
+ input_image (`Image`):
133
+ Input RGB (or gray-scale) image.
134
+ denoising_steps (`int`, *optional*, defaults to `4`):
135
+ Number of diffusion denoising steps (DDIM) during inference.
136
+ ensemble_size (`int`, *optional*, defaults to `10`):
137
+ Number of predictions to be ensembled.
138
+ processing_res (`int`, *optional*, defaults to `768`):
139
+ Maximum resolution of processing.
140
+ If set to 0, the input is not resized at all.
141
+ match_input_res (`bool`, *optional*, defaults to `True`):
142
+ Resize predictions to match the input resolution.
143
+ Only relevant if `processing_res` is not 0 (otherwise the prediction already matches the input resolution).
144
+ resample_method: (`str`, *optional*, defaults to `bilinear`):
145
+ Resampling method used to resize the input image and the predictions. This can be one of `bilinear`, `bicubic` or `nearest`; defaults to `bilinear`.
146
+ batch_size (`int`, *optional*, defaults to `0`):
147
+ Inference batch size, no bigger than `ensemble_size`.
148
+ If set to 0, a suitable batch size is chosen automatically.
149
+ save_memory (`bool`, defaults to `False`):
150
+ Extra steps to save memory at the cost of performance.
151
+ seed (`int`, *optional*, defaults to `None`):
152
+ Reproducibility seed.
153
+ color_map (`str`, *optional*, defaults to `"Spectral"`):
154
+ Colormap intended for colorizing the predictions; currently unused by this pipeline (see the TODO at the argument).
155
+ show_progress_bar (`bool`, *optional*, defaults to `True`):
156
+ Display a progress bar of diffusion denoising.
157
+ Returns:
158
+ `MarigoldIIDResidualOutput`: Output class for Marigold monocular intrinsic image decomposition (Residual) prediction pipeline, including:
159
+ - **albedo** (`np.ndarray`) Predicted albedo map with the shape of [3, H, W] values in the range of [0, 1]
160
+ - **albedo_colored** (`PIL.Image.Image`) Colorized albedo map with the shape of [H, W, 3]
161
+ - **shading** (`np.ndarray`) Predicted diffuse shading map with the shape of [3, H, W] and values in [0, 1]; **shading_colored** (`PIL.Image.Image`) its colorized version with the shape of [H, W, 3]
162
+ - **residual** (`np.ndarray`) Predicted non-diffuse residual map with the shape of [3, H, W] and values in [0, 1]; **residual_colored** (`PIL.Image.Image`) its colorized version with the shape of [H, W, 3]
163
+ """
164
+
165
+ if not match_input_res:
166
+ assert processing_res is not None
167
+ assert processing_res >= 0
168
+ assert denoising_steps >= 1
169
+ assert ensemble_size >= 1
170
+
171
+ # Check if denoising step is reasonable
172
+ self.check_inference_step(denoising_steps)
173
+
174
+ resample_method: Resampling = self.get_pil_resample_method(resample_method)
175
+
176
+ W, H = input_image.size
177
+
178
+ if processing_res > 0:
179
+ input_image = self.resize_max_res(
180
+ input_image, max_edge_resolution=processing_res, resample_method=resample_method,
181
+ )
182
+ input_image = input_image.convert("RGB")
183
+ image = np.asarray(input_image)
184
+
185
+ rgb = np.transpose(image, (2, 0, 1)) # [H, W, rgb] -> [rgb, H, W]
186
+ rgb_norm = rgb / 255.0 * 2.0 - 1.0 # [0, 255] -> [-1, 1]
187
+ rgb_norm = torch.from_numpy(rgb_norm).to(self.dtype)
188
+ rgb_norm = rgb_norm.to(self.device)
189
+ assert rgb_norm.min() >= -1.0 and rgb_norm.max() <= 1.0 # TODO remove this
190
+
191
+ def ensemble(
192
+ targets: torch.Tensor, return_uncertainty: bool = False, reduction: str = "median",
193
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
194
+ uncertainty = None
195
+ if reduction == "mean":
196
+ prediction = torch.mean(targets, dim=0, keepdim=True)
197
+ if return_uncertainty:
198
+ uncertainty = torch.std(targets, dim=0, keepdim=True)
199
+ elif reduction == "median":
200
+ prediction = torch.median(targets, dim=0, keepdim=True).values
201
+ if return_uncertainty:
202
+ uncertainty = torch.median(
203
+ torch.abs(targets - prediction), dim=0, keepdim=True
204
+ ).values
205
+ else:
206
+ raise ValueError(f"Unrecognized reduction method: {reduction}.")
207
+ return prediction, uncertainty
208
+
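+ # Note: with reduction="median", the uncertainty computed above is the per-pixel
+ # median absolute deviation of the ensemble members from the median prediction.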
209
+ duplicated_rgb = torch.stack([rgb_norm] * ensemble_size)
210
+ single_rgb_dataset = TensorDataset(duplicated_rgb)
211
+
212
+ if batch_size <= 0:
213
+ batch_size = self.find_batch_size(
214
+ ensemble_size=ensemble_size,
215
+ input_res=max(rgb_norm.shape[1:]),
216
+ dtype=self.dtype,
217
+ )
218
+
219
+ single_rgb_loader = DataLoader(
220
+ single_rgb_dataset, batch_size=batch_size, shuffle=False
221
+ )
222
+
223
+ target_pred_ls = []
224
+ iterable = single_rgb_loader
225
+ if show_progress_bar:
226
+ iterable = tqdm(
227
+ single_rgb_loader, desc=" " * 2 + "Inference batches", leave=False
228
+ )
229
+
230
+ for batch in iterable:
231
+ (batched_img,) = batch
232
+ target_pred = self.single_infer(
233
+ rgb_in=batched_img,
234
+ num_inference_steps=denoising_steps,
235
+ seed=seed,
236
+ show_pbar=show_progress_bar,
237
+ )
238
+ target_pred = target_pred.detach()
239
+ if save_memory:
240
+ target_pred = target_pred.cpu()
241
+ target_pred_ls.append(target_pred.detach())
242
+
243
+ target_preds = torch.concat(target_pred_ls, dim=0)
244
+ pred_uncert = None
245
+
246
+ if save_memory:
247
+ torch.cuda.empty_cache()
248
+
249
+ if ensemble_size > 1:
250
+ final_pred, pred_uncert = ensemble(
251
+ target_preds,
252
+ reduction="median",
253
+ return_uncertainty=False
254
+ )
255
+ else:
256
+ final_pred = target_preds
257
+ pred_uncert = None
258
+
259
+ if match_input_res:
260
+ final_pred = torch.nn.functional.interpolate(
261
+ final_pred, (H, W), mode="bilinear" # TODO: parameterize this method
262
+ ) # [1, 3*n_targets, H, W]
263
+
264
+ if pred_uncert is not None:
265
+ pred_uncert = torch.nn.functional.interpolate(
266
+ pred_uncert.unsqueeze(1), (H, W), mode="bilinear"
267
+ ).squeeze(
268
+ 1
269
+ ) # [1,H,W]
270
+
271
+ # Convert to numpy
272
+ final_pred = final_pred.squeeze()
273
+ final_pred = final_pred.cpu().numpy()
274
+
275
+ albedo = final_pred[0:3, :, :]
276
+ shading = final_pred[3:6, :, :]
277
+ residual = final_pred[6:, :, :]
278
+
279
+ albedo_colored = (albedo + 1.0) * 0.5
280
+ albedo_colored = (albedo_colored * 255).astype(np.uint8)
281
+ albedo_colored = self.chw2hwc(albedo_colored)
282
+ albedo_colored_img = Image.fromarray(albedo_colored)
283
+
284
+ shading_colored = (shading + 1.0) * 0.5
285
+ shading_colored = shading_colored / shading_colored.max() # rescale for better visualization
286
+ shading_colored = (shading_colored * 255).astype(np.uint8)
287
+ shading_colored = self.chw2hwc(shading_colored)
288
+ shading_colored_img = Image.fromarray(shading_colored)
289
+
290
+ residual_colored = (residual + 1.0) * 0.5
291
+ residual_colored = residual_colored / residual_colored.max() # rescale for better visualization
292
+ residual_colored = (residual_colored * 255).astype(np.uint8)
293
+ residual_colored = self.chw2hwc(residual_colored)
294
+ residual_colored_img = Image.fromarray(residual_colored)
295
+
296
+ out = MarigoldIIDResidualOutput(
297
+ albedo=albedo,
298
+ albedo_colored=albedo_colored_img,
299
+ shading=shading,
300
+ shading_colored=shading_colored_img,
301
+ residual=residual,
302
+ residual_colored=residual_colored_img
303
+ )
304
+
305
+ return out
306
+
307
+ def check_inference_step(self, n_step: int):
308
+ """
309
+ Check if denoising step is reasonable
310
+ Args:
311
+ n_step (`int`): denoising steps
312
+ """
313
+ assert n_step >= 1
314
+
315
+ if isinstance(self.scheduler, DDIMScheduler):
316
+ pass
317
+ else:
318
+ raise RuntimeError(f"Unsupported scheduler type: {type(self.scheduler)}")
319
+
320
+ def encode_empty_text(self):
321
+ """
322
+ Encode text embedding for empty prompt.
323
+ """
324
+ prompt = ""
325
+ text_inputs = self.tokenizer(
326
+ prompt,
327
+ padding="do_not_pad",
328
+ max_length=self.tokenizer.model_max_length,
329
+ truncation=True,
330
+ return_tensors="pt",
331
+ )
332
+ text_input_ids = text_inputs.input_ids.to(self.text_encoder.device)
333
+ self.empty_text_embed = self.text_encoder(text_input_ids)[0].to(self.dtype)
334
+
335
+ @torch.no_grad()
336
+ def single_infer(
337
+ self,
338
+ rgb_in: torch.Tensor,
339
+ num_inference_steps: int,
340
+ seed: Union[int, None],
341
+ show_pbar: bool,
342
+ ) -> torch.Tensor:
343
+ """
344
+ Perform an individual IID prediction without ensembling.
345
+ """
346
+ device = rgb_in.device
347
+
348
+ # Set timesteps
349
+ self.scheduler.set_timesteps(num_inference_steps, device=device)
350
+ timesteps = self.scheduler.timesteps # [T]
351
+
352
+ # Encode image
353
+ rgb_latent = self.encode_rgb(rgb_in)
354
+
355
+ target_latent_shape = list(rgb_latent.shape)
356
+ target_latent_shape[1] *= (
357
+ self.n_targets # 4 latent channels per target -> (B, 4*n_targets, h, w)
358
+ )
359
+
360
+ # Initialize prediction latent with noise
361
+ if seed is None:
362
+ rand_num_generator = None
363
+ else:
364
+ rand_num_generator = torch.Generator(device=device)
365
+ rand_num_generator.manual_seed(seed)
366
+ target_latents = torch.randn(
367
+ target_latent_shape,
368
+ device=device,
369
+ dtype=self.dtype,
370
+ generator=rand_num_generator,
371
+ ) # [B, 4*n_targets, h, w]
372
+
373
+ # Batched empty text embedding
374
+ if self.empty_text_embed is None:
375
+ self.encode_empty_text()
376
+ batch_empty_text_embed = self.empty_text_embed.repeat(
377
+ (rgb_latent.shape[0], 1, 1)
378
+ ) # [B, 2, 1024]
379
+
380
+ # Denoising loop
381
+ if show_pbar:
382
+ iterable = tqdm(
383
+ enumerate(timesteps),
384
+ total=len(timesteps),
385
+ leave=False,
386
+ desc=" " * 4 + "Diffusion denoising",
387
+ )
388
+ else:
389
+ iterable = enumerate(timesteps)
390
+
391
+ for i, t in iterable:
392
+ unet_input = torch.cat(
393
+ [rgb_latent, target_latents], dim=1
394
+ ) # this order is important
395
+
396
+ # predict the noise residual
397
+ noise_pred = self.unet(
398
+ unet_input, t, encoder_hidden_states=batch_empty_text_embed
399
+ ).sample # [B, 4*n_targets, h, w]
400
+
401
+ # compute the previous noisy sample x_t -> x_t-1
402
+ target_latents = self.scheduler.step(
403
+ noise_pred, t, target_latents, generator=rand_num_generator
404
+ ).prev_sample
405
+
406
+ # torch.cuda.empty_cache() # TODO is it really needed here, even if memory saving?
407
+
408
+ targets = self.decode_targets(target_latents) # [B, 3*n_targets, H, W]
409
+ targets = torch.clip(targets, -1.0, 1.0)
410
+
411
+ return targets
412
+
413
+ def encode_rgb(self, rgb_in: torch.Tensor) -> torch.Tensor:
414
+ """
415
+ Encode RGB image into latent.
416
+
417
+ Args:
418
+ rgb_in (`torch.Tensor`):
419
+ Input RGB image to be encoded.
420
+
421
+ Returns:
422
+ `torch.Tensor`: Image latent.
423
+ """
424
+ # encode
425
+ h = self.vae.encoder(rgb_in)
426
+ moments = self.vae.quant_conv(h)
427
+ mean, logvar = torch.chunk(moments, 2, dim=1)
428
+ # scale latent
429
+ rgb_latent = mean * self.latent_scale_factor
430
+ return rgb_latent
431
+
432
+ def decode_targets(self, target_latents: torch.Tensor) -> torch.Tensor:
433
+ """
434
+ Decode target latent into target map.
435
+
436
+ Args:
437
+ target_latents (`torch.Tensor`):
438
+ Target latent to be decoded.
439
+
440
+ Returns:
441
+ `torch.Tensor`: Decoded target map.
442
+ """
443
+
444
+ assert target_latents.shape[1] == self.n_targets * 4 # 4 latent channels per target
445
+
446
+ # scale latent
447
+ target_latents = target_latents / self.latent_scale_factor
448
+ # decode
449
+ targets = []
450
+ for i in range(self.n_targets):
451
+ latent = target_latents[:, i * 4 : (i + 1) * 4, :, :]
452
+ z = self.vae.post_quant_conv(latent)
453
+ stacked = self.vae.decoder(z)
454
+
455
+ targets.append(stacked)
456
+
457
+ return torch.cat(targets, dim=1)
458
+
459
+ @staticmethod
460
+ def get_pil_resample_method(method_str: str) -> Resampling:
461
+ resample_method_dic = {
462
+ "bilinear": Resampling.BILINEAR,
463
+ "bicubic": Resampling.BICUBIC,
464
+ "nearest": Resampling.NEAREST,
465
+ }
466
+ resample_method = resample_method_dic.get(method_str, None)
467
+ if resample_method is None:
468
+ raise ValueError(f"Unknown resampling method: {method_str}")
469
+ else:
470
+ return resample_method
471
+
472
+ @staticmethod
473
+ def resize_max_res(img: Image.Image, max_edge_resolution: int, resample_method=Resampling.BILINEAR) -> Image.Image:
474
+ """
475
+ Resize image to limit maximum edge length while keeping aspect ratio.
476
+ """
477
+ original_width, original_height = img.size
478
+ downscale_factor = min(max_edge_resolution / original_width, max_edge_resolution / original_height)
479
+
480
+ new_width = int(original_width * downscale_factor)
481
+ new_height = int(original_height * downscale_factor)
482
+
483
+ resized_img = img.resize((new_width, new_height), resample=resample_method)
484
+ return resized_img
485
+
486
+ @staticmethod
487
+ def chw2hwc(chw):
488
+ assert 3 == len(chw.shape)
489
+ if isinstance(chw, torch.Tensor):
490
+ hwc = torch.permute(chw, (1, 2, 0))
491
+ elif isinstance(chw, np.ndarray):
492
+ hwc = np.moveaxis(chw, 0, -1)
493
+ return hwc
494
+
495
+ @staticmethod
496
+ def find_batch_size(ensemble_size: int, input_res: int, dtype: torch.dtype) -> int:
497
+ """
498
+ Automatically search for suitable operating batch size.
499
+
500
+ Args:
501
+ ensemble_size (`int`):
502
+ Number of predictions to be ensembled.
503
+ input_res (`int`):
504
+ Operating resolution of the input image.
505
+
506
+ Returns:
507
+ `int`: Operating batch size.
508
+ """
509
+ # Search table for suggested max. inference batch size
510
+ bs_search_table = [
511
+ # tested on A100-PCIE-80GB
512
+ {"res": 768, "total_vram": 79, "bs": 35, "dtype": torch.float32},
513
+ {"res": 1024, "total_vram": 79, "bs": 20, "dtype": torch.float32},
514
+ # tested on A100-PCIE-40GB
515
+ {"res": 768, "total_vram": 39, "bs": 15, "dtype": torch.float32},
516
+ {"res": 1024, "total_vram": 39, "bs": 8, "dtype": torch.float32},
517
+ {"res": 768, "total_vram": 39, "bs": 30, "dtype": torch.float16},
518
+ {"res": 1024, "total_vram": 39, "bs": 15, "dtype": torch.float16},
519
+ # tested on RTX3090, RTX4090
520
+ {"res": 512, "total_vram": 23, "bs": 20, "dtype": torch.float32},
521
+ {"res": 768, "total_vram": 23, "bs": 7, "dtype": torch.float32},
522
+ {"res": 1024, "total_vram": 23, "bs": 3, "dtype": torch.float32},
523
+ {"res": 512, "total_vram": 23, "bs": 40, "dtype": torch.float16},
524
+ {"res": 768, "total_vram": 23, "bs": 18, "dtype": torch.float16},
525
+ {"res": 1024, "total_vram": 23, "bs": 10, "dtype": torch.float16},
526
+ # tested on GTX1080Ti
527
+ {"res": 512, "total_vram": 10, "bs": 5, "dtype": torch.float32},
528
+ {"res": 768, "total_vram": 10, "bs": 2, "dtype": torch.float32},
529
+ {"res": 512, "total_vram": 10, "bs": 10, "dtype": torch.float16},
530
+ {"res": 768, "total_vram": 10, "bs": 5, "dtype": torch.float16},
531
+ {"res": 1024, "total_vram": 10, "bs": 3, "dtype": torch.float16},
532
+ ]
533
+
534
+ if not torch.cuda.is_available():
535
+ return 1
536
+
537
+ total_vram = torch.cuda.mem_get_info()[1] / 1024.0**3
538
+ filtered_bs_search_table = [s for s in bs_search_table if s["dtype"] == dtype]
539
+ for settings in sorted(
540
+ filtered_bs_search_table,
541
+ key=lambda k: (k["res"], -k["total_vram"]),
542
+ ):
543
+ if input_res <= settings["res"] and total_vram >= settings["total_vram"]:
544
+ bs = settings["bs"]
545
+ if bs > ensemble_size:
546
+ bs = ensemble_size
547
+ elif bs > math.ceil(ensemble_size / 2) and bs < ensemble_size:
548
+ bs = math.ceil(ensemble_size / 2)
549
+ return bs
550
+
551
+ return 1
552
+
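For context, here is a minimal usage sketch of the pipeline defined above. The checkpoint path and image filename are placeholders (this commit does not pin an official model id); the argument values mirror the defaults documented in `__call__`.

import torch
from PIL import Image

from marigold_iid_residual import MarigoldIIDResidualPipeline  # the file added above

# Placeholder checkpoint path -- substitute the actual Marigold IID (residual) weights.
pipe = MarigoldIIDResidualPipeline.from_pretrained(
    "path/to/marigold-iid-residual-checkpoint", torch_dtype=torch.float16
).to("cuda")

image = Image.open("example.jpg")  # placeholder input image
out = pipe(
    image,
    denoising_steps=4,    # DDIM steps per prediction
    ensemble_size=1,      # >1 enables the median ensembling defined in __call__
    processing_res=768,   # maximum edge length used during inference
    seed=2024,            # for reproducibility
)

# Fields of MarigoldIIDResidualOutput, as declared above.
out.albedo_colored.save("albedo.png")
out.shading_colored.save("shading.png")
out.residual_colored.save("residual.png")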
requirements.txt ADDED
@@ -0,0 +1,126 @@
1
+ accelerate==0.25.0
2
+ aiofiles==23.2.1
3
+ aiohttp==3.9.3
4
+ aiosignal==1.3.1
5
+ altair==5.3.0
6
+ annotated-types==0.6.0
7
+ anyio==4.3.0
8
+ async-timeout==4.0.3
9
+ attrs==23.2.0
10
+ Authlib==1.3.0
11
+ certifi==2024.2.2
12
+ cffi==1.16.0
13
+ charset-normalizer==3.3.2
14
+ click==8.0.4
15
+ cmake==3.29.0.1
16
+ contourpy==1.2.0
17
+ cryptography==42.0.5
18
+ cycler==0.12.1
19
+ dataclasses-json==0.6.4
20
+ datasets==2.18.0
21
+ Deprecated==1.2.14
22
+ diffusers==0.27.2
23
+ dill==0.3.8
24
+ exceptiongroup==1.2.0
25
+ fastapi==0.110.0
26
+ ffmpy==0.3.2
27
+ filelock==3.13.3
28
+ fonttools==4.50.0
29
+ frozenlist==1.4.1
30
+ fsspec==2024.2.0
31
+ gradio==4.21.0
32
+ gradio_client==0.12.0
33
+ gradio_imageslider==0.0.18
34
+ h11==0.14.0
35
+ httpcore==1.0.5
36
+ httpx==0.27.0
37
+ huggingface-hub==0.22.1
38
+ idna==3.6
39
+ imageio==2.34.0
40
+ imageio-ffmpeg==0.4.9
41
+ importlib_metadata==7.1.0
42
+ importlib_resources==6.4.0
43
+ itsdangerous==2.1.2
44
+ Jinja2==3.1.3
45
+ jsonschema==4.21.1
46
+ jsonschema-specifications==2023.12.1
47
+ kiwisolver==1.4.5
48
+ lit==18.1.2
49
+ markdown-it-py==3.0.0
50
+ MarkupSafe==2.1.5
51
+ marshmallow==3.21.1
52
+ matplotlib==3.8.2
53
+ mdurl==0.1.2
54
+ mpmath==1.3.0
55
+ multidict==6.0.5
56
+ multiprocess==0.70.16
57
+ mypy-extensions==1.0.0
58
+ networkx==3.2.1
59
+ numpy==1.26.4
60
+ nvidia-cublas-cu11==11.10.3.66
61
+ nvidia-cuda-cupti-cu11==11.7.101
62
+ nvidia-cuda-nvrtc-cu11==11.7.99
63
+ nvidia-cuda-runtime-cu11==11.7.99
64
+ nvidia-cudnn-cu11==8.5.0.96
65
+ nvidia-cufft-cu11==10.9.0.58
66
+ nvidia-curand-cu11==10.2.10.91
67
+ nvidia-cusolver-cu11==11.4.0.1
68
+ nvidia-cusparse-cu11==11.7.4.91
69
+ nvidia-nccl-cu11==2.14.3
70
+ nvidia-nvtx-cu11==11.7.91
71
+ orjson==3.10.0
72
+ packaging==24.0
73
+ pandas==2.2.1
74
+ pillow==10.2.0
75
+ protobuf==3.20.3
76
+ psutil==5.9.8
77
+ pyarrow==15.0.2
78
+ pyarrow-hotfix==0.6
79
+ pycparser==2.22
80
+ pydantic==2.6.4
81
+ pydantic_core==2.16.3
82
+ pydub==0.25.1
83
+ pygltflib==1.16.1
84
+ Pygments==2.17.2
85
+ pyparsing==3.1.2
86
+ python-dateutil==2.9.0.post0
87
+ python-multipart==0.0.9
88
+ pytz==2024.1
89
+ PyYAML==6.0.1
90
+ referencing==0.34.0
91
+ regex==2023.12.25
92
+ requests==2.31.0
93
+ rich==13.7.1
94
+ rpds-py==0.18.0
95
+ ruff==0.3.4
96
+ safetensors==0.4.2
97
+ scipy==1.11.4
98
+ semantic-version==2.10.0
99
+ shellingham==1.5.4
100
+ six==1.16.0
101
+ sniffio==1.3.1
102
+ spaces==0.25.0
103
+ starlette==0.36.3
104
+ sympy==1.12
105
+ tokenizers==0.15.2
106
+ tomlkit==0.12.0
107
+ toolz==0.12.1
108
+ torch==2.0.1
109
+ tqdm==4.66.2
110
+ transformers==4.36.1
111
+ trimesh==4.0.5
112
+ triton==2.0.0
113
+ typer==0.12.0
114
+ typer-cli==0.12.0
115
+ typer-slim==0.12.0
116
+ typing-inspect==0.9.0
117
+ typing_extensions==4.10.0
118
+ tzdata==2024.1
119
+ urllib3==2.2.1
120
+ uvicorn==0.29.0
121
+ websockets==11.0.3
122
+ wrapt==1.16.0
123
+ xformers==0.0.21
124
+ xxhash==3.4.1
125
+ yarl==1.9.4
126
+ zipp==3.18.1
requirements_min.txt ADDED
@@ -0,0 +1,16 @@
1
+ gradio==4.21.0
2
+ gradio-imageslider==0.0.18
3
+ pygltflib==1.16.1
4
+ trimesh==4.0.5
5
+ imageio
6
+ imageio-ffmpeg
7
+ Pillow
8
+
9
+ spaces==0.25.0
10
+ accelerate==0.25.0
11
+ diffusers==0.27.2
12
+ matplotlib==3.8.2
13
+ scipy==1.11.4
14
+ torch==2.0.1
15
+ transformers==4.36.1
16
+ xformers==0.0.21