Mithun12345
committed on
Upload 7 files
- .gitignore +43 -0
- LICENSE +201 -0
- README.md +146 -12
- app.py +348 -0
- requirements.txt +19 -0
- run.py +262 -0
- train.py +286 -0
.gitignore
ADDED
@@ -0,0 +1,43 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+eggs/
+.eggs/
+.vscode/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+.DS_Store
+
+tools/objaverse_rendering/blender-3.2.2-linux-x64/
+tools/objaverse_rendering/output/
+ckpts/
+lightning_logs/
+logs/
+.trash/
+.env/
+outputs/
+figures*/
+
+# Useless Files
+*.sh
+blender/
+.restore/
LICENSE
ADDED
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
README.md
CHANGED
@@ -1,12 +1,146 @@
- [12 lines of the original README removed; their content is not shown in this view]
+<div align="center">
+
+# InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
+
+<a href="https://arxiv.org/abs/2404.07191"><img src="https://img.shields.io/badge/ArXiv-2404.07191-brightgreen"></a>
+<a href="https://huggingface.co/TencentARC/InstantMesh"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model_Card-Huggingface-orange"></a>
+<a href="https://huggingface.co/spaces/TencentARC/InstantMesh"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Gradio%20Demo-Huggingface-orange"></a> <br>
+<a href="https://replicate.com/camenduru/instantmesh"><img src="https://img.shields.io/badge/Demo-Replicate-blue"></a>
+<a href="https://colab.research.google.com/github/camenduru/InstantMesh-jupyter/blob/main/InstantMesh_jupyter.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
+<a href="https://github.com/jtydhr88/ComfyUI-InstantMesh"><img src="https://img.shields.io/badge/Demo-ComfyUI-8A2BE2"></a>
+
+</div>
+
+---
+
+This repo is the official implementation of InstantMesh, a feed-forward framework for efficient 3D mesh generation from a single image based on the LRM/Instant3D architecture.
+
+https://github.com/TencentARC/InstantMesh/assets/20635237/dab3511e-e7c6-4c0b-bab7-15772045c47d
+
+# 🚩 Features and Todo List
+- [x] 🔥🔥 Release Zero123++ fine-tuning code.
+- [x] 🔥🔥 Support for running the gradio demo on two GPUs to save memory.
+- [x] 🔥🔥 Support for running the demo with docker. Please refer to the [docker](docker/) directory.
+- [x] Release inference and training code.
+- [x] Release model weights.
+- [x] Release huggingface gradio demo. Please try it at the [demo](https://huggingface.co/spaces/TencentARC/InstantMesh) link.
+- [ ] Add support for more multi-view diffusion models.
+
+# ⚙️ Dependencies and Installation
+
+We recommend using `Python>=3.10`, `PyTorch>=2.1.0`, and `CUDA>=12.1`.
+```bash
+conda create --name instantmesh python=3.10
+conda activate instantmesh
+pip install -U pip
+
+# Ensure Ninja is installed
+conda install Ninja
+
+# Install the correct version of CUDA
+conda install cuda -c nvidia/label/cuda-12.1.0
+
+# Install PyTorch and xformers
+# You may need to install another xformers version if you use a different PyTorch version
+pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
+pip install xformers==0.0.22.post7
+
+# For Linux users: Install Triton
+pip install triton
+
+# For Windows users: Use the prebuilt version of Triton provided here:
+pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl
+
+# Install other requirements
+pip install -r requirements.txt
+```
+
+# 💫 How to Use
+
+## Download the models
+
+We provide 4 sparse-view reconstruction model variants and a customized Zero123++ UNet for white-background image generation in the [model card](https://huggingface.co/TencentARC/InstantMesh).
+
+Our inference script will download the models automatically. Alternatively, you can manually download the models and put them under the `ckpts/` directory.
+
+By default, we use the `instant-mesh-large` reconstruction model variant.
+
+## Start a local gradio demo
+
+To start a gradio demo on your local machine, simply run:
+```bash
+python app.py
+```
+
+If you have multiple GPUs in your machine, the demo app will run on two GPUs automatically to save memory. You can also force it to run on a single GPU:
+```bash
+CUDA_VISIBLE_DEVICES=0 python app.py
+```
+
+Alternatively, you can run the demo with docker. Please follow the instructions in the [docker](docker/) directory.
+
+## Running with command line
+
+To generate 3D meshes from images via the command line, simply run:
+```bash
+python run.py configs/instant-mesh-large.yaml examples/hatsune_miku.png --save_video
+```
+
+We use [rembg](https://github.com/danielgatis/rembg) to segment the foreground object. If the input image already has an alpha mask, please specify the `--no_rembg` flag:
+```bash
+python run.py configs/instant-mesh-large.yaml examples/hatsune_miku.png --save_video --no_rembg
+```
+
+By default, our script exports a `.obj` mesh with vertex colors. Please specify the `--export_texmap` flag if you want to export a mesh with a texture map instead (this takes longer):
+```bash
+python run.py configs/instant-mesh-large.yaml examples/hatsune_miku.png --save_video --export_texmap
+```
+
+Please use a different `.yaml` config file from the [configs](./configs) directory if you want to use another reconstruction model variant. For example, to use the `instant-nerf-large` model for generation:
+```bash
+python run.py configs/instant-nerf-large.yaml examples/hatsune_miku.png --save_video
+```
+**Note:** When using the `NeRF` model variants for image-to-3D generation, exporting a mesh with a texture map via `--export_texmap` may take a long time in the UV unwrapping step, since the default iso-surface extraction resolution is `256`. You can set a lower iso-surface extraction resolution in the config file.
+
+# 💻 Training
+
+We provide our training code to facilitate future research, but we cannot provide the training dataset due to its size. Please refer to our [dataloader](src/data/objaverse.py) for more details.
+
+To train the sparse-view reconstruction models, please run:
+```bash
+# Training on NeRF representation
+python train.py --base configs/instant-nerf-large-train.yaml --gpus 0,1,2,3,4,5,6,7 --num_nodes 1
+
+# Training on Mesh representation
+python train.py --base configs/instant-mesh-large-train.yaml --gpus 0,1,2,3,4,5,6,7 --num_nodes 1
+```
+
+We also provide our Zero123++ fine-tuning code since it is frequently requested. The running command is:
+```bash
+python train.py --base configs/zero123plus-finetune.yaml --gpus 0,1,2,3,4,5,6,7 --num_nodes 1
+```
+
+# :books: Citation
+
+If you find our work useful for your research or applications, please cite using this BibTeX:
+
+```BibTeX
+@article{xu2024instantmesh,
+  title={InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models},
+  author={Xu, Jiale and Cheng, Weihao and Gao, Yiming and Wang, Xintao and Gao, Shenghua and Shan, Ying},
+  journal={arXiv preprint arXiv:2404.07191},
+  year={2024}
+}
+```
+
+# 🤗 Acknowledgements
+
+We thank the authors of the following projects for their excellent contributions to 3D generative AI!
+
+- [Zero123++](https://github.com/SUDO-AI-3D/zero123plus)
+- [OpenLRM](https://github.com/3DTopia/OpenLRM)
+- [FlexiCubes](https://github.com/nv-tlabs/FlexiCubes)
+- [Instant3D](https://instant-3d.github.io/)
+
+Thanks to [@camenduru](https://github.com/camenduru) for implementing the [Replicate Demo](https://replicate.com/camenduru/instantmesh) and [Colab Demo](https://colab.research.google.com/github/camenduru/InstantMesh-jupyter/blob/main/InstantMesh_jupyter.ipynb)!
+Thanks to [@jtydhr88](https://github.com/jtydhr88) for implementing [ComfyUI support](https://github.com/jtydhr88/ComfyUI-InstantMesh)!
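
A note on the manual-download option above: since `app.py` and `run.py` (below) fetch the checkpoints from the model card via `hf_hub_download`, the same call can pre-populate `ckpts/` ahead of time. A minimal sketch, assuming a recent `huggingface_hub` release that supports the `local_dir` argument (the two filenames are the ones the scripts in this commit request):

```python
# Sketch: pre-fetch the InstantMesh weights into ckpts/ so the scripts
# find them locally instead of downloading on first run.
from huggingface_hub import hf_hub_download

for filename in ["instant_mesh_large.ckpt", "diffusion_pytorch_model.bin"]:
    hf_hub_download(
        repo_id="TencentARC/InstantMesh",
        filename=filename,
        repo_type="model",
        local_dir="ckpts",  # assumes a huggingface_hub version with local_dir support
    )
```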
app.py
ADDED
@@ -0,0 +1,348 @@
+import os
+import tempfile
+import imageio
+import numpy as np
+import torch
+import rembg
+import gradio as gr
+from PIL import Image
+from torchvision.transforms import v2
+from pytorch_lightning import seed_everything
+from omegaconf import OmegaConf
+from einops import rearrange, repeat
+from tqdm import tqdm
+from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler
+from huggingface_hub import hf_hub_download
+
+from src.utils.train_util import instantiate_from_config
+from src.utils.camera_util import (
+    FOV_to_intrinsics,
+    get_zero123plus_input_cameras,
+    get_circular_camera_poses,
+)
+from src.utils.mesh_util import save_obj, save_glb
+from src.utils.infer_util import remove_background, resize_foreground
+
+
+# Use two GPUs (diffusion on cuda:0, reconstruction on cuda:1) when available.
+if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
+    device0 = torch.device('cuda:0')
+    device1 = torch.device('cuda:1')
+else:
+    device0 = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    device1 = device0
+
+# Define the cache directory for model files
+model_cache_dir = './ckpts/'
+os.makedirs(model_cache_dir, exist_ok=True)
+
+
+def get_render_cameras(batch_size=1, M=120, radius=2.5, elevation=10.0, is_flexicubes=False):
+    """
+    Get the rendering camera parameters.
+    """
+    c2ws = get_circular_camera_poses(M=M, radius=radius, elevation=elevation)
+    if is_flexicubes:
+        cameras = torch.linalg.inv(c2ws)
+        cameras = cameras.unsqueeze(0).repeat(batch_size, 1, 1, 1)
+    else:
+        extrinsics = c2ws.flatten(-2)
+        intrinsics = FOV_to_intrinsics(30.0).unsqueeze(0).repeat(M, 1, 1).float().flatten(-2)
+        cameras = torch.cat([extrinsics, intrinsics], dim=-1)
+        cameras = cameras.unsqueeze(0).repeat(batch_size, 1, 1)
+    return cameras
+
+
+def images_to_video(images, output_path, fps=30):
+    # images: (N, C, H, W)
+    os.makedirs(os.path.dirname(output_path), exist_ok=True)
+    frames = []
+    for i in range(images.shape[0]):
+        # clip to [0, 255] *before* casting to uint8 to avoid wrap-around overflow
+        frame = (images[i].permute(1, 2, 0).cpu().numpy() * 255).clip(0, 255).astype(np.uint8)
+        assert frame.shape[0] == images.shape[2] and frame.shape[1] == images.shape[3], \
+            f"Frame shape mismatch: {frame.shape} vs {images.shape}"
+        assert frame.min() >= 0 and frame.max() <= 255, \
+            f"Frame value out of range: {frame.min()} ~ {frame.max()}"
+        frames.append(frame)
+    imageio.mimwrite(output_path, np.stack(frames), fps=fps, codec='h264')
+
+
+###############################################################################
+# Configuration.
+###############################################################################
+
+seed_everything(0)
+
+config_path = 'configs/instant-mesh-large.yaml'
+config = OmegaConf.load(config_path)
+config_name = os.path.basename(config_path).replace('.yaml', '')
+model_config = config.model_config
+infer_config = config.infer_config
+
+IS_FLEXICUBES = config_name.startswith('instant-mesh')
+
+# load diffusion model
+print('Loading diffusion model ...')
+pipeline = DiffusionPipeline.from_pretrained(
+    "sudo-ai/zero123plus-v1.2",
+    custom_pipeline="zero123plus",
+    torch_dtype=torch.float16,
+    cache_dir=model_cache_dir
+)
+pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(
+    pipeline.scheduler.config, timestep_spacing='trailing'
+)
+
+# load custom white-background UNet
+unet_ckpt_path = hf_hub_download(repo_id="TencentARC/InstantMesh", filename="diffusion_pytorch_model.bin", repo_type="model", cache_dir=model_cache_dir)
+state_dict = torch.load(unet_ckpt_path, map_location='cpu')
+pipeline.unet.load_state_dict(state_dict, strict=True)
+
+pipeline = pipeline.to(device0)
+
+# load reconstruction model
+print('Loading reconstruction model ...')
+model_ckpt_path = hf_hub_download(repo_id="TencentARC/InstantMesh", filename="instant_mesh_large.ckpt", repo_type="model", cache_dir=model_cache_dir)
+model = instantiate_from_config(model_config)
+state_dict = torch.load(model_ckpt_path, map_location='cpu')['state_dict']
+# keep only the LRM generator weights, stripping the 'lrm_generator.' prefix
+state_dict = {k[14:]: v for k, v in state_dict.items() if k.startswith('lrm_generator.') and 'source_camera' not in k}
+model.load_state_dict(state_dict, strict=True)
+
+model = model.to(device1)
+if IS_FLEXICUBES:
+    model.init_flexicubes_geometry(device1, fovy=30.0)
+model = model.eval()
+
+print('Loading Finished!')
+
+# NOTE: _HEADER_ and _CITE_ are referenced in the UI below but were not defined
+# in this upload, which would raise a NameError when the Blocks UI is built.
+# Minimal placeholders are defined here so the app can launch; replace them
+# with the full markdown strings as needed.
+_HEADER_ = "# InstantMesh: Efficient 3D Mesh Generation from a Single Image"
+_CITE_ = "See the README for citation details."
+
+
+def check_input_image(input_image):
+    if input_image is None:
+        raise gr.Error("No image uploaded!")
+
+
+def preprocess(input_image, do_remove_background):
+    rembg_session = rembg.new_session() if do_remove_background else None
+    if do_remove_background:
+        input_image = remove_background(input_image, rembg_session)
+        input_image = resize_foreground(input_image, 0.85)
+    return input_image
+
+
+def generate_mvs(input_image, sample_steps, sample_seed):
+    seed_everything(sample_seed)
+
+    # sampling
+    generator = torch.Generator(device=device0)
+    z123_image = pipeline(
+        input_image,
+        num_inference_steps=sample_steps,
+        generator=generator,
+    ).images[0]
+
+    show_image = np.asarray(z123_image, dtype=np.uint8)
+    show_image = torch.from_numpy(show_image)  # (960, 640, 3)
+    # re-tile the 3x2 grid of views into a 2x3 grid for display
+    show_image = rearrange(show_image, '(n h) (m w) c -> (n m) h w c', n=3, m=2)
+    show_image = rearrange(show_image, '(n m) h w c -> (n h) (m w) c', n=2, m=3)
+    show_image = Image.fromarray(show_image.numpy())
+
+    return z123_image, show_image
+
+
+def make_mesh(mesh_fpath, planes):
+    mesh_basename = os.path.basename(mesh_fpath).split('.')[0]
+    mesh_dirname = os.path.dirname(mesh_fpath)
+    mesh_glb_fpath = os.path.join(mesh_dirname, f"{mesh_basename}.glb")
+
+    with torch.no_grad():
+        # get mesh
+        mesh_out = model.extract_mesh(
+            planes,
+            use_texture_map=False,
+            **infer_config,
+        )
+
+        vertices, faces, vertex_colors = mesh_out
+        vertices = vertices[:, [1, 2, 0]]
+
+        save_glb(vertices, faces, vertex_colors, mesh_glb_fpath)
+        save_obj(vertices, faces, vertex_colors, mesh_fpath)
+
+        print(f"Mesh saved to {mesh_fpath}")
+
+    return mesh_fpath, mesh_glb_fpath
+
+
+def make3d(images):
+    images = np.asarray(images, dtype=np.float32) / 255.0
+    images = torch.from_numpy(images).permute(2, 0, 1).contiguous().float()  # (3, 960, 640)
+    images = rearrange(images, 'c (n h) (m w) -> (n m) c h w', n=3, m=2)     # (6, 3, 320, 320)
+
+    input_cameras = get_zero123plus_input_cameras(batch_size=1, radius=4.0).to(device1)
+    render_cameras = get_render_cameras(
+        batch_size=1, radius=4.5, elevation=20.0, is_flexicubes=IS_FLEXICUBES).to(device1)
+
+    images = images.unsqueeze(0).to(device1)
+    images = v2.functional.resize(images, (320, 320), interpolation=3, antialias=True).clamp(0, 1)
+
+    mesh_fpath = tempfile.NamedTemporaryFile(suffix=".obj", delete=False).name
+    print(mesh_fpath)
+    mesh_basename = os.path.basename(mesh_fpath).split('.')[0]
+    mesh_dirname = os.path.dirname(mesh_fpath)
+    video_fpath = os.path.join(mesh_dirname, f"{mesh_basename}.mp4")
+
+    with torch.no_grad():
+        # get triplane
+        planes = model.forward_planes(images, input_cameras)
+
+        # get video
+        chunk_size = 20 if IS_FLEXICUBES else 1
+        render_size = 384
+
+        frames = []
+        for i in tqdm(range(0, render_cameras.shape[1], chunk_size)):
+            if IS_FLEXICUBES:
+                frame = model.forward_geometry(
+                    planes,
+                    render_cameras[:, i:i+chunk_size],
+                    render_size=render_size,
+                )['img']
+            else:
+                frame = model.synthesizer(
+                    planes,
+                    cameras=render_cameras[:, i:i+chunk_size],
+                    render_size=render_size,
+                )['images_rgb']
+            frames.append(frame)
+        frames = torch.cat(frames, dim=1)
+
+        images_to_video(
+            frames[0],
+            video_fpath,
+            fps=30,
+        )
+
+        print(f"Video saved to {video_fpath}")
+
+    mesh_fpath, mesh_glb_fpath = make_mesh(mesh_fpath, planes)
+
+    return video_fpath, mesh_fpath, mesh_glb_fpath
+
+
+with gr.Blocks() as demo:
+    gr.Markdown(_HEADER_)
+    with gr.Row(variant="panel"):
+        with gr.Column():
+            with gr.Row():
+                input_image = gr.Image(
+                    label="Input Image",
+                    image_mode="RGBA",
+                    sources="upload",
+                    width=256,
+                    height=256,
+                    type="pil",
+                    elem_id="content_image",
+                )
+                processed_image = gr.Image(
+                    label="Processed Image",
+                    image_mode="RGBA",
+                    width=256,
+                    height=256,
+                    type="pil",
+                    interactive=False
+                )
+            with gr.Row():
+                with gr.Group():
+                    do_remove_background = gr.Checkbox(
+                        label="Remove Background", value=True
+                    )
+                    sample_seed = gr.Number(value=42, label="Seed Value", precision=0)
+
+                    sample_steps = gr.Slider(
+                        label="Sample Steps",
+                        minimum=30,
+                        maximum=75,
+                        value=75,
+                        step=5
+                    )
+
+            with gr.Row():
+                submit = gr.Button("Generate", elem_id="generate", variant="primary")
+
+            with gr.Row(variant="panel"):
+                gr.Examples(
+                    examples=[
+                        os.path.join("examples", img_name) for img_name in sorted(os.listdir("examples"))
+                    ],
+                    inputs=[input_image],
+                    label="Examples",
+                    examples_per_page=20
+                )
+
+        with gr.Column():
+            with gr.Row():
+                with gr.Column():
+                    mv_show_images = gr.Image(
+                        label="Generated Multi-views",
+                        type="pil",
+                        width=379,
+                        interactive=False
+                    )
+                with gr.Column():
+                    output_video = gr.Video(
+                        label="video", format="mp4",
+                        width=379,
+                        autoplay=True,
+                        interactive=False
+                    )
+
+            with gr.Row():
+                with gr.Tab("OBJ"):
+                    output_model_obj = gr.Model3D(
+                        label="Output Model (OBJ Format)",
+                        interactive=False,
+                    )
+                    gr.Markdown("Note: The downloaded .obj model will be flipped. Export .glb instead, or manually flip it before use.")
+                with gr.Tab("GLB"):
+                    output_model_glb = gr.Model3D(
+                        label="Output Model (GLB Format)",
+                        interactive=False,
+                    )
+                    gr.Markdown("Note: The model shown here has a darker appearance. Download it to get the correct results.")
+
+            with gr.Row():
+                gr.Markdown('''Try a different <b>seed value</b> if the result is unsatisfying (Default: 42).''')
+
+    gr.Markdown(_CITE_)
+    mv_images = gr.State()
+
+    submit.click(fn=check_input_image, inputs=[input_image]).success(
+        fn=preprocess,
+        inputs=[input_image, do_remove_background],
+        outputs=[processed_image],
+    ).success(
+        fn=generate_mvs,
+        inputs=[processed_image, sample_steps, sample_seed],
+        outputs=[mv_images, mv_show_images],
+    ).success(
+        fn=make3d,
+        inputs=[mv_images],
+        outputs=[output_video, output_model_obj, output_model_glb]
+    )
+
+demo.queue(max_size=10)
+demo.launch(server_name="0.0.0.0", server_port=43839)
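
One detail of `app.py` worth calling out: Zero123++ returns its six views as a single 960×640 image tiled in a 3×2 grid, and both `generate_mvs` and `make3d` rely on a single `einops.rearrange` to split (or re-tile) that grid. A minimal, self-contained illustration of the split used in `make3d`, with a random tensor standing in for the real image:

```python
# The 960x640 Zero123++ output is a 3x2 grid of 320x320 views;
# this rearrange splits it into six separate view tensors.
import torch
from einops import rearrange

grid = torch.rand(3, 960, 640)  # stand-in for the (C, H, W) multi-view image
views = rearrange(grid, 'c (n h) (m w) -> (n m) c h w', n=3, m=2)
print(views.shape)  # torch.Size([6, 3, 320, 320])
```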
requirements.txt
ADDED
@@ -0,0 +1,19 @@
+pytorch-lightning==2.1.2
+gradio==3.41.2
+huggingface-hub
+einops
+omegaconf
+torchmetrics
+webdataset
+accelerate
+tensorboard
+PyMCubes
+trimesh
+rembg
+transformers==4.34.1
+diffusers==0.20.2
+bitsandbytes
+imageio[ffmpeg]
+xatlas
+plyfile
+git+https://github.com/NVlabs/nvdiffrast/
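
Two notes on the pins above: `nvdiffrast` is installed straight from GitHub and builds its CUDA extensions when first used, so a working CUDA toolchain is needed; and the `diffusers`/`transformers` pins matter because `app.py` and `run.py` load Zero123++ through a custom pipeline. A quick post-install sanity check (a sketch; it only prints versions):

```python
# Verify the pinned stack resolved as expected after `pip install -r requirements.txt`.
import torch
import diffusers
import transformers
import pytorch_lightning

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)                   # pinned to 0.20.2 above
print("transformers:", transformers.__version__)             # pinned to 4.34.1 above
print("pytorch-lightning:", pytorch_lightning.__version__)   # pinned to 2.1.2 above
```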
run.py
ADDED
@@ -0,0 +1,262 @@
+import os
+import argparse
+import numpy as np
+import torch
+import rembg
+from PIL import Image
+from torchvision.transforms import v2
+from pytorch_lightning import seed_everything
+from omegaconf import OmegaConf
+from einops import rearrange, repeat
+from tqdm import tqdm
+from huggingface_hub import hf_hub_download
+from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler
+
+from src.utils.train_util import instantiate_from_config
+from src.utils.camera_util import (
+    FOV_to_intrinsics,
+    get_zero123plus_input_cameras,
+    get_circular_camera_poses,
+)
+from src.utils.mesh_util import save_obj, save_obj_with_mtl
+from src.utils.infer_util import remove_background, resize_foreground, save_video
+
+
+def get_render_cameras(batch_size=1, M=120, radius=4.0, elevation=20.0, is_flexicubes=False):
+    """
+    Get the rendering camera parameters.
+    """
+    c2ws = get_circular_camera_poses(M=M, radius=radius, elevation=elevation)
+    if is_flexicubes:
+        cameras = torch.linalg.inv(c2ws)
+        cameras = cameras.unsqueeze(0).repeat(batch_size, 1, 1, 1)
+    else:
+        extrinsics = c2ws.flatten(-2)
+        intrinsics = FOV_to_intrinsics(30.0).unsqueeze(0).repeat(M, 1, 1).float().flatten(-2)
+        cameras = torch.cat([extrinsics, intrinsics], dim=-1)
+        cameras = cameras.unsqueeze(0).repeat(batch_size, 1, 1)
+    return cameras
+
+
+def render_frames(model, planes, render_cameras, render_size=512, chunk_size=1, is_flexicubes=False):
+    """
+    Render frames from triplanes.
+    """
+    frames = []
+    for i in tqdm(range(0, render_cameras.shape[1], chunk_size)):
+        if is_flexicubes:
+            frame = model.forward_geometry(
+                planes,
+                render_cameras[:, i:i+chunk_size],
+                render_size=render_size,
+            )['img']
+        else:
+            frame = model.forward_synthesizer(
+                planes,
+                render_cameras[:, i:i+chunk_size],
+                render_size=render_size,
+            )['images_rgb']
+        frames.append(frame)
+
+    frames = torch.cat(frames, dim=1)[0]    # we assume batch size is always 1
+    return frames
+
+
+###############################################################################
+# Arguments.
+###############################################################################
+
+parser = argparse.ArgumentParser()
+parser.add_argument('config', type=str, help='Path to config file.')
+parser.add_argument('input_path', type=str, help='Path to input image or directory.')
+parser.add_argument('--output_path', type=str, default='outputs/', help='Output directory.')
+parser.add_argument('--diffusion_steps', type=int, default=75, help='Denoising sampling steps.')
+parser.add_argument('--seed', type=int, default=42, help='Random seed for sampling.')
+parser.add_argument('--scale', type=float, default=1.0, help='Scale of generated object.')
+parser.add_argument('--distance', type=float, default=4.5, help='Render distance.')
+parser.add_argument('--view', type=int, default=6, choices=[4, 6], help='Number of input views.')
+parser.add_argument('--no_rembg', action='store_true', help='Do not remove input background.')
+parser.add_argument('--export_texmap', action='store_true', help='Export a mesh with texture map.')
+parser.add_argument('--save_video', action='store_true', help='Save a circular-view video.')
+args = parser.parse_args()
+seed_everything(args.seed)
+
+###############################################################################
+# Stage 0: Configuration.
+###############################################################################
+
+config = OmegaConf.load(args.config)
+config_name = os.path.basename(args.config).replace('.yaml', '')
+model_config = config.model_config
+infer_config = config.infer_config
+
+IS_FLEXICUBES = config_name.startswith('instant-mesh')
+
+device = torch.device('cuda')
+
+# load diffusion model
+print('Loading diffusion model ...')
+pipeline = DiffusionPipeline.from_pretrained(
+    "sudo-ai/zero123plus-v1.2",
+    custom_pipeline="zero123plus",
+    torch_dtype=torch.float16,
+)
+pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(
+    pipeline.scheduler.config, timestep_spacing='trailing'
+)
+
+# load custom white-background UNet
+print('Loading custom white-background unet ...')
+if os.path.exists(infer_config.unet_path):
+    unet_ckpt_path = infer_config.unet_path
+else:
+    unet_ckpt_path = hf_hub_download(repo_id="TencentARC/InstantMesh", filename="diffusion_pytorch_model.bin", repo_type="model")
+state_dict = torch.load(unet_ckpt_path, map_location='cpu')
+pipeline.unet.load_state_dict(state_dict, strict=True)
+
+pipeline = pipeline.to(device)
+
+# load reconstruction model
+print('Loading reconstruction model ...')
+model = instantiate_from_config(model_config)
+if os.path.exists(infer_config.model_path):
+    model_ckpt_path = infer_config.model_path
+else:
+    model_ckpt_path = hf_hub_download(repo_id="TencentARC/InstantMesh", filename=f"{config_name.replace('-', '_')}.ckpt", repo_type="model")
+state_dict = torch.load(model_ckpt_path, map_location='cpu')['state_dict']
+# keep only the LRM generator weights, stripping the 'lrm_generator.' prefix
+state_dict = {k[14:]: v for k, v in state_dict.items() if k.startswith('lrm_generator.')}
+model.load_state_dict(state_dict, strict=True)
+
+model = model.to(device)
+if IS_FLEXICUBES:
+    model.init_flexicubes_geometry(device, fovy=30.0)
+model = model.eval()
+
+# make output directories
+image_path = os.path.join(args.output_path, config_name, 'images')
+mesh_path = os.path.join(args.output_path, config_name, 'meshes')
+video_path = os.path.join(args.output_path, config_name, 'videos')
+os.makedirs(image_path, exist_ok=True)
+os.makedirs(mesh_path, exist_ok=True)
+os.makedirs(video_path, exist_ok=True)
+
+# process input files
+if os.path.isdir(args.input_path):
+    input_files = [
+        os.path.join(args.input_path, file)
+        for file in os.listdir(args.input_path)
+        if file.endswith('.png') or file.endswith('.jpg') or file.endswith('.webp')
+    ]
+else:
+    input_files = [args.input_path]
+print(f'Total number of input images: {len(input_files)}')
+
+
+###############################################################################
+# Stage 1: Multiview generation.
+###############################################################################
+
+rembg_session = None if args.no_rembg else rembg.new_session()
+
+outputs = []
+for idx, image_file in enumerate(input_files):
+    name = os.path.basename(image_file).split('.')[0]
+    print(f'[{idx+1}/{len(input_files)}] Imagining {name} ...')
+
+    # remove background optionally
+    input_image = Image.open(image_file)
+    if not args.no_rembg:
+        input_image = remove_background(input_image, rembg_session)
+        input_image = resize_foreground(input_image, 0.85)
+
+    # sampling
+    output_image = pipeline(
+        input_image,
+        num_inference_steps=args.diffusion_steps,
+    ).images[0]
+
+    output_image.save(os.path.join(image_path, f'{name}.png'))
+    print(f"Image saved to {os.path.join(image_path, f'{name}.png')}")
+
+    images = np.asarray(output_image, dtype=np.float32) / 255.0
+    images = torch.from_numpy(images).permute(2, 0, 1).contiguous().float()  # (3, 960, 640)
+    images = rearrange(images, 'c (n h) (m w) -> (n m) c h w', n=3, m=2)     # (6, 3, 320, 320)
+
+    outputs.append({'name': name, 'images': images})
+
+# delete pipeline to save memory
+del pipeline
+
+###############################################################################
+# Stage 2: Reconstruction.
+###############################################################################
+
+input_cameras = get_zero123plus_input_cameras(batch_size=1, radius=4.0*args.scale).to(device)
+chunk_size = 20 if IS_FLEXICUBES else 1
+
+for idx, sample in enumerate(outputs):
+    name = sample['name']
+    print(f'[{idx+1}/{len(outputs)}] Creating {name} ...')
+
+    images = sample['images'].unsqueeze(0).to(device)
+    images = v2.functional.resize(images, 320, interpolation=3, antialias=True).clamp(0, 1)
+
+    # select a 4-view subset without mutating input_cameras across iterations
+    sample_cameras = input_cameras
+    if args.view == 4:
+        indices = torch.tensor([0, 2, 4, 5]).long().to(device)
+        images = images[:, indices]
+        sample_cameras = input_cameras[:, indices]
+
+    with torch.no_grad():
+        # get triplane
+        planes = model.forward_planes(images, sample_cameras)
+
+        # get mesh
+        mesh_path_idx = os.path.join(mesh_path, f'{name}.obj')
+
+        mesh_out = model.extract_mesh(
+            planes,
+            use_texture_map=args.export_texmap,
+            **infer_config,
+        )
+        if args.export_texmap:
+            vertices, faces, uvs, mesh_tex_idx, tex_map = mesh_out
+            save_obj_with_mtl(
+                vertices.data.cpu().numpy(),
+                uvs.data.cpu().numpy(),
+                faces.data.cpu().numpy(),
+                mesh_tex_idx.data.cpu().numpy(),
+                tex_map.permute(1, 2, 0).data.cpu().numpy(),
+                mesh_path_idx,
+            )
+        else:
+            vertices, faces, vertex_colors = mesh_out
+            save_obj(vertices, faces, vertex_colors, mesh_path_idx)
+        print(f"Mesh saved to {mesh_path_idx}")
+
+        # get video
+        if args.save_video:
+            video_path_idx = os.path.join(video_path, f'{name}.mp4')
+            render_size = infer_config.render_resolution
+            render_cameras = get_render_cameras(
+                batch_size=1,
+                M=120,
+                radius=args.distance,
+                elevation=20.0,
+                is_flexicubes=IS_FLEXICUBES,
+            ).to(device)
+
+            frames = render_frames(
+                model,
+                planes,
+                render_cameras=render_cameras,
+                render_size=render_size,
+                chunk_size=chunk_size,
+                is_flexicubes=IS_FLEXICUBES,
+            )
+
+            save_video(
+                frames,
+                video_path_idx,
+                fps=30,
+            )
+            print(f"Video saved to {video_path_idx}")
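
Both `app.py` and `run.py` rebuild the reconstruction model from a Lightning checkpoint whose weights are stored under an `lrm_generator.` prefix; the `k[14:]` slice strips exactly `len('lrm_generator.') == 14` characters. A tiny illustration of that filter, using made-up keys:

```python
# Illustration of the checkpoint key filtering in app.py/run.py:
# keep only 'lrm_generator.*' entries and strip the 14-character prefix.
state_dict = {
    'lrm_generator.encoder.weight': 'w0',  # kept, renamed to 'encoder.weight'
    'lrm_generator.source_camera': 'w1',   # app.py additionally drops this one
    'other_module.bias': 'w2',             # filtered out
}
stripped = {k[14:]: v for k, v in state_dict.items() if k.startswith('lrm_generator.')}
print(stripped)  # {'encoder.weight': 'w0', 'source_camera': 'w1'}
```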
train.py
ADDED
@@ -0,0 +1,286 @@
+import os, sys
+import argparse
+import shutil
+import subprocess
+from omegaconf import OmegaConf
+
+from pytorch_lightning import seed_everything
+from pytorch_lightning.trainer import Trainer
+from pytorch_lightning.strategies import DDPStrategy
+from pytorch_lightning.callbacks import Callback
+from pytorch_lightning.utilities import rank_zero_only, rank_zero_warn
+
+from src.utils.train_util import instantiate_from_config
+
+
+@rank_zero_only
+def rank_zero_print(*args):
+    print(*args)
+
+
+def get_parser(**parser_kwargs):
+    def str2bool(v):
+        if isinstance(v, bool):
+            return v
+        if v.lower() in ("yes", "true", "t", "y", "1"):
+            return True
+        elif v.lower() in ("no", "false", "f", "n", "0"):
+            return False
+        else:
+            raise argparse.ArgumentTypeError("Boolean value expected.")
+
+    parser = argparse.ArgumentParser(**parser_kwargs)
+    parser.add_argument(
+        "-r",
+        "--resume",
+        type=str,
+        default=None,
+        help="resume from checkpoint",
+    )
+    parser.add_argument(
+        "--resume_weights_only",
+        action="store_true",
+        help="only resume model weights",
+    )
+    parser.add_argument(
+        "-b",
+        "--base",
+        type=str,
+        default="base_config.yaml",
+        help="path to base configs",
+    )
+    parser.add_argument(
+        "-n",
+        "--name",
+        type=str,
+        default="",
+        help="experiment name",
+    )
+    parser.add_argument(
+        "--num_nodes",
+        type=int,
+        default=1,
+        help="number of nodes to use",
+    )
+    parser.add_argument(
+        "--gpus",
+        type=str,
+        default="0,",
+        help="gpu ids to use",
+    )
+    parser.add_argument(
+        "-s",
+        "--seed",
+        type=int,
+        default=42,
+        help="seed for seed_everything",
+    )
+    parser.add_argument(
+        "-l",
+        "--logdir",
+        type=str,
+        default="logs",
+        help="directory for logging data",
+    )
+    return parser
+
+
+class SetupCallback(Callback):
+    def __init__(self, resume, logdir, ckptdir, cfgdir, config):
+        super().__init__()
+        self.resume = resume
+        self.logdir = logdir
+        self.ckptdir = ckptdir
+        self.cfgdir = cfgdir
+        self.config = config
+
+    def on_fit_start(self, trainer, pl_module):
+        if trainer.global_rank == 0:
+            # Create logdirs and save configs
+            os.makedirs(self.logdir, exist_ok=True)
+            os.makedirs(self.ckptdir, exist_ok=True)
+            os.makedirs(self.cfgdir, exist_ok=True)
+
+            rank_zero_print("Project config")
+            rank_zero_print(OmegaConf.to_yaml(self.config))
+            OmegaConf.save(self.config,
+                           os.path.join(self.cfgdir, "project.yaml"))
+
+
+class CodeSnapshot(Callback):
+    """
+    Modified from https://github.com/threestudio-project/threestudio/blob/main/threestudio/utils/callbacks.py#L60
+    """
+    def __init__(self, savedir):
+        self.savedir = savedir
+
+    def get_file_list(self):
+        return [
+            b.decode()
+            for b in set(
+                subprocess.check_output(
+                    'git ls-files -- ":!:configs/*"', shell=True
+                ).splitlines()
+            )
+            | set(  # hard code, TODO: use config to exclude folders or files
+                subprocess.check_output(
+                    "git ls-files --others --exclude-standard", shell=True
+                ).splitlines()
+            )
+        ]
+
+    @rank_zero_only
+    def save_code_snapshot(self):
+        os.makedirs(self.savedir, exist_ok=True)
+        for f in self.get_file_list():
+            if not os.path.exists(f) or os.path.isdir(f):
+                continue
+            os.makedirs(os.path.join(self.savedir, os.path.dirname(f)), exist_ok=True)
+            shutil.copyfile(f, os.path.join(self.savedir, f))
+
+    def on_fit_start(self, trainer, pl_module):
+        try:
+            self.save_code_snapshot()
+        except Exception:
+            rank_zero_warn(
+                "Code snapshot is not saved. Please make sure you have git installed and are in a git repository."
+            )
+
+
+if __name__ == "__main__":
+    # add cwd for convenience and to make classes in this file available when
+    # running as `python train.py`
+    sys.path.append(os.getcwd())
+
+    parser = get_parser()
+    opt, unknown = parser.parse_known_args()
+
+    cfg_fname = os.path.split(opt.base)[-1]
+    cfg_name = os.path.splitext(cfg_fname)[0]
+    exp_name = "-" + opt.name if opt.name != "" else ""
+    logdir = os.path.join(opt.logdir, cfg_name+exp_name)
+
+    ckptdir = os.path.join(logdir, "checkpoints")
+    cfgdir = os.path.join(logdir, "configs")
+    codedir = os.path.join(logdir, "code")
+    seed_everything(opt.seed)
+
+    # init configs
+    config = OmegaConf.load(opt.base)
+    lightning_config = config.lightning
+    trainer_config = lightning_config.trainer
+
+    trainer_config["accelerator"] = "gpu"
+    rank_zero_print(f"Running on GPUs {opt.gpus}")
+    ngpu = len(opt.gpus.strip(",").split(','))
+    trainer_config['devices'] = ngpu
+
+    trainer_opt = argparse.Namespace(**trainer_config)
+    lightning_config.trainer = trainer_config
+
+    # model
+    model = instantiate_from_config(config.model)
+    if opt.resume and opt.resume_weights_only:
+        model = model.__class__.load_from_checkpoint(opt.resume, **config.model.params)
+
+    model.logdir = logdir
+
+    # trainer and callbacks
+    trainer_kwargs = dict()
+
+    # logger
+    default_logger_cfg = {
+        "target": "pytorch_lightning.loggers.TensorBoardLogger",
+        "params": {
+            "name": "tensorboard",
+            "save_dir": logdir,
+            "version": "0",
+        }
+    }
+    logger_cfg = OmegaConf.merge(default_logger_cfg)
+    trainer_kwargs["logger"] = instantiate_from_config(logger_cfg)
+
+    # model checkpoint
+    default_modelckpt_cfg = {
+        "target": "pytorch_lightning.callbacks.ModelCheckpoint",
+        "params": {
+            "dirpath": ckptdir,
+            "filename": "{step:08}",
+            "verbose": True,
+            "save_last": True,
+            "every_n_train_steps": 5000,
+            "save_top_k": -1,  # save all checkpoints
+        }
+    }
+
+    if "modelcheckpoint" in lightning_config:
+        modelckpt_cfg = lightning_config.modelcheckpoint
+    else:
+        modelckpt_cfg = OmegaConf.create()
+    modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)
+
+    # callbacks
+    default_callbacks_cfg = {
+        "setup_callback": {
+            "target": "train.SetupCallback",
+            "params": {
+                "resume": opt.resume,
+                "logdir": logdir,
+                "ckptdir": ckptdir,
+                "cfgdir": cfgdir,
+                "config": config,
+            }
+        },
+        "learning_rate_logger": {
+            "target": "pytorch_lightning.callbacks.LearningRateMonitor",
+            "params": {
+                "logging_interval": "step",
+            }
+        },
+        "code_snapshot": {
+            "target": "train.CodeSnapshot",
+            "params": {
+                "savedir": codedir,
+            }
+        },
+    }
+    default_callbacks_cfg["checkpoint_callback"] = modelckpt_cfg
+
+    if "callbacks" in lightning_config:
+        callbacks_cfg = lightning_config.callbacks
+    else:
+        callbacks_cfg = OmegaConf.create()
+    callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)
+
+    trainer_kwargs["callbacks"] = [
+        instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]
+
+    trainer_kwargs['precision'] = '32-true'
+    trainer_kwargs["strategy"] = DDPStrategy(find_unused_parameters=True)
+
+    # trainer
+    trainer = Trainer(**trainer_config, **trainer_kwargs, num_nodes=opt.num_nodes)
+    trainer.logdir = logdir
+
+    # data
+    data = instantiate_from_config(config.data)
+    data.prepare_data()
+    data.setup("fit")
+
+    # configure learning rate
+    base_lr = config.model.base_learning_rate
+    if 'accumulate_grad_batches' in lightning_config.trainer:
+        accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches
+    else:
+        accumulate_grad_batches = 1
+    rank_zero_print(f"accumulate_grad_batches = {accumulate_grad_batches}")
+    lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches
+    model.learning_rate = base_lr
+    rank_zero_print("++++ NOT USING LR SCALING ++++")
+    rank_zero_print(f"Setting learning rate to {model.learning_rate:.2e}")
+
+    # run training loop
+    if opt.resume and not opt.resume_weights_only:
+        trainer.fit(model, data, ckpt_path=opt.resume)
+    else:
+        trainer.fit(model, data)
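
Finally, `train.py` composes its logger, checkpoint, and callback configurations by layering the user's YAML over hard-coded defaults with `OmegaConf.merge`, where later arguments take precedence. A minimal demonstration of the pattern, with illustrative keys taken from the checkpoint defaults above:

```python
# OmegaConf.merge gives right-most precedence, which is how train.py lets a
# `lightning.modelcheckpoint` section in the YAML override its defaults.
from omegaconf import OmegaConf

defaults = OmegaConf.create({"every_n_train_steps": 5000, "save_top_k": -1})
user_cfg = OmegaConf.create({"every_n_train_steps": 1000})
merged = OmegaConf.merge(defaults, user_cfg)
print(merged.every_n_train_steps, merged.save_top_k)  # 1000 -1
```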