Spaces:
Running
on
L40S
Running
on
L40S
File size: 8,782 Bytes
68cd723 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 |
[English](README.md) | [简体中文](README_zh_cn.md)
<!-- ## **Hunyuan3D-1.0** -->
<p align="center">
<img src="./assets/logo.png" height=200>
</p>
# Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
<div align="center">
<a href="https://github.com/tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue&logo=github-pages"></a>  
<a href="https://3d.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Homepage&message=Tencent%20Hunyuan3D&color=blue&logo=github-pages"></a>  
<a href="https://arxiv.org/pdf/2411.02293"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red&logo=arxiv"></a>  
<a href="https://huggingface.co/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Checkpoints&message=HuggingFace&color=yellow"></a>  
<a href="https://huggingface.co/spaces/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Demo&message=HuggingFace&color=yellow"></a>  
</div>
## 🔥🔥🔥 更新!!
* Nov 5, 2024: 💬 已经支持图生3D。请在[script](#using-gradio)体验。
* Nov 5, 2024: 💬 已经支持文生3D,请在[script](#using-gradio)体验。
## 📑 开源计划
- [x] Inference
- [x] Checkpoints
- [ ] Baking related
- [ ] Training
- [ ] ComfyUI
- [ ] Distillation Version
- [ ] TensorRT Version
## **概要**
<p align="center">
<img src="./assets/teaser.png" height=450>
</p>
为了解决现有的3D生成模型在生成速度和泛化能力上存在不足,我们开源了混元3D-1.0模型,可以帮助3D创作者和艺术家自动化生产3D资产。我们的模型采用两阶段生成方法,在保证质量和可控的基础上,仅需10秒即可生成3D资产。在第一阶段,我们采用了一种多视角扩散模型,轻量版模型能够在大约4秒内高效生成多视角图像,这些多视角图像从不同的视角捕捉了3D资产的丰富的纹理和几何先验,将任务从单视角重建松弛到多视角重建。在第二阶段,我们引入了一种前馈重建模型,利用上一阶段生成的多视角图像。该模型能够在大约3秒内快速而准确地重建3D资产。重建模型学习处理多视角扩散引入的噪声和不一致性,并利用条件图像中的可用信息高效恢复3D结构。最终,该模型可以实现输入任意单视角实现三维生成。
## 🎉 **Hunyuan3D-1.0 模型架构**
<p align="center">
<img src="./assets/overview_3.png" height=400>
</p>
## 📈 比较
通过和其他开源模型比较, 混元3D-1.0在5项指标都得到了最高用户评分。细节请查看以下用户研究结果。
在A100显卡上,轻量版模型仅需10s即可完成单图生成3D,标准版则大约需要25s。以下散点图表明腾讯混元3D-1.0实现了质量和速度的合理平衡。
<p align="center">
<img src="./assets/radar.png" height=300>
<img src="./assets/runtime.png" height=300>
</p>
## 使用
#### 复制代码仓库
```shell
git clone https://github.com/tencent/Hunyuan3D-1
cd Hunyuan3D-1
```
#### Linux系统安装
env_install.sh 脚本提供了如何安装环境:
```
# 第一步:创建环境
conda create -n hunyuan3d-1 python=3.9 or 3.10 or 3.11 or 3.12
conda activate hunyuan3d-1
# 第二部:安装torch和相关依赖包
which pip # check pip corresponds to python
# modify the cuda version according to your machine (recommended)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# 第三步:安装其他相关依赖包
bash env_install.sh
```
由于dust3r的许可证限制, 我们仅提供其安装途径:
```
cd third_party
git clone --recursive https://github.com/naver/dust3r.git
cd ../third_party/weights
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
```
<details>
<summary>💡一些环境安装建议</summary>
可以选择安装 xformers 或 flash_attn 进行加速:
```
pip install xformers --index-url https://download.pytorch.org/whl/cu121
```
```
pip install flash_attn
```
Most environment errors are caused by a mismatch between machine and packages. You can try manually specifying the version, as shown in the following successful cases:
```
# python3.9
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
```
when install pytorch3d, the gcc version is preferably greater than 9, and the gpu driver should not be too old.
</details>
#### 下载预训练模型
模型下载链接 [https://huggingface.co/tencent/Hunyuan3D-1](https://huggingface.co/tencent/Hunyuan3D-1):
+ `Hunyuan3D-1/lite`, lite model for multi-view generation.
+ `Hunyuan3D-1/std`, standard model for multi-view generation.
+ `Hunyuan3D-1/svrm`, sparse-view reconstruction model.
为了通过Hugging Face下载模型,请先下载 huggingface-cli. (安装细节可见 [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
```shell
python3 -m pip install "huggingface_hub[cli]"
```
请使用以下命令下载模型:
```shell
mkdir weights
huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
mkdir weights/hunyuanDiT
huggingface-cli download Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled --local-dir ./weights/hunyuanDiT
```
#### 推理
对于文生3D,我们支持中/英双语生成,请使用以下命令进行本地推理:
```python
python3 main.py \
--text_prompt "a lovely rabbit" \
--save_folder ./outputs/test/ \
--max_faces_num 90000 \
--do_texture_mapping \
--do_render
```
对于图生3D,请使用以下命令进行本地推理:
```python
python3 main.py \
--image_prompt "/path/to/your/image" \
--save_folder ./outputs/test/ \
--max_faces_num 90000 \
--do_texture_mapping \
--do_render
```
更多参数详解:
| Argument | Default | Description |
|:------------------:|:---------:|:---------------------------------------------------:|
|`--text_prompt` | None |The text prompt for 3D generation |
|`--image_prompt` | None |The image prompt for 3D generation |
|`--t2i_seed` | 0 |The random seed for generating images |
|`--t2i_steps` | 25 |The number of steps for sampling of text to image |
|`--gen_seed` | 0 |The random seed for generating 3d generation |
|`--gen_steps` | 50 |The number of steps for sampling of 3d generation |
|`--max_faces_numm` | 90000 |The limit number of faces of 3d mesh |
|`--save_memory` | False |module will move to cpu automatically|
|`--do_texture_mapping` | False |Change vertex shadding to texture shading |
|`--do_render` | False |render gif |
如果显卡内存有限,可以使用`--save_memory`命令,最低显卡内存要求如下:
- Inference Std-pipeline requires 30GB VRAM (24G VRAM with --save_memory).
- Inference Lite-pipeline requires 22GB VRAM (18G VRAM with --save_memory).
- Note: --save_memory will increase inference time
```bash
bash scripts/text_to_3d_std.sh
bash scripts/text_to_3d_lite.sh
bash scripts/image_to_3d_std.sh
bash scripts/image_to_3d_lite.sh
```
如果你的显卡内存为16G,可以分别加载模型到显卡:
```bash
bash scripts/text_to_3d_std_separately.sh 'a lovely rabbit' ./outputs/test # >= 16G
bash scripts/text_to_3d_lite_separately.sh 'a lovely rabbit' ./outputs/test # >= 14G
bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/test # >= 16G
bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
```
#### Gradio界面部署
我们分别提供轻量版和标准版界面:
```shell
# std
python3 app.py
python3 app.py --save_memory
# lite
python3 app.py --use_lite
python3 app.py --use_lite --save_memory
```
Gradio界面体验地址为 http://0.0.0.0:8080. 这里 0.0.0.0 应当填写运行模型的机器IP地址。
## 相机参数
生成多视图视角固定为
+ Azimuth (relative to input view): `+0, +60, +120, +180, +240, +300`.
## 引用
如果我们的仓库对您有帮助,请引用我们的工作
```bibtex
@misc{yang2024tencent,
title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
year={2024},
eprint={2411.02293},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
|