HunyuanDiT
Diffusers
Safetensors
English
Chinese
Zhiminli committed on
Commit
8f509dc
·
verified ·
1 Parent(s): 9993e3b

Update README.md

  1. README.md +307 -40
README.md CHANGED
@@ -14,22 +14,29 @@ language:
14
  # Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
15
 
16
 
17
-
18
  This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).
19
 
20
- > [**Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>
21
- > Zhimin Li*, Jianwei Zhang*, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, Zheng Fang, Weiyan Wang, Jinbao Xue, Yangyu Tao, JianChen Zhu, Kai Liu, Sihuan Lin, Yifu Sun, Yun Li, Dongdong Wang, Zhichao Hu, Xiao Xiao, Yan Chen, Yuhong Liu, Wei Liu, Di Wang, Yong Yang, Jie Jiang, Qinglin Lu‡
22
- > <br>Tencent Hunyuan<br>
23
 
24
- > [**DialogGen:Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation**](https://arxiv.org/abs/2403.08857)<br>
25
- > Minbin Huang*, Yanxin Long*, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu&#8224;, Wei Liu
26
- > <br>Chinese University of Hong Kong, Tencent Hunyuan, Shenzhen Campus of Sun Yat-sen University<br>
 
 
 
 
 
 
 
27
 
 
28
 
 
29
 
30
- ## 🔥🔥🔥 Tencent Hunyuan Bot
31
 
32
- Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. You can use **simple prompts** as well as **multi-turn language interactions** to create the picture. Unleash your creativity and create any picture you desire, **all for free!**
33
  > 画一只穿着西装的猪
34
  >
35
  > draw a pig in a suit
@@ -38,18 +45,38 @@ Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where yo
38
  >
39
  > generate a painting, cyberpunk style, sports car
40
 
 
 
 
 
 
 
 
 
 
 
41
  ## 📑 Open-source Plan
42
 
43
  - Hunyuan-DiT (Text-to-Image Model)
44
  - [x] Inference
45
  - [x] Checkpoints
46
- - [ ] Distillation Version (Coming soon ⏩️)
47
- - [ ] TensorRT Version (Coming soon ⏩️)
48
- - [ ] Training (Coming later ⏩️)
 
 
 
 
 
49
  - [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
50
- - [x] Inference
51
  - [X] Web Demo (Gradio)
 
52
  - [X] Cli Demo
 
 
 
 
53
 
54
  ## Contents
55
  - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
@@ -62,10 +89,17 @@ Welcome to [Tencent Hunyuan Bot](https://hunyuan.tencent.com/bot/chat), where yo
62
  - [📜 Requirements](#-requirements)
63
  - [🛠 Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
64
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
 
 
 
 
65
  - [🔑 Inference](#-inference)
66
  - [Using Gradio](#using-gradio)
 
67
  - [Using Command Line](#using-command-line)
68
  - [More Configurations](#more-configurations)
 
 
69
  - [🔗 BibTeX](#-bibtex)
70
 
71
  ## **Abstract**
@@ -145,7 +179,7 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
145
 
146
  * **Multi-turn Text2Image Generation**
147
 
148
- [demo video](https://youtu.be/4AaHrYnuIcE)
149
 
150
 
151
 
@@ -155,15 +189,14 @@ In order to comprehensively compare the generation capabilities of HunyuanDiT an
155
 
156
  This repo consists of DialogGen (a prompt enhancement model) and Hunyuan-DiT (a text-to-image model).
157
 
158
- The following table shows the requirements for running the models (The TensorRT version will be updated soon):
159
 
160
- | Model | TensorRT | Batch Size | GPU Memory | GPU |
161
- |:------------------------:|:--------:|:----------:|:----------:|:---------:|
162
- | DialogGen + Hunyuan-DiT | | 1 | 32G | V100/A100 |
163
- | Hunyuan-DiT || 1 | 11G | V100/A100 |
164
-
165
- <!-- | DialogGen + Hunyuan-DiT || 1 | ? | A100 |
166
- | Hunyuan-DiT | ✔ | 1 | ? | A100 | -->
167
 
168
  * An NVIDIA GPU with CUDA support is required.
169
  * We have tested V100 and A100 GPUs.
@@ -174,15 +207,17 @@ The following table shows the requirements for running the models (The TensorRT
174
  ## 🛠️ Dependencies and Installation
175
 
176
  Begin by cloning the repository:
177
- ```bash
178
  git clone https://github.com/tencent/HunyuanDiT
179
  cd HunyuanDiT
180
  ```
181
 
 
 
182
  We provide an `environment.yml` file for setting up a Conda environment.
183
  Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
184
 
185
- ```bash
186
  # 1. Prepare conda environment
187
  conda env create -f environment.yml
188
 
@@ -199,37 +234,158 @@ python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.1.
199
  ## 🧱 Download Pretrained Models
200
  To download the model, first install the huggingface-cli. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
201
 
202
- ```bash
203
  python -m pip install "huggingface_hub[cli]"
204
  ```
205
 
206
  Then download the model using the following commands:
207
 
208
- ```bash
209
  # Create a directory named 'ckpts' where the model will be saved, fulfilling the prerequisites for running the demo.
210
  mkdir ckpts
211
  # Use the huggingface-cli tool to download the model.
212
  # The download time may vary from 10 minutes to 1 hour depending on network conditions.
213
  huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
214
  ```
215
- Note:If an `No such file or directory: 'ckpts/.huggingface/.gitignore.lock'` like error occurs during the download process, you can ignore the error and retry the command by executing `huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts`
216
 
217
  All models will be automatically downloaded. For more information about the model, visit the Hugging Face repository [here](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).
218
 
219
- | Model | #Params | Download URL |
220
- |:------------------:|:-------:|:-------------------------------------------------------------------------------------------------------:|
221
- | mT5 | 1.6B | [mT5](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5) |
222
- | CLIP | 350M | [CLIP](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/clip_text_encoder) |
223
- | DialogGen | 7.0B | [DialogGen](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) |
224
- | sdxl-vae-fp16-fix | 83M | [sdxl-vae-fp16-fix](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix) |
225
- | Hunyuan-DiT | 1.5B | [Hunyuan-DiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model) |
226
 
227
 
228
  ## 🔑 Inference
229
 
230
  ### Using Gradio
231
 
232
- Make sure you have activated the conda environment before running the following command.
233
 
234
  ```shell
235
  # By default, we start a Chinese UI.
@@ -244,13 +400,61 @@ python app/hydit_app.py --no-enhance
244
 
245
  # Start with English UI
246
  python app/hydit_app.py --lang en
 
 
 
 
247
  ```
248
 
249
  ### Using Command Line
250
 
251
- We provide 3 modes to quick start:
252
 
253
- ```bash
254
  # Prompt Enhancement + Text-to-Image. Torch mode
255
  python sample_t2i.py --prompt "渔舟唱晚"
256
 
@@ -262,6 +466,10 @@ python sample_t2i.py --infer-mode fa --prompt "渔舟唱晚"
262
 
263
  # Generate an image with other image sizes.
264
  python sample_t2i.py --prompt "渔舟唱晚" --image-size 1280 768
 
 
 
 
265
  ```
266
 
267
  More example prompts can be found in [example_prompts.txt](example_prompts.txt)
@@ -277,14 +485,63 @@ We list some more useful configurations for easy usage:
277
  | `--seed` | 42 | The random seed for generating images |
278
  | `--infer-steps` | 100 | The number of steps for sampling |
279
  | `--negative` | - | The negative prompt for image generation |
280
- | `--infer-mode` | torch | The inference mode (torch or fa) |
281
  | `--sampler` | ddpm | The diffusion sampler (ddpm, ddim, or dpmms) |
282
  | `--no-enhance` | False | Disable the prompt enhancement model |
283
  | `--model-root` | ckpts | The root directory of the model checkpoints |
284
  | `--load-key` | ema | Load the student model or EMA model (ema or module) |
285
 
 
 
 
 
286
 
287
- # 🔗 BibTeX
 
 
 
 
 
 
 
 
 
 
 
 
288
  If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX:
289
 
290
  ```BibTeX
@@ -303,4 +560,14 @@ If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https
303
  journal={arXiv preprint arXiv:2403.08857},
304
  year={2024}
305
  }
306
- ```
 
 
 
 
 
 
 
 
 
 
 
14
  # Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
15
 
16
 
 
17
  This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our [project page](https://dit.hunyuan.tencent.com/).
18
 
19
+ > [**Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding**](https://arxiv.org/abs/2405.08748) <br>
20
+
21
+ > [**DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation**](https://arxiv.org/abs/2403.08857) <br>
22
 
23
+ ## 🔥🔥🔥 News!!
24
+ * Jun 13, 2024: :zap: HYDiT-v1.1 is released, which mitigates image oversaturation and alleviates the watermark issue. Please check [HunyuanDiT-v1.1](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1) and
25
+ [Distillation-v1.1](https://huggingface.co/Tencent-Hunyuan/Distillation-v1.1) for more details.
26
+ * Jun 13, 2024: :truck: The training code is released, offering [full-parameter training](#full-parameter-training) and [LoRA training](#lora).
27
+ * Jun 06, 2024: :tada: Hunyuan-DiT is now available in ComfyUI. Please check [ComfyUI](#using-comfyui) for more details.
28
+ * Jun 06, 2024: 🚀 We introduce a Distillation version for Hunyuan-DiT acceleration, which achieves **50%** acceleration on NVIDIA GPUs. Please check [Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.
29
+ * Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! Please check the [example](#using--diffusers) below.
30
+ * Jun 04, 2024: :globe_with_meridians: Tencent Cloud links are now available for downloading the pretrained models! Please check the [links](#-download-pretrained-models) below.
31
+ * May 22, 2024: 🚀 We introduce a TensorRT version for Hunyuan-DiT acceleration, which achieves **47%** acceleration on NVIDIA GPUs. Please check [TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for instructions.
32
+ * May 22, 2024: 💬 We now support a demo for multi-turn text2image generation. Please check the [script](#using-gradio) below.
33
 
34
+ ## 🤖 Try it on the web
35
 
36
+ Welcome to our web-based [**Tencent Hunyuan Bot**](https://hunyuan.tencent.com/bot/chat), where you can explore our innovative products! Just input the suggested prompts below or any other **imaginative prompts containing drawing-related keywords** to activate the Hunyuan text-to-image generation feature. Unleash your creativity and create any picture you desire, **all for free!**
37
 
38
+ You can use simple prompts similar to natural language text
39
 
 
40
  > 画一只穿着西装的猪
41
  >
42
  > draw a pig in a suit
 
45
  >
46
  > generate a painting, cyberpunk style, sports car
47
 
48
+ or multi-turn language interactions to create the picture.
49
+
50
+ > 画一个木制的鸟
51
+ >
52
+ > draw a wooden bird
53
+ >
54
+ > 变成玻璃的
55
+ >
56
+ > turn into glass
57
+
58
  ## 📑 Open-source Plan
59
 
60
  - Hunyuan-DiT (Text-to-Image Model)
61
  - [x] Inference
62
  - [x] Checkpoints
63
+ - [x] Distillation Version
64
+ - [x] TensorRT Version
65
+ - [x] Training
66
+ - [x] LoRA
67
+ - [ ] ControlNet (Pose, Canny, Depth, Tile)
68
+ - [ ] IP-adapter
69
+ - [ ] Hunyuan-DiT-XL checkpoints (0.7B model)
70
+ - [ ] Caption model (Re-caption the raw image-text pairs)
71
  - [DialogGen](https://github.com/Centaurusalpha/DialogGen) (Prompt Enhancement Model)
72
+ - [x] Inference
73
  - [X] Web Demo (Gradio)
74
+ - [x] Multi-turn T2I Demo (Gradio)
75
  - [X] Cli Demo
76
+ - [X] ComfyUI
77
+ - [X] Diffusers
78
+ - [ ] WebUI
79
+
80
 
81
  ## Contents
82
  - [Hunyuan-DiT](#hunyuan-dit--a-powerful-multi-resolution-diffusion-transformer-with-fine-grained-chinese-understanding)
 
89
  - [📜 Requirements](#-requirements)
90
  - [🛠 Dependencies and Installation](#%EF%B8%8F-dependencies-and-installation)
91
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
92
+ - [:truck: Training](#truck-training)
93
+ - [Data Preparation](#data-preparation)
94
+ - [Full Parameter Training](#full-parameter-training)
95
+ - [LoRA](#lora)
96
  - [🔑 Inference](#-inference)
97
  - [Using Gradio](#using-gradio)
98
+ - [Using Diffusers](#using--diffusers)
99
  - [Using Command Line](#using-command-line)
100
  - [More Configurations](#more-configurations)
101
+ - [Using ComfyUI](#using-comfyui)
102
+ - [🚀 Acceleration (for Linux)](#-acceleration-for-linux)
103
  - [🔗 BibTeX](#-bibtex)
104
 
105
  ## **Abstract**
 
179
 
180
  * **Multi-turn Text2Image Generation**
181
 
182
+ https://github.com/Tencent/tencent.github.io/assets/27557933/94b4dcc3-104d-44e1-8bb2-dc55108763d1
183
 
184
 
185
 
 
189
 
190
  This repo consists of DialogGen (a prompt enhancement model) and Hunyuan-DiT (a text-to-image model).
191
 
192
+ The following table shows the requirements for running the models (batch size = 1):
193
 
194
+ | Model | --load-4bit (DialogGen) | GPU Peak Memory | GPU |
195
+ |:-----------------------:|:-----------------------:|:---------------:|:---------------:|
196
+ | DialogGen + Hunyuan-DiT | ✘ | 32G | A100 |
197
+ | DialogGen + Hunyuan-DiT | ✔ | 22G | A100 |
198
+ | Hunyuan-DiT | - | 11G | A100 |
199
+ | Hunyuan-DiT | - | 14G | RTX3090/RTX4090 |
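
The memory figures in the table can be turned into a simple rule for choosing command-line flags. The sketch below is only an illustration: `choose_flags` is a hypothetical helper that is not part of the repository, and the thresholds are copied from the A100 rows of the table, so other GPUs may need more headroom (the table lists 14G for RTX3090/RTX4090).

```python
from typing import List

# Peak-memory figures copied from the table above (batch size = 1, A100 rows).
FULL_PIPELINE_GB = 32       # DialogGen + Hunyuan-DiT
QUANTIZED_PIPELINE_GB = 22  # DialogGen loaded with --load-4bit
DIT_ONLY_GB = 11            # Hunyuan-DiT alone (14G is listed for RTX3090/RTX4090)


def choose_flags(gpu_memory_gb: float) -> List[str]:
    """Hypothetical helper: suggest extra sample_t2i.py flags for a GPU size."""
    if gpu_memory_gb >= FULL_PIPELINE_GB:
        return []                 # enough memory for the full pipeline
    if gpu_memory_gb >= QUANTIZED_PIPELINE_GB:
        return ["--load-4bit"]    # quantize DialogGen to 4 bits
    if gpu_memory_gb >= DIT_ONLY_GB:
        return ["--no-enhance"]   # skip DialogGen, run Hunyuan-DiT only
    raise ValueError(
        f"{gpu_memory_gb} GB is below the {DIT_ONLY_GB} GB listed for Hunyuan-DiT alone"
    )


if __name__ == "__main__":
    for size in (80, 24, 16):
        print(size, choose_flags(size))
```

Both `--load-4bit` and `--no-enhance` are flags documented in this README; the thresholds are a reading of the table rather than a guarantee.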
 
200
 
201
  * An NVIDIA GPU with CUDA support is required.
202
  * We have tested V100 and A100 GPUs.
 
207
  ## 🛠️ Dependencies and Installation
208
 
209
  Begin by cloning the repository:
210
+ ```shell
211
  git clone https://github.com/tencent/HunyuanDiT
212
  cd HunyuanDiT
213
  ```
214
 
215
+ ### Installation Guide for Linux
216
+
217
  We provide an `environment.yml` file for setting up a Conda environment.
218
  Conda's installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
219
 
220
+ ```shell
221
  # 1. Prepare conda environment
222
  conda env create -f environment.yml
223
 
 
234
  ## 🧱 Download Pretrained Models
235
  To download the model, first install the huggingface-cli. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
236
 
237
+ ```shell
238
  python -m pip install "huggingface_hub[cli]"
239
  ```
240
 
241
  Then download the model using the following commands:
242
 
243
+ ```shell
244
  # Create a directory named 'ckpts' where the model will be saved, fulfilling the prerequisites for running the demo.
245
  mkdir ckpts
246
  # Use the huggingface-cli tool to download the model.
247
  # The download time may vary from 10 minutes to 1 hour depending on network conditions.
248
  huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
249
  ```
250
+
251
+ <details>
252
+ <summary>💡Tips for using huggingface-cli (network problem)</summary>
253
+
254
+ ##### 1. Using HF-Mirror
255
+
256
+ If you encounter slow download speeds in China, you can try a mirror to speed up the download process. For example,
257
+
258
+ ```shell
259
+ HF_ENDPOINT=https://hf-mirror.com huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
260
+ ```
261
+
262
+ ##### 2. Resume Download
263
+
264
+ `huggingface-cli` supports resuming downloads. If the download is interrupted, you can just rerun the download
265
+ command to resume the download process.
266
+
267
+ Note: If an error like `No such file or directory: 'ckpts/.huggingface/.gitignore.lock'` occurs during the download
268
+ process, you can ignore it and rerun the download command.
269
+
270
+ </details>
271
+
272
+ ---
273
 
274
  All models will be automatically downloaded. For more information about the model, visit the Hugging Face repository [here](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).
275
 
276
+ | Model | #Params | Huggingface Download URL | Tencent Cloud Download URL |
277
+ |:------------------:|:-------:|:-------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------:|
278
+ | mT5 | 1.6B | [mT5](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/mt5) | [mT5](https://dit.hunyuan.tencent.com/download/HunyuanDiT/mt5.zip) |
279
+ | CLIP | 350M | [CLIP](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/clip_text_encoder) | [CLIP](https://dit.hunyuan.tencent.com/download/HunyuanDiT/clip_text_encoder.zip) |
280
+ | Tokenizer | - | [Tokenizer](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/tokenizer) | [Tokenizer](https://dit.hunyuan.tencent.com/download/HunyuanDiT/tokenizer.zip) |
281
+ | DialogGen | 7.0B | [DialogGen](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/dialoggen) | [DialogGen](https://dit.hunyuan.tencent.com/download/HunyuanDiT/dialoggen.zip) |
282
+ | sdxl-vae-fp16-fix | 83M | [sdxl-vae-fp16-fix](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/sdxl-vae-fp16-fix) | [sdxl-vae-fp16-fix](https://dit.hunyuan.tencent.com/download/HunyuanDiT/sdxl-vae-fp16-fix.zip) |
283
+ | Hunyuan-DiT | 1.5B | [Hunyuan-DiT](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/tree/main/t2i/model) | [Hunyuan-DiT](https://dit.hunyuan.tencent.com/download/HunyuanDiT/model.zip) |
284
+ | Data demo | - | - | [Data demo](https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip) |
285
+
286
+ ## :truck: Training
287
+
288
+ ### Data Preparation
289
+
290
+ Refer to the commands below to prepare the training data.
291
+
292
+ 1. Install dependencies
293
+
294
+ We offer an efficient data management library named IndexKits, which supports managing and reading hundreds of millions of data samples during training. See the [docs](./IndexKits/README.md) for more.
295
+ ```shell
296
+ # 1 Install dependencies
297
+ cd HunyuanDiT
298
+ pip install -e ./IndexKits
299
+ ```
300
+ 2. Data download
301
+
302
+ Feel free to download the [data demo](https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip).
303
+ ```shell
304
+ # 2 Data download
305
+ wget -O ./dataset/data_demo.zip https://dit.hunyuan.tencent.com/download/HunyuanDiT/data_demo.zip
306
+ unzip ./dataset/data_demo.zip -d ./dataset
307
+ mkdir ./dataset/porcelain/arrows ./dataset/porcelain/jsons
308
+ ```
309
+ 3. Data conversion
310
+
311
+ Create a CSV file for training data with the fields listed in the table below.
312
+
313
+ | Fields | Required | Description | Example |
314
+ |:---------------:| :------: |:----------------:|:-----------:|
315
+ | `image_path` | Required | image path | `./dataset/porcelain/images/0.png` |
316
+ | `text_zh` | Required | text | 青花瓷风格,一只蓝色的鸟儿站在蓝色的花瓶上,周围点缀着白色花朵,背景是白色 |
317
+ | `md5` | Optional | image md5 (Message Digest Algorithm 5) | `d41d8cd98f00b204e9800998ecf8427e` |
318
+ | `width` | Optional | image width | `1024` |
319
+ | `height` | Optional | image height | `1024` |
320
+
321
+ > ⚠️ Optional fields like MD5, width, and height can be omitted. If omitted, the script below will automatically calculate them. This process can be time-consuming when dealing with large-scale training data.
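
As a rough illustration of the CSV described in the table, the sketch below writes the two required fields plus the optional `md5` column using only the Python standard library. The helper names (`file_md5`, `write_training_csv`) are hypothetical, and the column order and UTF-8 encoding are assumptions; check `csv2arrow.py` for what the conversion script actually expects. Width and height are left out here so that the conversion script computes them.

```python
import csv
import hashlib
from pathlib import Path

# Column names taken from the table above; width/height are omitted on purpose.
FIELDS = ["image_path", "text_zh", "md5"]


def file_md5(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_training_csv(pairs, csv_path):
    """Write (image_path, caption) pairs to a CSV file and return its path."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for image_path, caption in pairs:
            writer.writerow({
                "image_path": str(image_path),
                "text_zh": caption,
                "md5": file_md5(image_path),
            })
    return csv_path
```

Precomputing `md5` this way moves part of the work out of the conversion step, which may help when the same dataset is converted more than once.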
322
+
323
+ We use [Arrow](https://github.com/apache/arrow) as the training data format, as it offers a standard and efficient in-memory data representation. A conversion script is provided to transform CSV files into Arrow format.
324
+ ```shell
325
+ # 3 Data conversion
326
+ python ./hydit/data_loader/csv2arrow.py ./dataset/porcelain/csvfile/image_text.csv ./dataset/porcelain/arrows
327
+ ```
328
+
329
+ 4. Data Selection and Configuration File Creation
330
+
331
+ We configure the training data through YAML files. In these files, you can set up standard data processing strategies such as filtering, copying, and deduplicating the training data. For more details, see [docs](IndexKits/docs/MakeDataset.md).
332
+
333
+ For a sample file, please refer to [file](./dataset/yamls/porcelain.yaml). For a full parameter configuration file, see [file](./IndexKits/docs/MakeDataset.md).
334
+
335
+
336
+ 5. Create the training data index file from the YAML file.
337
+
338
+ ```shell
339
+ # Single Resolution Data Preparation
340
+ idk base -c dataset/yamls/porcelain.yaml -t dataset/porcelain/jsons/porcelain.json
341
+
342
+ # Multi Resolution Data Preparation
343
+ idk multireso -c dataset/yamls/porcelain_mt.yaml -t dataset/porcelain/jsons/porcelain_mt.json
344
+ ```
345
+
346
+ The directory structure for `porcelain` dataset is:
347
+
348
+ ```shell
349
+ cd ./dataset
350
+
351
+ porcelain
352
+ ├──images/ (image files)
353
+ │ ├──0.png
354
+ │ ├──1.png
355
+ │ ├──......
356
+ ├──csvfile/ (csv files containing text-image pairs)
357
+ │ ├──image_text.csv
358
+ ├──arrows/ (arrow files containing all necessary training data)
359
+ │ ├──00000.arrow
360
+ │ ├──00001.arrow
361
+ │ ├──......
362
+ ├──jsons/ (final training data index files which read data from arrow files during training)
363
+ │ ├──porcelain.json
364
+ │ ├──porcelain_mt.json
365
+ ```
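
Before starting a training run, it can help to confirm that the dataset folder matches the layout above. The following sketch assumes the four subdirectory names shown in the tree; `missing_dirs` is a hypothetical helper and not part of the repository or IndexKits.

```python
from pathlib import Path

# Subdirectories shown in the directory tree above.
EXPECTED_SUBDIRS = ("images", "csvfile", "arrows", "jsons")


def missing_dirs(dataset_root):
    """Return the expected subdirectories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("./dataset/porcelain")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```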
366
+
367
+ ### Full-parameter Training
368
+
369
+ Training relies on DeepSpeed. You can switch between **single-node** and **multi-node** training by adjusting parameters such as `--hostfile` and `--master_addr`. For more details, see [link](https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node).
370
+
371
+ ```shell
372
+ # Single-resolution training
373
+ PYTHONPATH=./ sh hydit/train.sh --index-file dataset/porcelain/jsons/porcelain.json
374
+
375
+ # Multi-resolution training (uses the index file created by `idk multireso`)
376
+ PYTHONPATH=./ sh hydit/train.sh --index-file dataset/porcelain/jsons/porcelain_mt.json --multireso --reso-step 64
377
+ ```
378
+
379
+ ### LoRA
380
+
381
+ We provide training and inference scripts for LoRA, detailed in the [guide](./lora/README.md).
382
 
383
 
384
  ## 🔑 Inference
385
 
386
  ### Using Gradio
387
 
388
+ Make sure the conda environment is activated before running the following command.
389
 
390
  ```shell
391
  # By default, we start a Chinese UI.
 
400
 
401
  # Start with English UI
402
  python app/hydit_app.py --lang en
403
+
404
+ # Start a multi-turn T2I generation UI.
405
+ # If your GPU memory is less than 32GB, use '--load-4bit' to enable 4-bit quantization, which requires at least 22GB of memory.
406
+ python app/multiTurnT2I_app.py
407
  ```
408
+ The demo can then be accessed at http://0.0.0.0:443. Replace 0.0.0.0 with your server's IP address.
409
+
410
+ ### Using 🤗 Diffusers
411
+
412
+ Please install PyTorch version 2.0 or higher in advance to satisfy the requirements of the specified version of the diffusers library.
413
+
414
+ Install 🤗 diffusers, ensuring that the version is at least 0.28.1:
415
+
416
+ ```shell
417
+ pip install git+https://github.com/huggingface/diffusers.git
418
+ ```
419
+ or
420
+ ```shell
421
+ pip install diffusers
422
+ ```
423
+
424
+ You can generate images with both Chinese and English prompts using the following Python script:
425
+ ```py
426
+ import torch
427
+ from diffusers import HunyuanDiTPipeline
428
+
429
+ pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16)
430
+ pipe.to("cuda")
431
+
432
+ # You may also use English prompt as HunyuanDiT supports both English and Chinese
433
+ # prompt = "An astronaut riding a horse"
434
+ prompt = "一个宇航员在骑马"
435
+ image = pipe(prompt).images[0]
436
+ ```
437
+ You can use our distilled model to generate images even faster:
438
+
439
+ ```py
440
+ import torch
441
+ from diffusers import HunyuanDiTPipeline
442
+
443
+ pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-Diffusers-Distilled", torch_dtype=torch.float16)
444
+ pipe.to("cuda")
445
+
446
+ # You may also use English prompt as HunyuanDiT supports both English and Chinese
447
+ # prompt = "An astronaut riding a horse"
448
+ prompt = "一个宇航员在骑马"
449
+ image = pipe(prompt, num_inference_steps=25).images[0]
450
+ ```
451
+ More details can be found in [HunyuanDiT-Diffusers-Distilled](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-Diffusers-Distilled)
452
 
453
  ### Using Command Line
454
 
455
+ We provide several commands for a quick start:
456
 
457
+ ```shell
458
  # Prompt Enhancement + Text-to-Image. Torch mode
459
  python sample_t2i.py --prompt "渔舟唱晚"
460
 
 
466
 
467
  # Generate an image with other image sizes.
468
  python sample_t2i.py --prompt "渔舟唱晚" --image-size 1280 768
469
+
470
+ # Prompt Enhancement + Text-to-Image. DialogGen loads with 4-bit quantization, but it may lose performance.
471
+ python sample_t2i.py --prompt "渔舟唱晚" --load-4bit
472
+
473
  ```
474
 
475
  More example prompts can be found in [example_prompts.txt](example_prompts.txt)
 
485
  | `--seed` | 42 | The random seed for generating images |
486
  | `--infer-steps` | 100 | The number of steps for sampling |
487
  | `--negative` | - | The negative prompt for image generation |
488
+ | `--infer-mode` | torch | The inference mode (torch, fa, or trt) |
489
  | `--sampler` | ddpm | The diffusion sampler (ddpm, ddim, or dpmms) |
490
  | `--no-enhance` | False | Disable the prompt enhancement model |
491
  | `--model-root` | ckpts | The root directory of the model checkpoints |
492
  | `--load-key` | ema | Load the student model or EMA model (ema or module) |
493
+ | `--load-4bit` | False | Load DialogGen model with 4-bit quantization |
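
When scripting many generations, the options in the table can be assembled programmatically. The sketch below is an assumption-laden illustration: `build_command` is a hypothetical helper, it covers only flags listed in this README, and the allowed values for `--sampler` and `--infer-mode` are copied from the table. It builds an argument list and does not run anything.

```python
import shlex

# Allowed values copied from the configuration table above.
SAMPLERS = {"ddpm", "ddim", "dpmms"}
INFER_MODES = {"torch", "fa", "trt"}


def build_command(prompt, image_size=None, seed=None, infer_steps=None,
                  sampler=None, infer_mode=None, negative=None,
                  no_enhance=False, load_4bit=False):
    """Hypothetical helper: build an argument list for sample_t2i.py."""
    cmd = ["python", "sample_t2i.py", "--prompt", prompt]
    if image_size is not None:
        first, second = image_size  # passed through in the order given
        cmd += ["--image-size", str(first), str(second)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if infer_steps is not None:
        cmd += ["--infer-steps", str(infer_steps)]
    if sampler is not None:
        if sampler not in SAMPLERS:
            raise ValueError(f"unknown sampler: {sampler}")
        cmd += ["--sampler", sampler]
    if infer_mode is not None:
        if infer_mode not in INFER_MODES:
            raise ValueError(f"unknown inference mode: {infer_mode}")
        cmd += ["--infer-mode", infer_mode]
    if negative is not None:
        cmd += ["--negative", negative]
    if no_enhance:
        cmd.append("--no-enhance")
    if load_4bit:
        cmd.append("--load-4bit")
    return cmd


if __name__ == "__main__":
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(build_command("渔舟唱晚", image_size=(1280, 768), load_4bit=True)))
```

The resulting list can be passed to `subprocess.run` from the repository root once the conda environment is activated.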
494
+
495
+ ### Using ComfyUI
496
+
497
+ We provide several commands for a quick start:
498
+
499
+ ```shell
500
+ # Download comfyui code
501
+ git clone https://github.com/comfyanonymous/ComfyUI.git
502
+
503
+ # Install torch, torchvision, torchaudio
504
+ pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu117
505
+
506
+ # Install Comfyui essential python package
507
+ cd ComfyUI
508
+ pip install -r requirements.txt
509
+
510
+ # ComfyUI has been successfully installed!
511
+
512
+ # Download model weight as before or link the existing model folder to ComfyUI.
513
+ python -m pip install "huggingface_hub[cli]"
514
+ mkdir models/hunyuan
515
+ huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./models/hunyuan/ckpts
516
+
517
+ # Move to the ComfyUI custom_nodes folder and copy comfyui-hydit folder from HunyuanDiT Repo.
518
+ cd custom_nodes
519
+ cp -r ${HunyuanDiT}/comfyui-hydit ./
520
+ cd comfyui-hydit
521
+
522
+ # Install some essential python Package.
523
+ pip install -r requirements.txt
524
+
525
+ # Our tool has been successfully installed!
526
 
527
+ # Go to ComfyUI main folder
528
+ cd ../..
529
+ # Run the ComfyUI launch command
530
+ python main.py --listen --port 80
531
 
532
+ # Running ComfyUI successfully!
533
+ ```
534
+ More details can be found in [ComfyUI README](comfyui-hydit/README.md)
535
+
536
+ ## 🚀 Acceleration (for Linux)
537
+
538
+ - We provide a TensorRT version of HunyuanDiT for inference acceleration (faster than flash attention).
539
+ See [Tencent-Hunyuan/TensorRT-libs](https://huggingface.co/Tencent-Hunyuan/TensorRT-libs) for more details.
540
+
541
+ - We provide a Distillation version of HunyuanDiT for inference acceleration.
542
+ See [Tencent-Hunyuan/Distillation](https://huggingface.co/Tencent-Hunyuan/Distillation) for more details.
543
+
544
+ ## 🔗 BibTeX
545
  If you find [Hunyuan-DiT](https://arxiv.org/abs/2405.08748) or [DialogGen](https://arxiv.org/abs/2403.08857) useful for your research and applications, please cite using this BibTeX:
546
 
547
  ```BibTeX
 
560
  journal={arXiv preprint arXiv:2403.08857},
561
  year={2024}
562
  }
563
+ ```
564
+
565
+ ## Star History
566
+
567
+ <a href="https://star-history.com/#Tencent/HunyuanDiT&Date">
568
+ <picture>
569
+ <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date&theme=dark" />
570
+ <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date" />
571
+ <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/HunyuanDiT&type=Date" />
572
+ </picture>
573
+ </a>