license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
language:
- en
HunyuanDiT LoRA
Language: English
Instructions
The dependencies and installation steps are essentially the same as for the original model.
Then download the LoRA weights using the following commands:
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA ./porcelain --local-dir ./ckpts/t2i/lora
huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA ./jade --local-dir ./ckpts/t2i/lora
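After the download finishes, it can help to confirm that both LoRA directories are where the inference commands expect them. The sketch below assumes the weights end up in ./ckpts/t2i/lora/porcelain and ./ckpts/t2i/lora/jade, which are the paths passed to --lora_ckpt later in this document. The helper name check_lora_dirs is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected LoRA directories are missing.
# Assumes the directory layout implied by the --lora_ckpt paths in this README.
check_lora_dirs() {
    local root="${1:-./ckpts/t2i/lora}"
    local style
    local missing=0
    for style in porcelain jade; do
        if [ -d "${root}/${style}" ]; then
            echo "found: ${root}/${style}"
        else
            echo "missing: ${root}/${style}"
            missing=1
        fi
    done
    # Return 0 only when every expected directory exists.
    return "${missing}"
}

check_lora_dirs ./ckpts/t2i/lora || echo "Re-run the download commands above."
```

If a directory is reported as missing, check the --local-dir argument used during the download before starting training or inference.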
Training
We provide three types of weights for fine-tuning HY-DiT LoRA: ema, module, and distill. You can choose among them according to the results you get. By default, we use the ema weights.
In the example in this section, the --ema-to-module parameter loads the ema weights into the main model before LoRA fine-tuning.
If you want to load the module weights into the main model instead, just remove the --ema-to-module parameter.
If multiple resolutions are used, you need to add the --multireso and --reso-step 64 parameters.
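One way to make the multi-resolution flags switchable is to collect them in a Bash array and append that array to the training command in this section. This is only a sketch: the names use_multireso and extra_args are illustrative and are not options of the training script, while --multireso and --reso-step 64 are the flags named above.

```shell
#!/usr/bin/env bash
# Illustrative toggle; use_multireso and extra_args are hypothetical names.
# Set use_multireso=0 in the environment to train at a single resolution.
use_multireso="${use_multireso:-1}"

extra_args=()
if [ "${use_multireso}" = "1" ]; then
    # Flags this README names for multi-resolution training data.
    extra_args+=(--multireso --reso-step 64)
fi

# Dry run: print the flags that would be appended to the deepspeed command, e.g.
#   PYTHONPATH=./ deepspeed hydit/train_large_deepspeed.py ... "${extra_args[@]}" "$@"
echo "extra training flags: ${extra_args[*]}"
```

Keeping the flags in an array avoids word-splitting problems when the array is expanded as "${extra_args[@]}" inside the real command.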
model='DiT-g/2' # model type
task_flag="lora_jade_ema_rank64" # task flag
resume=./ckpts/t2i/model/ # resume checkpoint
index_file=dataset/index_v2_json/jade.json # index file
batch_size=1 # training batch size
grad_accu_steps=2 # gradient accumulation steps
rank=64 # rank of lora
max_training_steps=2000 # max training steps
lr=0.0001 # learning rate
PYTHONPATH=./ deepspeed hydit/train_large_deepspeed.py \
--task-flag ${task_flag} \
--model ${model} \
--training_parts lora \
--rank ${rank} \
--resume-split \
--resume ${resume} \
--ema-to-module \
--lr ${lr} \
--noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
--predict-type v_prediction \
--uncond-p 0.44 \
--uncond-p-t5 0.44 \
--index-file ${index_file} \
--random-crop \
--random-flip \
--batch-size ${batch_size} \
--image-size 1024 \
--global-seed 999 \
--grad-accu-steps ${grad_accu_steps} \
--warmup-num-steps 0 \
--use-flash-attn \
--use-fp16 \
--ema-dtype fp32 \
--results-dir ./log_EXP \
--ckpt-every 100 \
--max-training-steps ${max_training_steps} \
--ckpt-latest-every 2000 \
--log-every 10 \
--deepspeed \
--deepspeed-optimizer \
--use-zero-stage 2 \
--qk-norm \
--rope-img base512 \
--rope-real \
"$@"
Inference
Using Gradio
Make sure you have activated the conda environment before running the following command.
⚠️ Important Reminder:
We recommend not using prompt enhancement, as it may remove the style words from the prompt.
# jade style
# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# Start with English UI
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# porcelain style
# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# Start with English UI
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
Using Command Line
We provide several commands for a quick start:
# jade style (the prompt means "jade painting style, a cat chasing a butterfly")
# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# Prompt Enhancement + Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# Generate an image with other image sizes.
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade
# porcelain style (the prompt means "blue-and-white porcelain style, a cat chasing a butterfly")
# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# Prompt Enhancement + Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
# Generate an image with other image sizes.
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
More example prompts can be found in example_prompts.txt.
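The command-line examples for the two styles differ only in the prompt and the LoRA directory, so they can be generated from one small function. The sketch below is a dry run that prints the commands rather than executing them. The helper name build_sample_cmd is hypothetical; the flags and paths are copied from the commands above, with --no-enhance included in line with the recommendation against prompt enhancement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample_t2i.py command for one LoRA style.
# Only flags that already appear in this README are used.
build_sample_cmd() {
    local style="$1"
    local prompt="$2"
    printf 'python sample_t2i.py --prompt "%s" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/%s\n' \
        "${prompt}" "${style}"
}

# Dry run: print one command per style instead of running the model.
build_sample_cmd jade "玉石绘画风格,一只猫在追蝴蝶"
build_sample_cmd porcelain "青花瓷风格,一只猫在追蝴蝶"
```

Each prompt keeps its style phrase in Chinese, matching the examples above.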