<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DreamBooth
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method for personalizing text-to-image models such as Stable Diffusion from just a few (3 to 5) images of a subject. It lets the model generate contextualized images of the subject in different scenes, poses, and views.
![DreamBooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
<small>DreamBooth examples from the <a href="https://dreambooth.github.io">project's blog.</a></small>
This guide shows how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model for various GPU sizes, and with Flax. If you want to dig deeper and see how things work, all the DreamBooth training scripts used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth).
Before running the scripts, install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch:
```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers is not a training requirement, but we recommend [installing](../optimization/xformers) it if you can, because it can make training faster and less memory intensive.
After all the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
To set up a default 🤗 Accelerate environment without choosing any configurations, run:
```bash
accelerate config default
```
Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use:
```py
from accelerate.utils import write_basic_config
write_basic_config()
```
## Finetuning
<Tip warning={true}>
DreamBooth finetuning is very sensitive to hyperparameters and overfits easily. We recommend reading our [in-depth analysis](https://huggingface.co/blog/dreambooth), which includes recommended settings for different subjects, to help you choose appropriate hyperparameters.
</Tip>
<frameworkcontent>
<pt>
Let's try DreamBooth with a [few images of a dog](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ).
Download them, save them to a directory, and then set the `INSTANCE_DIR` environment variable to that path:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"
```
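Before launching a run, it can help to confirm that the folder behind `INSTANCE_DIR` actually contains the downloaded images. The following is a minimal sketch and not part of the training script: the helper name `list_instance_images` and the set of accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Extensions treated as images here are an assumption for this sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_instance_images(instance_dir):
    """Return the sorted image files found directly inside instance_dir."""
    folder = Path(instance_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"INSTANCE_DIR does not exist: {folder}")
    # Keep regular files only and compare extensions case-insensitively.
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `len(list_instance_images("path_to_training_images"))` should report a handful of files, in line with the 3 to 5 subject images DreamBooth is designed for.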
Then you can launch the training script with the following command (the full training script is available [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)):
```bash
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
```
</pt>
<jax>
If you have access to TPUs or want to train even faster, you can try the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory.
Before running the script, make sure the requirements are installed:
```bash
pip install -U -r requirements.txt
```
Then you can launch the training script with the following command:
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=400
```
</jax>
</frameworkcontent>
### Finetuning with prior-preserving loss
Prior preservation is used to avoid overfitting and language drift (see the [paper](https://arxiv.org/abs/2208.12242) if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images with the Stable Diffusion model itself! The training script saves the generated images to a local path you specify.
The authors recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --num_class_images=200 \
  --max_train_steps=800
```
</jax>
</frameworkcontent>
## Finetuning the text encoder and UNet
The script also lets you finetune the `text_encoder` along with the `unet`. In our experiments (see the [Training Stable Diffusion with DreamBooth using 🧨 Diffusers](https://huggingface.co/blog/dreambooth) post for details), this yields much better results, especially when generating images of faces.
<Tip warning={true}>
Training the text encoder requires additional memory, so it won't fit on a 16GB GPU. You'll need at least 24GB of VRAM to use this option.
</Tip>
Pass the `--train_text_encoder` argument to the training script to finetune both the `text_encoder` and the `unet`:
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=2e-6 \
  --num_class_images=200 \
  --max_train_steps=800
```
</jax>
</frameworkcontent>
## Finetuning with LoRA
You can also use LoRA (Low-Rank Adaptation of Large Language Models), a finetuning technique for accelerating the training of large models, with DreamBooth. For details, see the [LoRA training](training/lora#dreambooth) guide.
### Saving checkpoints while training
It's easy to overfit while training with DreamBooth, so it is sometimes useful to save regular checkpoints during training. One of the intermediate checkpoints might work better than the final model! To enable checkpoint saving, pass the following argument to the training script:
```bash
  --checkpointing_steps=500
```
This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far. For example, `checkpoint-1500` is a checkpoint saved after 1500 training steps.
#### Resuming training from a saved checkpoint
To resume training from a saved checkpoint, pass the `--resume_from_checkpoint` argument and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (that is, the one with the largest number of steps). For example, the following resumes training from the checkpoint saved after 1500 steps:
```bash
  --resume_from_checkpoint="checkpoint-1500"
```
You can adjust some of the hyperparameters at this point if you wish.
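To make the `checkpoint-<step>` naming convention concrete, here is a minimal sketch of how the checkpoint with the largest step count could be located in `output_dir`. It illustrates the idea behind `"latest"` under the naming rule described above; the helper `find_latest_checkpoint` is hypothetical and is not claimed to be the training script's own implementation.

```python
from pathlib import Path


def find_latest_checkpoint(output_dir):
    """Return the name of the `checkpoint-<step>` subfolder with the highest step, or None."""
    candidates = []
    for entry in Path(output_dir).iterdir():
        prefix, _, step = entry.name.partition("-")
        if entry.is_dir() and prefix == "checkpoint" and step.isdigit():
            candidates.append((int(step), entry.name))
    if not candidates:
        return None
    # Compare numerically so that checkpoint-1500 ranks above checkpoint-500.
    return max(candidates)[1]
```

Comparing the step counts as integers matters here: a plain string sort would place `checkpoint-500` after `checkpoint-1500`.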
#### Inference from a saved checkpoint
Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights, but also the state of the optimizer, the data loaders, and the learning rate.
If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint.
```python
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch
# Load the pipeline with the same arguments (model, revision) that were used for training.
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")
# If you trained with `--train_text_encoder`, make sure to load the text encoder as well.
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder")
pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16)
pipeline.to("cuda")
# Perform inference, save the pipeline, or push it to the Hub.
pipeline.save_pretrained("dreambooth-pipeline")
```
If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first:
```python
from accelerate import Accelerator
from diffusers import DiffusionPipeline
# Load the pipeline with the same arguments (model, revision) that were used for training.
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)
accelerator = Accelerator()
# Use text_encoder if `--train_text_encoder` was used for the initial training.
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)
# Restore the state from a checkpoint path. You have to use an absolute path here.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")
# Rebuild the pipeline with the unwrapped models (assigning to .unet and .text_encoder should also work).
pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    unet=accelerator.unwrap_model(unet),
    text_encoder=accelerator.unwrap_model(text_encoder),
)
# Perform inference, save the pipeline, or push it to the Hub.
pipeline.save_pretrained("dreambooth-pipeline")
```
## Optimizations for different GPU sizes
Depending on your hardware, there are several ways to optimize DreamBooth on GPUs ranging from 16GB down to 8GB!
### xFormers
[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. [Install xFormers](./optimization/xformers) and then add the following argument to your training script:
```bash
  --enable_xformers_memory_efficient_attention
```
xFormers is not available in Flax.
### Set gradients to none
Another way to lower memory usage is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. This may change certain behaviors, however, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`:
```bash
  --set_grads_to_none
```
### 16GB GPU
With the help of gradient checkpointing and the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, you can train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed:
```bash
pip install bitsandbytes
```
Then pass the `--use_8bit_adam` option to the training script:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
### 12GB GPU
To run DreamBooth on a 12GB GPU, you need to enable gradient checkpointing, the 8-bit optimizer, and xFormers, and set the gradients to `None`:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --set_grads_to_none \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
```
### Training on an 8GB GPU
For 8GB GPUs, you can use [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from VRAM to the CPU or NVME, which allows training with less GPU memory.
Run the following command to configure your 🤗 Accelerate environment:
```bash
accelerate config
```
During configuration, confirm that you want to use DeepSpeed.
By combining DeepSpeed stage 2, fp16 mixed precision, and offloading both the model parameters and the optimizer state to the CPU, you can then train with under 8GB of VRAM.
The drawback is that this requires more system RAM (about 25GB). See the [DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.
You should also change the default Adam optimizer to DeepSpeed's optimized version of Adam,
[`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu), for a substantial speedup.
Enabling `DeepSpeedCPUAdam` requires your system's CUDA toolchain version to match the one installed with PyTorch.
8-bit optimizers do not currently appear to be compatible with DeepSpeed.
Launch training with the following command:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --mixed_precision=fp16
```
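The GPU-specific recipes in this section differ mainly in which memory-saving flags they enable. As a rough summary, the sketch below maps a VRAM budget to the flags used in the 16GB, 12GB, and 8GB commands above. The `MEMORY_FLAGS` table and the `memory_flags` helper are hypothetical names introduced for illustration; other arguments (learning rate, accumulation steps, batch sizes) are left out, and the 8GB entry additionally assumes DeepSpeed was enabled through `accelerate config`.

```python
# Memory-related flags per GPU tier, copied from the commands in this section.
MEMORY_FLAGS = {
    16: ["--gradient_checkpointing", "--use_8bit_adam"],
    12: [
        "--gradient_checkpointing",
        "--use_8bit_adam",
        "--enable_xformers_memory_efficient_attention",
        "--set_grads_to_none",
    ],
    # The 8GB recipe also relies on DeepSpeed being enabled via `accelerate config`.
    8: ["--gradient_checkpointing", "--mixed_precision=fp16"],
}


def memory_flags(vram_gb):
    """Return the flags of the largest documented tier that fits within vram_gb."""
    for tier in sorted(MEMORY_FLAGS, reverse=True):
        if vram_gb >= tier:
            return list(MEMORY_FLAGS[tier])
    raise ValueError(f"No recipe in this guide for a {vram_gb}GB GPU")
```

For example, `memory_flags(12)` returns the four flags from the 12GB command, while a 10GB card falls back to the 8GB DeepSpeed recipe.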
## Inference
Once you have trained a model, specify the path where it was saved and run inference with the [`StableDiffusionPipeline`]. Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples).
If you have **`"accelerate>=0.16.0"`** installed, you can use the following code to run inference from an intermediate checkpoint:
```python
from diffusers import StableDiffusionPipeline
import torch
model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```
You can also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint).