<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# DreamBooth
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.
![DreamBooth example from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
<small>DreamBooth example from the <a href="https://dreambooth.github.io">project's blog</a>.</small>
์ด ๊ฐ€์ด๋“œ๋Š” ๋‹ค์–‘ํ•œ GPU, Flax ์‚ฌ์–‘์— ๋Œ€ํ•ด [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) ๋ชจ๋ธ๋กœ DreamBooth๋ฅผ ํŒŒ์ธํŠœ๋‹ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๋” ๊นŠ์ด ํŒŒ๊ณ ๋“ค์–ด ์ž‘๋™ ๋ฐฉ์‹์„ ํ™•์ธํ•˜๋Š” ๋ฐ ๊ด€์‹ฌ์ด ์žˆ๋Š” ๊ฒฝ์šฐ, ์ด ๊ฐ€์ด๋“œ์— ์‚ฌ์šฉ๋œ DreamBooth์˜ ๋ชจ๋“  ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ [์—ฌ๊ธฐ](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth)์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ ํ•™์Šต์— ํ•„์š”ํ•œ dependencies๋ฅผ ์„ค์น˜ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ `main` GitHub ๋ธŒ๋žœ์น˜์—์„œ ๐Ÿงจ Diffusers๋ฅผ ์„ค์น˜ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.
```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```
xFormers๋Š” ํ•™์Šต์— ํ•„์š”ํ•œ ์š”๊ตฌ ์‚ฌํ•ญ์€ ์•„๋‹ˆ์ง€๋งŒ, ๊ฐ€๋Šฅํ•˜๋ฉด [์„ค์น˜](../optimization/xformers)ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ํ•™์Šต ์†๋„๋ฅผ ๋†’์ด๊ณ  ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ผ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
๋ชจ๋“  dependencies์„ ์„ค์ •ํ•œ ํ›„ ๋‹ค์Œ์„ ์‚ฌ์šฉํ•˜์—ฌ [๐Ÿค— Accelerate](https://github.com/huggingface/accelerate/) ํ™˜๊ฒฝ์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค:
```bash
accelerate config
```
๋ณ„๋„ ์„ค์ • ์—†์ด ๊ธฐ๋ณธ ๐Ÿค— Accelerate ํ™˜๊ฒฝ์„ ์„ค์น˜ํ•˜๋ ค๋ฉด ๋‹ค์Œ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค:
```bash
accelerate config default
```
๋˜๋Š” ํ˜„์žฌ ํ™˜๊ฒฝ์ด ๋…ธํŠธ๋ถ๊ณผ ๊ฐ™์€ ๋Œ€ํ™”ํ˜• ์…ธ์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```py
from accelerate.utils import write_basic_config
write_basic_config()
```
## ํŒŒ์ธํŠœ๋‹
<Tip warning={true}>
DreamBooth ํŒŒ์ธํŠœ๋‹์€ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ์— ๋งค์šฐ ๋ฏผ๊ฐํ•˜๊ณ  ๊ณผ์ ํ•ฉ๋˜๊ธฐ ์‰ฝ์Šต๋‹ˆ๋‹ค. ์ ์ ˆํ•œ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋„๋ก ๋‹ค์–‘ํ•œ ๊ถŒ์žฅ ์„ค์ •์ด ํฌํ•จ๋œ [์‹ฌ์ธต ๋ถ„์„](https://huggingface.co/blog/dreambooth)์„ ์‚ดํŽด๋ณด๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.
</Tip>
<frameworkcontent>
<pt>
[๋ช‡ ์žฅ์˜ ๊ฐ•์•„์ง€ ์ด๋ฏธ์ง€๋“ค](https://drive.google.com/drive/folders/1BO_dyz-p65qhBRRMRA4TbZ8qW4rB99JZ)๋กœ DreamBooth๋ฅผ ์‹œ๋„ํ•ด๋ด…์‹œ๋‹ค.
์ด๋ฅผ ๋‹ค์šด๋กœ๋“œํ•ด ๋””๋ ‰ํ„ฐ๋ฆฌ์— ์ €์žฅํ•œ ๋‹ค์Œ `INSTANCE_DIR` ํ™˜๊ฒฝ ๋ณ€์ˆ˜๋ฅผ ํ•ด๋‹น ๊ฒฝ๋กœ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"
```
๊ทธ๋Ÿฐ ๋‹ค์Œ, ๋‹ค์Œ ๋ช…๋ น์„ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค (์ „์ฒด ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” [์—ฌ๊ธฐ](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค):
```bash
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400
```
</pt>
<jax>
TPU์— ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ๊ฑฐ๋‚˜ ๋” ๋น ๋ฅด๊ฒŒ ํ›ˆ๋ จํ•˜๊ณ  ์‹ถ๋‹ค๋ฉด [Flax ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py)๋ฅผ ์‚ฌ์šฉํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Flax ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” gradient checkpointing ๋˜๋Š” gradient accumulation์„ ์ง€์›ํ•˜์ง€ ์•Š์œผ๋ฏ€๋กœ, ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ 30GB ์ด์ƒ์ธ GPU๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰ํ•˜๊ธฐ ์ „์— ์š”๊ตฌ ์‚ฌํ•ญ์ด ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค.
```bash
pip install -U -r requirements.txt
```
๊ทธ๋Ÿฌ๋ฉด ๋‹ค์Œ ๋ช…๋ น์–ด๋กœ ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‹คํ–‰์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--max_train_steps=400
```
</jax>
</frameworkcontent>
### Finetuning with prior-preserving loss
Prior preservation is used to avoid overfitting and language drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script will save the generated images to a local path you specify.
According to the authors, it is recommended to generate `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
## ํ…์ŠคํŠธ ์ธ์ฝ”๋”์™€ and UNet๋กœ ํŒŒ์ธํŠœ๋‹ํ•˜๊ธฐ
ํ•ด๋‹น ์Šคํฌ๋ฆฝํŠธ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด `unet`๊ณผ ํ•จ๊ป˜ `text_encoder`๋ฅผ ํŒŒ์ธํŠœ๋‹ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์‹คํ—˜์—์„œ(์ž์„ธํ•œ ๋‚ด์šฉ์€ [๐Ÿงจ Diffusers๋ฅผ ์‚ฌ์šฉํ•ด DreamBooth๋กœ Stable Diffusion ํ•™์Šตํ•˜๊ธฐ](https://huggingface.co/blog/dreambooth) ๊ฒŒ์‹œ๋ฌผ์„ ํ™•์ธํ•˜์„ธ์š”), ํŠนํžˆ ์–ผ๊ตด ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ๋•Œ ํ›จ์”ฌ ๋” ๋‚˜์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
<Tip warning={true}>
ํ…์ŠคํŠธ ์ธ์ฝ”๋”๋ฅผ ํ•™์Šต์‹œํ‚ค๋ ค๋ฉด ์ถ”๊ฐ€ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•ด 16GB GPU๋กœ๋Š” ๋™์ž‘ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜๋ ค๋ฉด ์ตœ์†Œ 24GB VRAM์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
</Tip>
Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`:
<frameworkcontent>
<pt>
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam \
--gradient_checkpointing \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
</pt>
<jax>
```bash
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=2e-6 \
--num_class_images=200 \
--max_train_steps=800
```
</jax>
</frameworkcontent>
## LoRA๋กœ ํŒŒ์ธํŠœ๋‹ํ•˜๊ธฐ
DreamBooth์—์„œ ๋Œ€๊ทœ๋ชจ ๋ชจ๋ธ์˜ ํ•™์Šต์„ ๊ฐ€์†ํ™”ํ•˜๊ธฐ ์œ„ํ•œ ํŒŒ์ธํŠœ๋‹ ๊ธฐ์ˆ ์ธ LoRA(Low-Rank Adaptation of Large Language Models)๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ž์„ธํ•œ ๋‚ด์šฉ์€ [LoRA ํ•™์Šต](training/lora#dreambooth) ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
### ํ•™์Šต ์ค‘ ์ฒดํฌํฌ์ธํŠธ ์ €์žฅํ•˜๊ธฐ
Dreambooth๋กœ ํ›ˆ๋ จํ•˜๋Š” ๋™์•ˆ ๊ณผ์ ํ•ฉํ•˜๊ธฐ ์‰ฌ์šฐ๋ฏ€๋กœ, ๋•Œ๋•Œ๋กœ ํ•™์Šต ์ค‘์— ์ •๊ธฐ์ ์ธ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์ €์žฅํ•˜๋Š” ๊ฒƒ์ด ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ค‘๊ฐ„ ์ฒดํฌํฌ์ธํŠธ ์ค‘ ํ•˜๋‚˜๊ฐ€ ์ตœ์ข… ๋ชจ๋ธ๋ณด๋‹ค ๋” ์ž˜ ์ž‘๋™ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! ์ฒดํฌํฌ์ธํŠธ ์ €์žฅ ๊ธฐ๋Šฅ์„ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ˆ˜๋ฅผ ์ „๋‹ฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:
```bash
--checkpointing_steps=500
```
์ด๋ ‡๊ฒŒ ํ•˜๋ฉด `output_dir`์˜ ํ•˜์œ„ ํด๋”์— ์ „์ฒด ํ•™์Šต ์ƒํƒœ๊ฐ€ ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. ํ•˜์œ„ ํด๋” ์ด๋ฆ„์€ ์ ‘๋‘์‚ฌ `checkpoint-`๋กœ ์‹œ์ž‘ํ•˜๊ณ  ์ง€๊ธˆ๊นŒ์ง€ ์ˆ˜ํ–‰๋œ step ์ˆ˜์ž…๋‹ˆ๋‹ค. ์˜ˆ์‹œ๋กœ `checkpoint-1500`์€ 1500 ํ•™์Šต step ํ›„์— ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์ž…๋‹ˆ๋‹ค.
#### ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์—์„œ ํ›ˆ๋ จ ์žฌ๊ฐœํ•˜๊ธฐ
์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์—์„œ ํ›ˆ๋ จ์„ ์žฌ๊ฐœํ•˜๋ ค๋ฉด, `--resume_from_checkpoint` ์ธ์ˆ˜๋ฅผ ์ „๋‹ฌํ•œ ๋‹ค์Œ ์‚ฌ์šฉํ•  ์ฒดํฌํฌ์ธํŠธ์˜ ์ด๋ฆ„์„ ์ง€์ •ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. ํŠน์ˆ˜ ๋ฌธ์ž์—ด `"latest"`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅ๋œ ๋งˆ์ง€๋ง‰ ์ฒดํฌํฌ์ธํŠธ(์ฆ‰, step ์ˆ˜๊ฐ€ ๊ฐ€์žฅ ๋งŽ์€ ์ฒดํฌํฌ์ธํŠธ)์—์„œ ์žฌ๊ฐœํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ์€ 1500 step ํ›„์— ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ์—์„œ๋ถ€ํ„ฐ ํ•™์Šต์„ ์žฌ๊ฐœํ•ฉ๋‹ˆ๋‹ค:
```bash
--resume_from_checkpoint="checkpoint-1500"
```
์›ํ•˜๋Š” ๊ฒฝ์šฐ ์ผ๋ถ€ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
#### ์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ถ”๋ก  ์ˆ˜ํ–‰ํ•˜๊ธฐ
์ €์žฅ๋œ ์ฒดํฌํฌ์ธํŠธ๋Š” ํ›ˆ๋ จ ์žฌ๊ฐœ์— ์ ํ•ฉํ•œ ํ˜•์‹์œผ๋กœ ์ €์žฅ๋ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์—๋Š” ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์˜ตํ‹ฐ๋งˆ์ด์ €, ๋ฐ์ดํ„ฐ ๋กœ๋” ๋ฐ ํ•™์Šต๋ฅ ์˜ ์ƒํƒœ๋„ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.
**`"accelerate>=0.16.0"`**์ด ์„ค์น˜๋œ ๊ฒฝ์šฐ ๋‹ค์Œ ์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ค‘๊ฐ„ ์ฒดํฌํฌ์ธํŠธ์—์„œ ์ถ”๋ก ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.
```python
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch
# ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๊ฒƒ๊ณผ ๋™์ผํ•œ ์ธ์ˆ˜(model, revision)๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")
# `args.train_text_encoder`๋กœ ํ•™์Šตํ•œ ๊ฒฝ์šฐ๋ฉด ํ…์ŠคํŠธ ์ธ์ฝ”๋”๋ฅผ ๊ผญ ๋ถˆ๋Ÿฌ์˜ค์„ธ์š”
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder")
pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, dtype=torch.float16)
pipeline.to("cuda")
# ์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•˜๊ฑฐ๋‚˜ ์ €์žฅํ•˜๊ฑฐ๋‚˜, ํ—ˆ๋ธŒ์— ํ‘ธ์‹œํ•ฉ๋‹ˆ๋‹ค.
pipeline.save_pretrained("dreambooth-pipeline")
```
If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first:
```python
from accelerate import Accelerator
from diffusers import DiffusionPipeline
# ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๊ฒƒ๊ณผ ๋™์ผํ•œ ์ธ์ˆ˜(model, revision)๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)
accelerator = Accelerator()
# ์ดˆ๊ธฐ ํ•™์Šต์— `--train_text_encoder`๊ฐ€ ์‚ฌ์šฉ๋œ ๊ฒฝ์šฐ text_encoder๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)
# ์ฒดํฌํฌ์ธํŠธ ๊ฒฝ๋กœ๋กœ๋ถ€ํ„ฐ ์ƒํƒœ๋ฅผ ๋ณต์›ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” ์ ˆ๋Œ€ ๊ฒฝ๋กœ๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")
# unwrapped ๋ชจ๋ธ๋กœ ํŒŒ์ดํ”„๋ผ์ธ์„ ๋‹ค์‹œ ๋นŒ๋“œํ•ฉ๋‹ˆ๋‹ค.(.unet and .text_encoder๋กœ์˜ ํ• ๋‹น๋„ ์ž‘๋™ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค)
pipeline = DiffusionPipeline.from_pretrained(
model_id,
unet=accelerator.unwrap_model(unet),
text_encoder=accelerator.unwrap_model(text_encoder),
)
# ์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•˜๊ฑฐ๋‚˜ ์ €์žฅํ•˜๊ฑฐ๋‚˜, ํ—ˆ๋ธŒ์— ํ‘ธ์‹œํ•ฉ๋‹ˆ๋‹ค.
pipeline.save_pretrained("dreambooth-pipeline")
```
## ๊ฐ GPU ์šฉ๋Ÿ‰์—์„œ์˜ ์ตœ์ ํ™”
ํ•˜๋“œ์›จ์–ด์— ๋”ฐ๋ผ 16GB์—์„œ 8GB๊นŒ์ง€ GPU์—์„œ DreamBooth๋ฅผ ์ตœ์ ํ™”ํ•˜๋Š” ๋ช‡ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค!
### xFormers
[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. You'll need to [install xFormers](./optimization/xformers) and then add the following argument to your training script:
```bash
--enable_xformers_memory_efficient_attention
```
xFormers๋Š” Flax์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.
### ๊ทธ๋ž˜๋””์–ธํŠธ ์—†์Œ์œผ๋กœ ์„ค์ •
๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ๋˜ ๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์€ [๊ธฐ์šธ๊ธฐ ์„ค์ •](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html)์„ 0 ๋Œ€์‹  `None`์œผ๋กœ ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๋กœ ์ธํ•ด ํŠน์ • ๋™์ž‘์ด ๋ณ€๊ฒฝ๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋ฉด ์ด ์ธ์ˆ˜๋ฅผ ์ œ๊ฑฐํ•ด ๋ณด์‹ญ์‹œ์˜ค. ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— ๋‹ค์Œ ์ธ์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ `None`์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
```bash
--set_grads_to_none
```
### 16GB GPU
Gradient checkpointing๊ณผ [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)์˜ 8๋น„ํŠธ ์˜ตํ‹ฐ๋งˆ์ด์ €์˜ ๋„์›€์œผ๋กœ, 16GB GPU์—์„œ dreambooth๋ฅผ ํ›ˆ๋ จํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. bitsandbytes๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”:
```bash
pip install bitsandbytes
```
๊ทธ ๋‹ค์Œ, ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ์— `--use_8bit_adam` ์˜ต์…˜์„ ๋ช…์‹œํ•ฉ๋‹ˆ๋‹ค:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 --gradient_checkpointing \
--use_8bit_adam \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
### 12GB GPU
12GB GPU์—์„œ DreamBooth๋ฅผ ์‹คํ–‰ํ•˜๋ ค๋ฉด gradient checkpointing, 8๋น„ํŠธ ์˜ตํ‹ฐ๋งˆ์ด์ €, xFormers๋ฅผ ํ™œ์„ฑํ™”ํ•˜๊ณ  ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ `None`์œผ๋กœ ์„ค์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800
```
### 8GB GPU์—์„œ ํ•™์Šตํ•˜๊ธฐ
8GB GPU์— ๋Œ€ํ•ด์„œ๋Š” [DeepSpeed](https://www.deepspeed.ai/)๋ฅผ ์‚ฌ์šฉํ•ด ์ผ๋ถ€ ํ…์„œ๋ฅผ VRAM์—์„œ CPU ๋˜๋Š” NVME๋กœ ์˜คํ”„๋กœ๋“œํ•˜์—ฌ ๋” ์ ์€ GPU ๋ฉ”๋ชจ๋ฆฌ๋กœ ํ•™์Šตํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
๐Ÿค— Accelerate ํ™˜๊ฒฝ์„ ๊ตฌ์„ฑํ•˜๋ ค๋ฉด ๋‹ค์Œ ๋ช…๋ น์„ ์‹คํ–‰ํ•˜์„ธ์š”:
```bash
accelerate config
```
ํ™˜๊ฒฝ ๊ตฌ์„ฑ ์ค‘์— DeepSpeed๋ฅผ ์‚ฌ์šฉํ•  ๊ฒƒ์„ ํ™•์ธํ•˜์„ธ์š”.
๊ทธ๋Ÿฌ๋ฉด DeepSpeed stage 2, fp16 ํ˜ผํ•ฉ ์ •๋ฐ€๋„๋ฅผ ๊ฒฐํ•ฉํ•˜๊ณ  ๋ชจ๋ธ ๋งค๊ฐœ๋ณ€์ˆ˜์™€ ์˜ตํ‹ฐ๋งˆ์ด์ € ์ƒํƒœ๋ฅผ ๋ชจ๋‘ CPU๋กœ ์˜คํ”„๋กœ๋“œํ•˜๋ฉด 8GB VRAM ๋ฏธ๋งŒ์—์„œ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹จ์ ์€ ๋” ๋งŽ์€ ์‹œ์Šคํ…œ RAM(์•ฝ 25GB)์ด ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ถ”๊ฐ€ ๊ตฌ์„ฑ ์˜ต์…˜์€ [DeepSpeed ๋ฌธ์„œ](https://huggingface.co/docs/accelerate/usage_guides/deepspeed)๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.
๋˜ํ•œ ๊ธฐ๋ณธ Adam ์˜ตํ‹ฐ๋งˆ์ด์ €๋ฅผ DeepSpeed์˜ ์ตœ์ ํ™”๋œ Adam ๋ฒ„์ „์œผ๋กœ ๋ณ€๊ฒฝํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
์ด๋Š” ์ƒ๋‹นํ•œ ์†๋„ ํ–ฅ์ƒ์„ ์œ„ํ•œ Adam์ธ [`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu)์ž…๋‹ˆ๋‹ค.
`DeepSpeedCPUAdam`์„ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด ์‹œ์Šคํ…œ์˜ CUDA toolchain ๋ฒ„์ „์ด PyTorch์™€ ํ•จ๊ป˜ ์„ค์น˜๋œ ๊ฒƒ๊ณผ ๋™์ผํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
8๋น„ํŠธ ์˜ตํ‹ฐ๋งˆ์ด์ €๋Š” ํ˜„์žฌ DeepSpeed์™€ ํ˜ธํ™˜๋˜์ง€ ์•Š๋Š” ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
๋‹ค์Œ ๋ช…๋ น์œผ๋กœ ํ•™์Šต์„ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค:
```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--sample_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--mixed_precision=fp16
```
## ์ถ”๋ก 
๋ชจ๋ธ์„ ํ•™์Šตํ•œ ํ›„์—๋Š”, ๋ชจ๋ธ์ด ์ €์žฅ๋œ ๊ฒฝ๋กœ๋ฅผ ์ง€์ •ํ•ด [`StableDiffusionPipeline`]๋กœ ์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ”„๋กฌํ”„ํŠธ์— ํ•™์Šต์— ์‚ฌ์šฉ๋œ ํŠน์ˆ˜ `์‹๋ณ„์ž`(์ด์ „ ์˜ˆ์‹œ์˜ `sks`)๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”.
**`"accelerate>=0.16.0"`**์ด ์„ค์น˜๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ ์ฝ”๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ค‘๊ฐ„ ์ฒดํฌํฌ์ธํŠธ์—์„œ ์ถ”๋ก ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```python
from diffusers import StableDiffusionPipeline
import torch
model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket.png")
```
[์ €์žฅ๋œ ํ•™์Šต ์ฒดํฌํฌ์ธํŠธ](#inference-from-a-saved-checkpoint)์—์„œ๋„ ์ถ”๋ก ์„ ์‹คํ–‰ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.