<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ControlNet
[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) (ControlNet) was written by Lvmin Zhang and Maneesh Agrawala.
This example is based on the [training example in the original ControlNet repository](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md). It trains a ControlNet to fill circles using a [small synthetic dataset](https://huggingface.co/datasets/fusing/fill50k).
## Installing the dependencies
Before running the scripts below, make sure to install the library's training dependencies.
<Tip warning={true}>
To successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the installation up to date. We update the example scripts frequently and install example-specific requirements.
</Tip>
To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then navigate to the [example folder](https://github.com/huggingface/diffusers/tree/main/examples/controlnet):
```bash
cd examples/controlnet
```
Now install the example-specific requirements:
```bash
pip install -r requirements.txt
```
Initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment:
```bash
accelerate config
```
Or set up a default 🤗 Accelerate configuration without answering questions about your environment:
```bash
accelerate config default
```
Or, if your environment does not support an interactive shell (for example, a notebook), initialize it with the following code:
```python
from accelerate.utils import write_basic_config
write_basic_config()
```
## Circle filling dataset
The original dataset is hosted in the ControlNet [repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip), but we re-uploaded it [here](https://huggingface.co/datasets/fusing/fill50k) so that it is compatible with 🤗 Datasets and the training script can handle data loading.
Our training example uses [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is the model the original ControlNet was trained on. However, a ControlNet can be trained to augment any compatible Stable Diffusion model, such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1).
To use your own dataset, see the [Create a dataset for training](create_dataset) guide.
## Training
Download the following images, which the commands below use as conditioning images for validation:
```sh
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png
wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png
```
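If `wget` is not available (for example, in some notebook environments), the same two files can be fetched from Python. This is a minimal sketch using only the standard library; the helper names `image_urls` and `download_conditioning_images` are hypothetical and not part of the example scripts.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same location as in the wget commands above.
BASE_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/diffusers/controlnet_training"
)
FILENAMES = ["conditioning_image_1.png", "conditioning_image_2.png"]


def image_urls():
    """Return the full URLs of the two conditioning images."""
    return [f"{BASE_URL}/{name}" for name in FILENAMES]


def download_conditioning_images(dest_dir="."):
    """Download the images into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in image_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, target)
        paths.append(target)
    return paths
```

Calling `download_conditioning_images()` from the `examples/controlnet` folder places the files where the training commands expect them.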
Set the `MODEL_DIR` environment variable (either a Hub model repository ID or a path to a directory containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument.
The training script creates and saves a `diffusion_pytorch_model.bin` file in your repository.
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=4 \
--push_to_hub
```
This default configuration requires ~38GB VRAM.
By default, the training script logs outputs to TensorBoard. Pass `--report_to wandb` to use Weights & Biases instead.
Gradient accumulation with a smaller batch size reduces the training requirement to ~20 GB VRAM.
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--push_to_hub
```
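The reason a batch size of 1 with 4 accumulation steps can stand in for a batch size of 4 is that averaging the gradients of the micro-batches gives the same update direction as one larger batch, while only one micro-batch is held in memory at a time. The toy example below is a sketch of that idea in plain Python with a one-parameter model and made-up data. It is not how the training script is implemented.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


# Made-up (x, y) pairs, purely for illustration.
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0)]
w = 0.5

# One step on a batch of 4 (analogous to --train_batch_size=4).
full_batch_grad = grad_mse(w, data)

# Four micro-batches of 1, each scaled by 1/accum_steps and summed before
# the optimizer step (analogous to --train_batch_size=1
# --gradient_accumulation_steps=4).
accum_steps = 4
accumulated_grad = 0.0
for sample in data:
    accumulated_grad += grad_mse(w, [sample]) / accum_steps

print(full_batch_grad, accumulated_grad)
```

Both values are equal, so the optimizer sees the same gradient in either configuration. The cost is that each optimizer step needs four forward and backward passes.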
## Training with multiple GPUs
`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch) to run distributed training with `accelerate`. Here is an example command:
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=4 \
--mixed_precision="fp16" \
--tracker_project_name="controlnet-demo" \
--report_to=wandb \
--push_to_hub
```
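With multiple GPUs, `--train_batch_size` applies per device, so the number of samples consumed per optimizer step grows with the number of processes. The sketch below works through that arithmetic. The GPU count of 2 is a hypothetical value, and the figure of 50,000 training examples is an assumption taken from the dataset name `fill50k`.

```python
import math


def global_batch_size(per_device_batch, num_gpus, grad_accum_steps=1):
    """Samples that contribute to a single optimizer step."""
    return per_device_batch * num_gpus * grad_accum_steps


def optimizer_steps_per_epoch(num_examples, per_device_batch, num_gpus, grad_accum_steps=1):
    """Optimizer steps needed to see every example once."""
    return math.ceil(
        num_examples / global_batch_size(per_device_batch, num_gpus, grad_accum_steps)
    )


# Hypothetical setup: 2 GPUs with --train_batch_size=4 and no accumulation.
print(global_batch_size(4, 2))
# Assumes 50,000 examples in the dataset.
print(optimizer_steps_per_epoch(50_000, 4, 2))
```

Because the global batch size changes with the hardware, hyperparameters such as the learning rate may need to be revisited when moving between single-GPU and multi-GPU runs.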
## Example results
#### After 300 steps with batch size 8:
| | |
|-------------------|:-------------------------:|
| | red circle with blue background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_300_steps.png) |
| | cyan circle with brown floral background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_300_steps.png) |
#### After 6000 steps with batch size 8:
| | |
|-------------------|:-------------------------:|
| | red circle with blue background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_6000_steps.png) |
| | cyan circle with brown floral background |
| ![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_6000_steps.png) |
## Training on a 16GB GPU
The following optimizations make it possible to train on a 16GB GPU:
- Gradient checkpointing
- bitsandbytes' [8-bit optimizer](https://github.com/TimDettmers/bitsandbytes#requirements--installation) (follow the linked instructions if it is not installed)
Now you can launch the training script:
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--use_8bit_adam \
--push_to_hub
```
## Training on a 12GB GPU
The following optimizations make it possible to train on a 12GB GPU:
- Gradient checkpointing
- bitsandbytes' 8-bit [optimizer](https://github.com/TimDettmers/bitsandbytes#requirements--installation) (follow the linked instructions if it is not installed)
- [xFormers](https://huggingface.co/docs/diffusers/training/optimization/xformers) (follow the linked instructions if it is not installed)
- Setting gradients to `None`
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--learning_rate=1e-5 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--push_to_hub
```
When using `enable_xformers_memory_efficient_attention`, make sure `xformers` is installed with `pip install xformers`.
## Training on an 8GB GPU
We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does save memory, we have not confirmed that it trains successfully. You will very likely have to change the configuration to get a successful training run.
The following optimizations make it possible to run on an 8GB GPU:
- Gradient checkpointing
- bitsandbytes' 8-bit [optimizer](https://github.com/TimDettmers/bitsandbytes#requirements--installation) (follow the linked instructions if it is not installed)
- [xFormers](https://huggingface.co/docs/diffusers/training/optimization/xformers) (follow the linked instructions if it is not installed)
- Setting gradients to `None`
- DeepSpeed stage 2 with parameter and optimizer offloading
- fp16 mixed precision
[DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either CPU or NVMe.
This requires significantly more RAM (about 25 GB).
To enable DeepSpeed stage 2, configure your environment with `accelerate config`.
The configuration file should look like this:
```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
```
<Tip>
See the [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options.
</Tip>
Changing the default Adam optimizer to DeepSpeed's Adam, `deepspeed.ops.adam.DeepSpeedCPUAdam`, gives a substantial speedup, but it requires a CUDA toolchain of the same version as PyTorch. The 8-bit optimizer does not appear to be compatible with DeepSpeed at the moment.
```bash
export MODEL_DIR="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="path to save model"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--dataset_name=fusing/fill50k \
--resolution=512 \
--validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
--validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--mixed_precision fp16 \
--push_to_hub
```
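The memory tiers described in the sections above can be summarized as a small lookup. This is only a sketch: the helper `flags_for_vram` is hypothetical, the thresholds are the approximate figures quoted in this guide rather than measured limits, and the 8GB tier additionally needs the DeepSpeed `accelerate` configuration shown earlier.

```python
def flags_for_vram(vram_gb):
    """Return the memory-related train_controlnet.py flags for a VRAM budget."""
    if vram_gb >= 38:
        # Default configuration.
        return ["--train_batch_size=4"]

    # Below ~38GB: smaller batch with gradient accumulation.
    flags = ["--train_batch_size=1", "--gradient_accumulation_steps=4"]
    if vram_gb >= 20:
        return flags

    flags.append("--gradient_checkpointing")
    if vram_gb >= 16:
        return flags + ["--use_8bit_adam"]

    flags += ["--enable_xformers_memory_efficient_attention", "--set_grads_to_none"]
    if vram_gb >= 12:
        return flags + ["--use_8bit_adam"]

    if vram_gb >= 8:
        # The 8GB command drops the 8-bit optimizer and relies on DeepSpeed
        # offloading (set up through `accelerate config`) plus fp16.
        return flags + ["--mixed_precision=fp16"]

    raise ValueError("This guide does not cover GPUs with less than 8GB of VRAM.")


print(" ".join(flags_for_vram(12)))
```

The returned flags are meant to be appended to the common arguments (`--pretrained_model_name_or_path`, `--output_dir`, `--dataset_name`, and so on) used in every command in this guide.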
## Inference
The trained model can be run with the [`StableDiffusionControlNetPipeline`].
Set `base_model_path` and `controlnet_path` to the values that `--pretrained_model_name_or_path` and `--output_dir` were respectively set to in the training script.
```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import torch
base_model_path = "path to model"
controlnet_path = "path to controlnet"
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    base_model_path, controlnet=controlnet, torch_dtype=torch.float16
)
# speed up the diffusion process with a faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# remove the following line if xformers is not installed
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
control_image = load_image("./conditioning_image_1.png")
prompt = "pale golden rod circle with old lace background"
# generate the image
generator = torch.manual_seed(0)
image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0]
image.save("./output.png")
```