<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# InstructPix2Pix
[InstructPix2Pix](https://arxiv.org/abs/2211.09800) is a method to fine-tune text-conditioned diffusion models so that they can follow edit instructions for an image. Models fine-tuned with this method take the following as inputs:
<p align="center">
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png" alt="instructpix2pix-inputs" width=600/>
</p>
The output is an "edited" image that reflects the edit instruction applied to the input image:
<p align="center">
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/output-gs%407-igs%401-steps%4050.png" alt="instructpix2pix-output" width=600/>
</p>
The `train_instruct_pix2pix.py` script (which you can find [here](https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/train_instruct_pix2pix.py)) walks through the training procedure and shows how to apply it to Stable Diffusion.
*** The `train_instruct_pix2pix.py` script implements the InstructPix2Pix training procedure while remaining faithful to the [original implementation](https://github.com/timothybrooks/instruct-pix2pix), but it has only been tested on a [small-scale dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples). This can affect the final results. For better results, we recommend training for longer on a larger dataset. A large dataset for InstructPix2Pix training can be found [here](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered).
***
## Running locally with PyTorch
### Installing the dependencies
Before running the script, make sure to install the library's training dependencies.
**Important**
To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date, as the example scripts are updated frequently and install example-specific requirements. To do this, execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then `cd` into the example folder:
```bash
cd examples/instruct_pix2pix
```
Now run:
```bash
pip install -r requirements.txt
```
And initialize an [πŸ€—Accelerate](https://github.com/huggingface/accelerate/) environment with:
```bash
accelerate config
```
ν˜Ήμ€ ν™˜κ²½μ— λŒ€ν•œ 질문 없이 기본적인 accelerate ꡬ성을 μ‚¬μš©ν•˜λ €λ©΄ λ‹€μŒμ„ μ‹€ν–‰ν•˜μ„Έμš”.
```bash
accelerate config default
```
ν˜Ήμ€ μ‚¬μš© 쀑인 ν™˜κ²½μ΄ notebookκ³Ό 같은 λŒ€ν™”ν˜• μ‰˜μ€ μ§€μ›ν•˜μ§€ μ•ŠλŠ” κ²½μš°λŠ” λ‹€μŒ 절차λ₯Ό λ”°λΌμ£Όμ„Έμš”.
```python
from accelerate.utils import write_basic_config
write_basic_config()
```
### Example
As mentioned before, we'll use a [small dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. It is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.
Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to a folder containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You'll also need to specify the dataset name in `DATASET_ID`:
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"
```
μ§€κΈˆ, ν•™μŠ΅μ„ μ‹€ν–‰ν•  수 μžˆμŠ΅λ‹ˆλ‹€. μŠ€ν¬λ¦½νŠΈλŠ” λ ˆν¬μ§€ν† λ¦¬μ˜ ν•˜μœ„ ν΄λ”μ˜ λͺ¨λ“  κ΅¬μ„±μš”μ†Œ(`feature_extractor`, `scheduler`, `text_encoder`, `unet` λ“±)λ₯Ό μ €μž₯ν•©λ‹ˆλ‹€.
```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_ID \
--enable_xformers_memory_efficient_attention \
--resolution=256 --random_flip \
--train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=15000 \
--checkpointing_steps=5000 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--seed=42 \
--push_to_hub
```
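The `--conditioning_dropout_prob=0.05` argument above implements conditioning dropout for classifier-free guidance training: for a fraction of training examples the text prompt, the input image, or both are replaced by a null conditioning, so the model also learns (partially) unconditional predictions. As a rough sketch of one such scheme (following the paper's equal-probability three-way split; the helper name and exact thresholds here are illustrative assumptions, not the script's exact code):

```python
import random


def conditioning_dropout(u: float, p: float = 0.05):
    """Decide which conditionings to drop for one training example.

    u is a uniform draw in [0, 1). With probability p only the text prompt
    is dropped, with probability p both are dropped, and with probability p
    only the image is dropped, so each conditioning is absent 2*p of the time.
    """
    drop_text = u < 2 * p        # text dropped for u in [0, 2p)
    drop_image = p <= u < 3 * p  # image dropped for u in [p, 3p)
    return drop_text, drop_image


# Sample dropout decisions for a batch of 8 examples
rng = random.Random(42)
decisions = [conditioning_dropout(rng.random()) for _ in range(8)]
```

Learning these unconditional and partially conditioned predictions is what later allows inference to trade off `guidance_scale` and `image_guidance_scale` independently.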
μΆ”κ°€μ μœΌλ‘œ, κ°€μ€‘μΉ˜μ™€ λ°”μ΄μ–΄μŠ€λ₯Ό ν•™μŠ΅ 과정에 λͺ¨λ‹ˆν„°λ§ν•˜μ—¬ 검증 좔둠을 μˆ˜ν–‰ν•˜λŠ” 것을 μ§€μ›ν•©λ‹ˆλ‹€. `report_to="wandb"`와 이 κΈ°λŠ₯을 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€:
```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_ID \
--enable_xformers_memory_efficient_attention \
--resolution=256 --random_flip \
--train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=15000 \
--checkpointing_steps=5000 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
--validation_prompt="make the mountains snowy" \
--seed=42 \
--report_to=wandb \
--push_to_hub
```
λͺ¨λΈ 디버깅에 μœ μš©ν•œ 이 평가 방법 ꢌμž₯ν•©λ‹ˆλ‹€. 이λ₯Ό μ‚¬μš©ν•˜κΈ° μœ„ν•΄ `wandb`λ₯Ό μ„€μΉ˜ν•˜λŠ” 것을 μ£Όλͺ©ν•΄μ£Όμ„Έμš”. `pip install wandb`둜 μ‹€ν–‰ν•΄ `wandb`λ₯Ό μ„€μΉ˜ν•  수 μžˆμŠ΅λ‹ˆλ‹€.
[μ—¬κΈ°](https://wandb.ai/sayakpaul/instruct-pix2pix/runs/ctr3kovq), λͺ‡ 가지 평가 방법과 ν•™μŠ΅ νŒŒλΌλ―Έν„°λ₯Ό ν¬ν•¨ν•˜λŠ” μ˜ˆμ‹œλ₯Ό λ³Ό 수 μžˆμŠ΅λ‹ˆλ‹€.
***μ°Έκ³ : 원본 λ…Όλ¬Έμ—μ„œ, μ €μžλ“€μ€ 256x256 이미지 ν•΄μƒλ„λ‘œ ν•™μŠ΅ν•œ λͺ¨λΈλ‘œ 512x512와 같은 더 큰 ν•΄μƒλ„λ‘œ 잘 μΌλ°˜ν™”λ˜λŠ” 것을 λ³Ό 수 μžˆμ—ˆμŠ΅λ‹ˆλ‹€. μ΄λŠ” ν•™μŠ΅μ— μ‚¬μš©ν•œ 큰 데이터셋을 μ‚¬μš©ν–ˆκΈ° λ•Œλ¬Έμž…λ‹ˆλ‹€.***
## λ‹€μˆ˜μ˜ GPU둜 ν•™μŠ΅ν•˜κΈ°
`accelerate`λŠ” μ›ν™œν•œ λ‹€μˆ˜μ˜ GPU둜 ν•™μŠ΅μ„ κ°€λŠ₯ν•˜κ²Œ ν•©λ‹ˆλ‹€. `accelerate`둜 λΆ„μ‚° ν•™μŠ΅μ„ μ‹€ν–‰ν•˜λŠ” [μ—¬κΈ°](https://huggingface.co/docs/accelerate/basic_tutorials/launch) μ„€λͺ…을 따라 ν•΄ μ£Όμ‹œκΈ° λ°”λžλ‹ˆλ‹€. μ˜ˆμ‹œμ˜ λͺ…λ Ήμ–΄ μž…λ‹ˆλ‹€:
```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
--dataset_name=sayakpaul/instructpix2pix-1000-samples \
--use_ema \
--enable_xformers_memory_efficient_attention \
--resolution=512 --random_flip \
--train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
--max_train_steps=15000 \
--checkpointing_steps=5000 --checkpoints_total_limit=1 \
--learning_rate=5e-05 --lr_warmup_steps=0 \
--conditioning_dropout_prob=0.05 \
--mixed_precision=fp16 \
--seed=42 \
--push_to_hub
```
## Inference
Once training is complete, you can run inference:
```python
import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "your_model_id"  # <- replace this with your model id
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/test_pix2pix_4.png"


def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)
prompt = "wipe out the lake"
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("edited_image.png")
```
An example model repository obtained with this training script can be found here: [sayakpaul/instruct-pix2pix](https://huggingface.co/sayakpaul/instruct-pix2pix).
To control the speed and quality of inference, we recommend experimenting with three parameters:
* `num_inference_steps`
* `image_guidance_scale`
* `guidance_scale`
In particular, `image_guidance_scale` and `guidance_scale` can have a significant impact on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).
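Under the hood, the two guidance scales enter each denoising step through a classifier-free guidance combination of three noise predictions (Eq. (3) of the InstructPix2Pix paper): one with no conditioning, one with only the image conditioning, and one with both image and text. A minimal sketch with scalar stand-ins for the predictions (the function name is hypothetical; the real pipeline applies this to tensors):

```python
def combine_noise_preds(e_uncond, e_image, e_full,
                        image_guidance_scale=1.5, guidance_scale=10.0):
    """Classifier-free guidance over two conditionings.

    e_uncond: prediction with neither image nor text conditioning
    e_image:  prediction with only the image conditioning
    e_full:   prediction with both image and text conditioning
    """
    return (e_uncond
            + image_guidance_scale * (e_image - e_uncond)
            + guidance_scale * (e_full - e_image))


# With both scales at 1, the combination reduces to the fully
# conditioned prediction e_full.
combined = combine_noise_preds(0.0, 0.0, 1.0, 1.0, 1.0)
```

Intuitively, raising `guidance_scale` pushes the edit harder toward the text instruction, while raising `image_guidance_scale` pulls the result back toward the input image.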
If you're looking for some interesting ways to use the InstructPix2Pix training methodology, take a look at this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).