<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# InstructPix2Pix

[InstructPix2Pix](https://arxiv.org/abs/2211.09800) is a method for fine-tuning text-conditioned diffusion models so that they can follow an edit instruction for an input image. Models fine-tuned with this method take the following as inputs:
<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png" alt="instructpix2pix-inputs" width=600/>
</p>

The output is an "edited" image that reflects the edit instruction applied to the input image:

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/output-gs%407-igs%401-steps%4050.png" alt="instructpix2pix-output" width=600/>
</p>

The `train_instruct_pix2pix.py` script (you can find it [here](https://github.com/huggingface/diffusers/blob/main/examples/instruct_pix2pix/train_instruct_pix2pix.py)) shows how to implement the training procedure and adapt it for Stable Diffusion.

***Even though `train_instruct_pix2pix.py` implements the InstructPix2Pix training procedure while staying faithful to the [original implementation](https://github.com/timothybrooks/instruct-pix2pix), it has only been tested on a [small-scale dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples). This can affect the end results. For better results, we recommend longer training runs on a larger dataset. You can find a large dataset for InstructPix2Pix training [here](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered).***
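To condition on the input image, InstructPix2Pix concatenates the input image's latents with the noisy latents along the channel axis, so the UNet's first convolution has to accept 8 input channels instead of 4. Following the paper, the weights for the added channels are initialized to zero, which means the widened layer initially behaves exactly like the pretrained one. The training script does this in PyTorch on the UNet's input convolution; the snippet below is only a minimal NumPy sketch of the idea, using a 1x1 convolution written with `einsum`. The helper names and the layer sizes are illustrative assumptions, not Stable Diffusion's real dimensions.

```python
import numpy as np


def widen_conv_weight(weight: np.ndarray, extra_in_channels: int) -> np.ndarray:
    """Append zero-initialized input channels to a conv weight of shape (out, in, kh, kw)."""
    out_ch, in_ch, kh, kw = weight.shape
    widened = np.zeros((out_ch, in_ch + extra_in_channels, kh, kw), dtype=weight.dtype)
    widened[:, :in_ch] = weight  # keep the pretrained weights for the original channels
    return widened


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (batch, in, h, w) and weight is (out, in, 1, 1)."""
    return np.einsum("oi,bihw->bohw", weight[:, :, 0, 0], x)


rng = np.random.default_rng(0)
# Illustrative sizes: 4 latent channels in, 16 output channels (an arbitrary choice for the sketch).
pretrained = rng.standard_normal((16, 4, 1, 1))
widened = widen_conv_weight(pretrained, extra_in_channels=4)  # 4 noisy + 4 image latent channels

noisy_latents = rng.standard_normal((2, 4, 8, 8))
image_latents = rng.standard_normal((2, 4, 8, 8))
unet_input = np.concatenate([noisy_latents, image_latents], axis=1)  # shape (2, 8, 8, 8)

# With zero-initialized extra channels the image latents have no effect yet,
# so the widened layer reproduces the pretrained layer's output.
print(np.allclose(conv1x1(unet_input, widened), conv1x1(noisy_latents, pretrained)))
```

Starting the new channels at zero lets fine-tuning begin from the pretrained model's behavior and gradually learn how to use the input image.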
## Running locally with PyTorch

### Installing the dependencies

Before running the script, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, since we update the example scripts frequently and install some example-specific requirements. To do this, run the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
Then `cd` into the example folder:

```bash
cd examples/instruct_pix2pix
```

Now run:

```bash
pip install -r requirements.txt
```

And initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default accelerate configuration without answering questions about your environment, run:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (for example, a notebook), use:

```python
from accelerate.utils import write_basic_config

write_basic_config()
```
### Toy example

As mentioned before, we'll use a [small toy dataset](https://huggingface.co/datasets/fusing/instructpix2pix-1000-samples) for training. This dataset is a smaller version of the [original dataset](https://huggingface.co/datasets/timbrooks/instructpix2pix-clip-filtered) used in the InstructPix2Pix paper. To use your own dataset, take a look at the [Create a dataset for training](create_dataset) guide.

Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to a folder containing the model weights) and pass it to the [`pretrained_model_name_or_path`](https://huggingface.co/docs/diffusers/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained.pretrained_model_name_or_path) argument. You also need to specify the dataset name in `DATASET_ID`:

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_ID="fusing/instructpix2pix-1000-samples"
```
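If you point `DATASET_ID` at your own dataset, each example needs an original image, an edit instruction, and the edited image. By default the script reads these from columns named `input_image`, `edit_prompt`, and `edited_image` (the column names of the toy dataset), and other names can be passed with `--original_image_column`, `--edit_prompt_column`, and `--edited_image_column`. The helper below is a hypothetical convenience sketch, not part of the script: given your dataset's column names, it returns only the extra flags you would need to append to the `accelerate launch` command.

```python
# Default column names read by train_instruct_pix2pix.py, keyed by the flag that overrides each one.
DEFAULT_COLUMNS = {
    "--original_image_column": "input_image",
    "--edit_prompt_column": "edit_prompt",
    "--edited_image_column": "edited_image",
}


def column_flags(original_image: str, edit_prompt: str, edited_image: str) -> list[str]:
    """Hypothetical helper: return the column flags whose value differs from the default."""
    requested = {
        "--original_image_column": original_image,
        "--edit_prompt_column": edit_prompt,
        "--edited_image_column": edited_image,
    }
    return [f"{flag}={name}" for flag, name in requested.items() if name != DEFAULT_COLUMNS[flag]]


# Example: a dataset that stores the instruction in a column called "instruction".
print(column_flags("input_image", "instruction", "edited_image"))
```

A dataset that already uses the default column names needs no extra flags at all.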
Now we can launch training. The script saves all the components (`feature_extractor`, `scheduler`, `text_encoder`, `unet`, etc.) in a subfolder of your repository.

```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --seed=42 \
    --push_to_hub
```
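The `--conditioning_dropout_prob=0.05` flag is what makes the two guidance scales usable at inference time. During training, the edit instruction, the input image, or both are occasionally dropped so the model also learns the less-conditioned predictions; the InstructPix2Pix paper drops the text only, the image only, and both, each for 5% of examples. In the script, a dropped instruction is replaced by the embedding of an empty prompt and a dropped image by zeroed image latents. The sketch below shows one way to map a single uniform draw per example onto those cases; the `conditioning_masks` name is hypothetical and the code illustrates the scheme rather than reproducing the script's tensor code.

```python
import random


def conditioning_masks(u: float, p: float = 0.05) -> tuple[bool, bool]:
    """Map one uniform draw u in [0, 1) to (drop_text, drop_image).

    [0, p)   -> drop the edit instruction only
    [p, 2p)  -> drop both the instruction and the input image
    [2p, 3p) -> drop the input image only
    [3p, 1)  -> keep both conditions
    """
    drop_text = u < 2 * p
    drop_image = p <= u < 3 * p
    return drop_text, drop_image


# Estimate how often each case occurs over many simulated training examples.
rng = random.Random(42)
counts = {}
for _ in range(100_000):
    case = conditioning_masks(rng.random())
    counts[case] = counts.get(case, 0) + 1

for (drop_text, drop_image), n in sorted(counts.items()):
    print(f"drop_text={drop_text!s:5} drop_image={drop_image!s:5} -> {n / 100_000:.3f}")
```

With `p=0.05`, roughly 85% of examples keep both conditions and each of the three dropout cases occurs about 5% of the time.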
Additionally, the script supports running validation inference during training so you can monitor progress with Weights and Biases. To enable it, pass a validation image URL and prompt together with `--report_to=wandb`:

```bash
accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$DATASET_ID \
    --enable_xformers_memory_efficient_attention \
    --resolution=256 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --val_image_url="https://hf.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png" \
    --validation_prompt="make the mountains snowy" \
    --seed=42 \
    --report_to=wandb \
    --push_to_hub
```

We recommend this kind of validation because it can be useful for debugging the model. Note that you need `wandb` installed to use it; you can install it by running `pip install wandb`.

[Here](https://wandb.ai/sayakpaul/instruct-pix2pix/runs/ctr3kovq), you can find an example training run that includes some validation samples and the training hyperparameters.

***Note: In the original paper, the authors observed that even a model trained at an image resolution of 256x256 generalizes well to larger resolutions such as 512x512. This is likely due to the larger dataset they used during training.***
## Training with multiple GPUs

`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch) for running distributed training with `accelerate`. Here is an example command:

```bash
accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
    --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
    --dataset_name=sayakpaul/instructpix2pix-1000-samples \
    --use_ema \
    --enable_xformers_memory_efficient_attention \
    --resolution=512 --random_flip \
    --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
    --max_train_steps=15000 \
    --checkpointing_steps=5000 --checkpoints_total_limit=1 \
    --learning_rate=5e-05 --lr_warmup_steps=0 \
    --conditioning_dropout_prob=0.05 \
    --mixed_precision=fp16 \
    --seed=42 \
    --push_to_hub
```
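With gradient accumulation across several GPUs, each optimizer step sees `train_batch_size × gradient_accumulation_steps × num_gpus` examples, and the script counts `--max_train_steps` in optimizer steps. The small calculation below is a sketch of what that means for the command above (per-device batch size 4, 4 accumulation steps, 15,000 steps); the GPU counts are illustrative, and the 1,000-example dataset size is an assumption based on the dataset's name.

```python
def effective_batch_size(train_batch_size: int, gradient_accumulation_steps: int, num_gpus: int) -> int:
    """Number of examples that contribute to a single optimizer step."""
    return train_batch_size * gradient_accumulation_steps * num_gpus


def approx_epochs(max_train_steps: int, batch: int, dataset_size: int) -> float:
    """Approximate passes over the dataset, treating max_train_steps as optimizer steps."""
    return max_train_steps * batch / dataset_size


for num_gpus in (1, 2, 8):
    batch = effective_batch_size(train_batch_size=4, gradient_accumulation_steps=4, num_gpus=num_gpus)
    epochs = approx_epochs(max_train_steps=15000, batch=batch, dataset_size=1000)
    print(f"{num_gpus} GPU(s): {batch} examples/step, ~{epochs:.0f} epochs over 1,000 examples")
```

Because `--max_train_steps` stays fixed, adding GPUs increases the number of passes over a small dataset rather than shortening the schedule in epochs.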
## Inference

Once training is complete, you can perform inference:

```python
import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "your_model_id"  # <- replace this with your model repo id or local path
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/test_pix2pix_4.png"


def download_image(url):
    # Fetch the image, apply its EXIF orientation, and convert it to RGB
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

prompt = "wipe out the lake"
num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("edited_image.png")
```
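The pipeline call above takes two guidance scales because InstructPix2Pix applies classifier-free guidance over two conditions. At each denoising step the model predicts noise three times: with neither condition, with the input image only, and with both the input image and the instruction. The paper combines them as `e(∅, ∅) + s_I · (e(c_I, ∅) − e(∅, ∅)) + s_T · (e(c_I, c_T) − e(c_I, ∅))`, where `s_I` corresponds to `image_guidance_scale` and `s_T` to `guidance_scale`. The function below is a minimal sketch of that formula written for illustration (its name is hypothetical), not the pipeline's internal code; the toy values are made up.

```python
def instruct_pix2pix_guidance(eps_uncond, eps_image, eps_full, guidance_scale, image_guidance_scale):
    """Combine three noise predictions with the two InstructPix2Pix guidance scales.

    eps_uncond: prediction with neither the input image nor the instruction
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on the input image and the instruction
    Uses only arithmetic, so it works element-wise on floats, NumPy arrays, or torch tensors.
    """
    return (
        eps_uncond
        + image_guidance_scale * (eps_image - eps_uncond)  # pull toward the input image
        + guidance_scale * (eps_full - eps_image)  # pull toward the edit instruction
    )


# Toy scalar predictions to see how each scale shifts the combined prediction.
eps_uncond, eps_image, eps_full = 0.0, 1.0, 3.0
for gs, igs in [(1.0, 1.0), (7.5, 1.5), (10.0, 1.5)]:
    print(gs, igs, instruct_pix2pix_guidance(eps_uncond, eps_image, eps_full, gs, igs))
```

With both scales set to 1, the combination reduces to the fully conditioned prediction; raising `image_guidance_scale` keeps the result closer to the input image, while raising `guidance_scale` pushes it harder toward the instruction.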
An example model repository obtained with this training script can be found at [sayakpaul/instruct-pix2pix](https://huggingface.co/sayakpaul/instruct-pix2pix).

We encourage you to experiment with the following three parameters to control speed and quality during inference:

* `num_inference_steps`
* `image_guidance_scale`
* `guidance_scale`

In particular, `image_guidance_scale` and `guidance_scale` can have a profound impact on the generated ("edited") image (see [here](https://twitter.com/RisingSayak/status/1628392199196151808?s=20) for an example).

If you're looking for interesting ways to use the InstructPix2Pix training methodology, check out this blog post: [Instruction-tuning Stable Diffusion with InstructPix2Pix](https://huggingface.co/blog/instruction-tuning-sd).