ํšจ๊ณผ์ ์ด๊ณ  ํšจ์œจ์ ์ธ Diffusion

[[open-in-colab]]

ํŠน์ • ์Šคํƒ€์ผ๋กœ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๊ฑฐ๋‚˜ ์›ํ•˜๋Š” ๋‚ด์šฉ์„ ํฌํ•จํ•˜๋„๋ก[DiffusionPipeline]์„ ์„ค์ •ํ•˜๋Š” ๊ฒƒ์€ ๊นŒ๋‹ค๋กœ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ข…์ข… ๋งŒ์กฑ์Šค๋Ÿฌ์šด ์ด๋ฏธ์ง€๋ฅผ ์–ป๊ธฐ๊นŒ์ง€ [DiffusionPipeline]์„ ์—ฌ๋Ÿฌ ๋ฒˆ ์‹คํ–‰ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฌด์—์„œ ์œ ๋ฅผ ์ฐฝ์กฐํ•˜๋Š” ๊ฒƒ์€ ํŠนํžˆ ์ถ”๋ก ์„ ๋ฐ˜๋ณตํ•ด์„œ ์‹คํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ ๊ณ„์‚ฐ ์ง‘์•ฝ์ ์ธ ํ”„๋กœ์„ธ์Šค์ž…๋‹ˆ๋‹ค.

This is why it's important to get the most computational (speed) and memory (GPU vRAM) efficiency from the pipeline, shortening the time between inference cycles so you can iterate faster.

์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” [DiffusionPipeline]์„ ์‚ฌ์šฉํ•˜์—ฌ ๋” ๋น ๋ฅด๊ณ  ํšจ๊ณผ์ ์œผ๋กœ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•ˆ๋‚ดํ•ฉ๋‹ˆ๋‹ค.

Begin by loading the runwayml/stable-diffusion-v1-5 model:

from diffusers import DiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(model_id)

The example prompt is a portrait of an old warrior chief, but feel free to use your own prompt:

prompt = "portrait photo of a old warrior chief"

์†๋„

💡 If you don't have access to a GPU, you can use one for free from a GPU provider like Colab!

์ถ”๋ก  ์†๋„๋ฅผ ๋†’์ด๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋Š” Pytorch ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•  ๋•Œ์™€ ๊ฐ™์€ ๋ฐฉ์‹์œผ๋กœ GPU์— ํŒŒ์ดํ”„๋ผ์ธ์„ ๋ฐฐ์น˜ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค:

pipeline = pipeline.to("cuda")

๋™์ผํ•œ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋ ค๋ฉด Generator๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žฌํ˜„์„ฑ์— ๋Œ€ํ•œ ์‹œ๋“œ๋ฅผ ์„ค์ •ํ•˜์„ธ์š”:

import torch

generator = torch.Generator("cuda").manual_seed(0)

์ด์ œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

image = pipeline(prompt, generator=generator).images[0]
image

This process took ~30 seconds on a T4 GPU (it might be faster if your allocated GPU is better than a T4). By default, the [DiffusionPipeline] runs inference with full float32 precision for 50 inference steps. You can speed this up by switching to a lower precision like float16 or by running fewer inference steps.

Let's load the model in float16 and generate an image:

import torch

pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
generator = torch.Generator("cuda").manual_seed(0)
image = pipeline(prompt, generator=generator).images[0]
image

์ด๋ฒˆ์—๋Š” ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์•ฝ 11์ดˆ๋ฐ–์— ๊ฑธ๋ฆฌ์ง€ ์•Š์•„ ์ด์ „๋ณด๋‹ค 3๋ฐฐ ๊ฐ€๊นŒ์ด ๋นจ๋ผ์กŒ์Šต๋‹ˆ๋‹ค!

๐Ÿ’ก ํŒŒ์ดํ”„๋ผ์ธ์€ ํ•ญ์ƒ float16์—์„œ ์‹คํ–‰ํ•  ๊ฒƒ์„ ๊ฐ•๋ ฅํžˆ ๊ถŒ์žฅํ•˜๋ฉฐ, ์ง€๊ธˆ๊นŒ์ง€ ์ถœ๋ ฅ ํ’ˆ์งˆ์ด ์ €ํ•˜๋˜๋Š” ๊ฒฝ์šฐ๋Š” ๊ฑฐ์˜ ์—†์—ˆ์Šต๋‹ˆ๋‹ค.

๋˜ ๋‹ค๋ฅธ ์˜ต์…˜์€ ์ถ”๋ก  ๋‹จ๊ณ„์˜ ์ˆ˜๋ฅผ ์ค„์ด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ณด๋‹ค ํšจ์œจ์ ์ธ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์„ ํƒํ•˜๋ฉด ์ถœ๋ ฅ ํ’ˆ์งˆ ์ €ํ•˜ ์—†์ด ๋‹จ๊ณ„ ์ˆ˜๋ฅผ ์ค„์ด๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ˜„์žฌ ๋ชจ๋ธ๊ณผ ํ˜ธํ™˜๋˜๋Š” ์Šค์ผ€์ค„๋Ÿฌ๋Š” compatibles ๋ฉ”์„œ๋“œ๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ [DiffusionPipeline]์—์„œ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

pipeline.scheduler.compatibles
[
    diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
    diffusers.schedulers.scheduling_unipc_multistep.UniPCMultistepScheduler,
    diffusers.schedulers.scheduling_k_dpm_2_discrete.KDPM2DiscreteScheduler,
    diffusers.schedulers.scheduling_deis_multistep.DEISMultistepScheduler,
    diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
    diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
    diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
    diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler,
    diffusers.schedulers.scheduling_k_dpm_2_ancestral_discrete.KDPM2AncestralDiscreteScheduler,
    diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler,
    diffusers.schedulers.scheduling_pndm.PNDMScheduler,
    diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler,
    diffusers.schedulers.scheduling_ddim.DDIMScheduler,
]

Stable Diffusion ๋ชจ๋ธ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์•ฝ 50๊ฐœ์˜ ์ถ”๋ก  ๋‹จ๊ณ„๊ฐ€ ํ•„์š”ํ•œ [PNDMScheduler]๋ฅผ ๊ธฐ๋ณธ์œผ๋กœ ์‚ฌ์šฉํ•˜์ง€๋งŒ, [DPMSolverMultistepScheduler]์™€ ๊ฐ™์ด ์„ฑ๋Šฅ์ด ๋” ๋›ฐ์–ด๋‚œ ์Šค์ผ€์ค„๋Ÿฌ๋Š” ์•ฝ 20๊ฐœ ๋˜๋Š” 25๊ฐœ์˜ ์ถ”๋ก  ๋‹จ๊ณ„๋งŒ ํ•„์š”๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์ƒˆ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ๋กœ๋“œํ•˜๋ ค๋ฉด [ConfigMixin.from_config] ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:

from diffusers import DPMSolverMultistepScheduler

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

Now set the num_inference_steps to 20:

generator = torch.Generator("cuda").manual_seed(0)
image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0]
image

์ถ”๋ก ์‹œ๊ฐ„์„ 4์ดˆ๋กœ ๋‹จ์ถ•ํ•  ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค! โšก๏ธ

Memory

ํŒŒ์ดํ”„๋ผ์ธ ์„ฑ๋Šฅ ํ–ฅ์ƒ์˜ ๋˜ ๋‹ค๋ฅธ ํ•ต์‹ฌ์€ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ์ค„์ด๋Š” ๊ฒƒ์ธ๋ฐ, ์ดˆ๋‹น ์ƒ์„ฑ๋˜๋Š” ์ด๋ฏธ์ง€ ์ˆ˜๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋ ค๊ณ  ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ๊ฐ„์ ‘์ ์œผ๋กœ ๋” ๋น ๋ฅธ ์†๋„๋ฅผ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ํ•œ ๋ฒˆ์— ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋Š” ์ด๋ฏธ์ง€ ์ˆ˜๋ฅผ ํ™•์ธํ•˜๋Š” ๊ฐ€์žฅ ์‰ฌ์šด ๋ฐฉ๋ฒ•์€ OutOfMemoryError(OOM)์ด ๋ฐœ์ƒํ•  ๋•Œ๊นŒ์ง€ ๋‹ค์–‘ํ•œ ๋ฐฐ์น˜ ํฌ๊ธฐ๋ฅผ ์‹œ๋„ํ•ด ๋ณด๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

ํ”„๋กฌํ”„ํŠธ ๋ชฉ๋ก๊ณผ Generators์—์„œ ์ด๋ฏธ์ง€ ๋ฐฐ์น˜๋ฅผ ์ƒ์„ฑํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ์ข‹์€ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๊ฒฝ์šฐ ์žฌ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๊ฐ Generator์— ์‹œ๋“œ๋ฅผ ํ• ๋‹นํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

def get_inputs(batch_size=1):
    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
    prompts = batch_size * [prompt]
    num_inference_steps = 20

    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}

You'll also need a function to display each batch of images:

from PIL import Image


def image_grid(imgs, rows=2, cols=2):
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

Start with batch_size=4 and see how much memory you've consumed:

images = pipeline(**get_inputs(batch_size=4)).images
image_grid(images)

RAM์ด ๋” ๋งŽ์€ GPU๊ฐ€ ์•„๋‹ˆ๋ผ๋ฉด ์œ„์˜ ์ฝ”๋“œ์—์„œ OOM ์˜ค๋ฅ˜๊ฐ€ ๋ฐ˜ํ™˜๋˜์—ˆ์„ ๊ฒƒ์ž…๋‹ˆ๋‹ค! ๋Œ€๋ถ€๋ถ„์˜ ๋ฉ”๋ชจ๋ฆฌ๋Š” cross-attention ๋ ˆ์ด์–ด๊ฐ€ ์ฐจ์ง€ํ•ฉ๋‹ˆ๋‹ค. ์ด ์ž‘์—…์„ ๋ฐฐ์น˜๋กœ ์‹คํ–‰ํ•˜๋Š” ๋Œ€์‹  ์ˆœ์ฐจ์ ์œผ๋กœ ์‹คํ–‰ํ•˜๋ฉด ์ƒ๋‹นํ•œ ์–‘์˜ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ ˆ์•ฝํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํŒŒ์ดํ”„๋ผ์ธ์„ ๊ตฌ์„ฑํ•˜์—ฌ [~DiffusionPipeline.enable_attention_slicing] ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ๋งŒ ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค:

pipeline.enable_attention_slicing()

Now try increasing the batch_size to 8!

images = pipeline(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)
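If you want to put a number on how much memory that batch actually consumed, PyTorch's CUDA statistics offer a quick check; this optional snippet is not part of the original tutorial:

import torch

# Peak GPU memory allocated by PyTorch since the last reset, in GB.
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")

# Reset the counter so the next experiment starts from a clean slate.
torch.cuda.reset_peak_memory_stats()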

์ด์ „์—๋Š” 4๊ฐœ์˜ ์ด๋ฏธ์ง€๋ฅผ ๋ฐฐ์น˜๋กœ ์ƒ์„ฑํ•  ์ˆ˜๋„ ์—†์—ˆ์ง€๋งŒ, ์ด์ œ๋Š” ์ด๋ฏธ์ง€๋‹น ์•ฝ 3.5์ดˆ ๋งŒ์— 8๊ฐœ์˜ ์ด๋ฏธ์ง€๋ฅผ ๋ฐฐ์น˜๋กœ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! ์ด๋Š” ์•„๋งˆ๋„ ํ’ˆ์งˆ ์ €ํ•˜ ์—†์ด T4 GPU์—์„œ ๊ฐ€์žฅ ๋น ๋ฅธ ์†๋„์ผ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

Quality

์ง€๋‚œ ๋‘ ์„น์…˜์—์„œ๋Š” fp16์„ ์‚ฌ์šฉํ•˜์—ฌ ํŒŒ์ดํ”„๋ผ์ธ์˜ ์†๋„๋ฅผ ์ตœ์ ํ™”ํ•˜๊ณ , ๋” ์„ฑ๋Šฅ์ด ์ข‹์€ ์Šค์ผ€์ค„๋Ÿฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ถ”๋ก  ๋‹จ๊ณ„์˜ ์ˆ˜๋ฅผ ์ค„์ด๊ณ , attention slicing์„ ํ™œ์„ฑํ™”ํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ ์†Œ๋น„๋ฅผ ์ค„์ด๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์› ์Šต๋‹ˆ๋‹ค. ์ด์ œ ์ƒ์„ฑ๋œ ์ด๋ฏธ์ง€์˜ ํ’ˆ์งˆ์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์ง‘์ค‘์ ์œผ๋กœ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

๋” ๋‚˜์€ ์ฒดํฌํฌ์ธํŠธ

The most obvious step is to use a better checkpoint. The Stable Diffusion model is a good starting point, and several improved versions have been released since its official launch. However, using a newer version doesn't automatically give you better results. You'll still have to experiment with different checkpoints yourself and do a little research (such as using negative prompts) to get the best results.

As the field grows, there are more and more high-quality checkpoints finetuned to produce particular styles. Take a look around the Hub and the Diffusers Gallery to find one you're interested in!
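If you want to try this out, here is a minimal sketch of loading a different checkpoint and passing a negative prompt. The checkpoint name is only an example; any Stable Diffusion checkpoint from the Hub works the same way:

from diffusers import DiffusionPipeline
import torch

# Example checkpoint only; swap in whichever checkpoint you find on the Hub.
alt_pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# A negative prompt describes what the model should avoid generating.
image = alt_pipeline(
    prompt,
    negative_prompt="blurry, low quality, deformed",
    num_inference_steps=25,
).images[0]
image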

๋” ๋‚˜์€ ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์„ฑ ์š”์†Œ

ํ˜„์žฌ ํŒŒ์ดํ”„๋ผ์ธ ๊ตฌ์„ฑ ์š”์†Œ๋ฅผ ์ตœ์‹  ๋ฒ„์ „์œผ๋กœ ๊ต์ฒดํ•ด ๋ณผ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. Stability AI์˜ ์ตœ์‹  autodecoder๋ฅผ ํŒŒ์ดํ”„๋ผ์ธ์— ๋กœ๋“œํ•˜๊ณ  ๋ช‡ ๊ฐ€์ง€ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda")
pipeline.vae = vae
images = pipeline(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)

Better prompt engineering

์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•˜๋Š” ํ…์ŠคํŠธ ํ”„๋กฌํ”„ํŠธ๋Š” prompt engineering์ด๋ผ๊ณ  ํ•  ์ •๋„๋กœ ๋งค์šฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กฌํ”„ํŠธ ์—”์ง€๋‹ˆ์–ด๋ง ์‹œ ๊ณ ๋ คํ•ด์•ผ ํ•  ๋ช‡ ๊ฐ€์ง€ ์‚ฌํ•ญ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

  • ์ƒ์„ฑํ•˜๋ ค๋Š” ์ด๋ฏธ์ง€ ๋˜๋Š” ์œ ์‚ฌํ•œ ์ด๋ฏธ์ง€๊ฐ€ ์ธํ„ฐ๋„ท์— ์–ด๋–ป๊ฒŒ ์ €์žฅ๋˜์–ด ์žˆ๋Š”๊ฐ€?
  • ๋‚ด๊ฐ€ ์›ํ•˜๋Š” ์Šคํƒ€์ผ๋กœ ๋ชจ๋ธ์„ ์œ ๋„ํ•˜๊ธฐ ์œ„ํ•ด ์–ด๋–ค ์ถ”๊ฐ€ ์„ธ๋ถ€ ์ •๋ณด๋ฅผ ์ œ๊ณตํ•  ์ˆ˜ ์žˆ๋Š”๊ฐ€?

์ด๋ฅผ ์—ผ๋‘์— ๋‘๊ณ  ์ƒ‰์ƒ๊ณผ ๋” ๋†’์€ ํ’ˆ์งˆ์˜ ๋””ํ…Œ์ผ์„ ํฌํ•จํ•˜๋„๋ก ํ”„๋กฌํ”„ํŠธ๋ฅผ ๊ฐœ์„ ํ•ด ๋ด…์‹œ๋‹ค:

prompt += ", tribal panther make up, blue on red, side profile, looking away, serious eyes"
prompt += " 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta"

Generate a batch of images with the new prompt:

images = pipeline(**get_inputs(batch_size=8)).images
image_grid(images, rows=2, cols=4)

๊ฝค ์ธ์ƒ์ ์ž…๋‹ˆ๋‹ค! 1์˜ ์‹œ๋“œ๋ฅผ ๊ฐ€์ง„ Generator์— ํ•ด๋‹นํ•˜๋Š” ๋‘ ๋ฒˆ์งธ ์ด๋ฏธ์ง€์— ํ”ผ์‚ฌ์ฒด์˜ ๋‚˜์ด์— ๋Œ€ํ•œ ํ…์ŠคํŠธ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์กฐ๊ธˆ ๋” ์กฐ์ •ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

prompts = [
    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3  --beta --upbeta",
]

generator = [torch.Generator("cuda").manual_seed(1) for _ in range(len(prompts))]
images = pipeline(prompt=prompts, generator=generator, num_inference_steps=25).images
image_grid(images)

๋‹ค์Œ ๋‹จ๊ณ„

์ด ํŠœํ† ๋ฆฌ์–ผ์—์„œ๋Š” ๊ณ„์‚ฐ ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์„ ๋†’์ด๊ณ  ์ƒ์„ฑ๋œ ์ถœ๋ ฅ์˜ ํ’ˆ์งˆ์„ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด [DiffusionPipeline]์„ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ฐฐ์› ์Šต๋‹ˆ๋‹ค. ํŒŒ์ดํ”„๋ผ์ธ์„ ๋” ๋น ๋ฅด๊ฒŒ ๋งŒ๋“œ๋Š” ๋ฐ ๊ด€์‹ฌ์ด ์žˆ๋‹ค๋ฉด ๋‹ค์Œ ๋ฆฌ์†Œ์Šค๋ฅผ ์‚ดํŽด๋ณด์„ธ์š”:

  • Learn how PyTorch 2.0 and torch.compile can yield 5 - 300% faster inference speed. On an A100 GPU, inference can be up to 50% faster!
  • If you can't use PyTorch 2, we recommend installing xFormers. Its memory-efficient attention mechanism works great with PyTorch 1.13.1 for faster speed and reduced memory consumption. A brief sketch of both options follows this list.
  • Other optimization techniques, such as model offloading, are covered in this guide.