Img2Img Optimizations

#4
by waterbear - opened

Thanks for building this!

I'm curious whether you've tested img2img with this, and what sort of config is optimal. I've gotten it working, but it required more steps (20 vs. the expected 8). I had to lower the shift on the scheduler to 1-2 so that it wouldn't completely overwrite the image. The lower shift mostly fixes the output, but I'm wondering about the performance side and whether there are any speedups to be had. I get about 10-15 seconds for 2 images at 768x768 on an NVIDIA L4 AWS instance. If I could get that down to half it'd be golden, but the step count is obviously hurting and I'm not sure how to work around it.

For context, I mostly work with illustrations, not photorealistic images.

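    import torch
    from diffusers import (
        FlowMatchEulerDiscreteScheduler,
        StableDiffusion3Img2ImgPipeline,
    )

    # Load the turbo checkpoint in half precision.
    pipe = StableDiffusion3Img2ImgPipeline.from_pretrained(
        "tensorart/stable-diffusion-3.5-medium-turbo",
        torch_dtype=torch.float16,
        cache_dir="weights",
    )

    # Swap in a scheduler with a lower static shift so the input image
    # is not completely overwritten.
    pipe.scheduler = FlowMatchEulerDiscreteScheduler(
        num_train_timesteps=1000,
        shift=2.0,
        use_dynamic_shifting=False,
    )

    pipe = pipe.to("cuda")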
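
For anyone wondering why the lower shift helps, here is a rough sketch of the arithmetic as I understand it. It is a simplified model, not the library's code: it assumes the static shift remaps each noise level as shift * sigma / (1 + (shift - 1) * sigma), that unshifted sigmas are evenly spaced, and that img2img skips the noisiest steps according to a `strength` value. The `strength=0.5` below and the helper names are hypothetical and only there for illustration.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Static shift remap: shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def img2img_start(num_steps: int, strength: float, shift: float) -> tuple[int, float]:
    """Return (denoising steps actually run, shifted sigma at the first step).

    Simplified model (assumption): unshifted sigmas are evenly spaced from
    1.0 down to 1/num_steps, and `strength` decides how many of the early,
    noisiest steps are skipped.
    """
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]
    t_start = int(max(num_steps - num_steps * strength, 0))
    if t_start >= num_steps:
        # strength of 0 means nothing is denoised and no noise is added
        return 0, 0.0
    return num_steps - t_start, shift_sigma(sigmas[t_start], shift)


if __name__ == "__main__":
    for shift in (1.0, 2.0, 3.0):
        steps, sigma = img2img_start(num_steps=8, strength=0.5, shift=shift)
        # Under a flow-matching blend (1 - sigma) * image + sigma * noise,
        # a larger starting sigma leaves less of the input image intact.
        print(f"shift={shift}: {steps} steps run, start sigma={sigma:.3f}")
```

Under these assumptions, 8 steps at strength 0.5 start from sigma 0.5 with shift 1 but from sigma 0.75 with shift 3, so a higher shift pushes the starting point much closer to pure noise. That would be consistent with the image being overwritten at higher shift values, though the real scheduler spaces its timesteps somewhat differently.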
TensorArt Studios org

Our SD3.5M ControlNet is coming soon; it may meet your needs.
