Diffusers example

#2
by andrew-cartwheel - opened

This is almost certainly a user error, but when I run the provided diffusers example it produces a KeyError for the LoRA weights:

KeyError: 'blocks.0.cross_attn.k_img.lora_A.weight'

Any thoughts?

Could it be from running the inference code against the 1.3B model? (The LoRA is trained on the 14B model)
Reference github issue: https://github.com/tdrussell/diffusion-pipe/issues/114

Good question! I don't think so, as I am using a pretty simple inference script that references the 14B model:

import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video


# Available models: Wan-AI/Wan2.1-T2V-1.3B-Diffusers, Wan-AI/Wan2.1-T2V-14B-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.load_lora_weights("motimalu/wan-flat-color-v2")
pipe.to("cuda")

pipe.enable_model_cpu_offload()

prompt = (
    "flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco],  1girl, sakura miko, pink hair, cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry blossom, tree shade, wariza, looking up, falling petals, half-closed eyes, white sky, clouds,  live2d animation, upper body, high quality cinematic video of a woman sitting under a sakura tree. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid."
)
negative_prompt = "ugly, low quality, JPEG compression residue, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured"

output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=480, width=720, num_frames=81, guidance_scale=5.0).frames[0]
export_to_video(output, "output_flat.mp4", fps=16)

I see, the LoRA is trained with diffusion-pipe, so loading it is dependent on diffusers supporting it as a ComfyUI-format LoRA.
Reference issue: https://github.com/tdrussell/diffusion-pipe/issues/135#issuecomment-2704411120
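
If it helps debugging, one common mismatch between ComfyUI/diffusion-pipe-style LoRAs and diffusers' PEFT backend is the key naming: older/ComfyUI-style checkpoints often use lora_down/lora_up where PEFT expects lora_A/lora_B. Here's a minimal sketch of inspecting and remapping the keys; the remap_comfy_lora_keys helper and the exact down/up-to-A/B mapping are my assumptions, not necessarily the scheme diffusion-pipe emits, so print the real keys first:

```python
# Hypothetical sketch: remap ComfyUI-style LoRA key names
# (lora_down / lora_up) to the PEFT-style names diffusers expects
# (lora_A / lora_B). The mapping here is an assumption -- inspect the
# actual checkpoint keys (e.g. via safetensors) before relying on it.

def remap_comfy_lora_keys(state_dict):
    """Rename lora_down -> lora_A and lora_up -> lora_B in every key."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key.replace("lora_down", "lora_A").replace("lora_up", "lora_B")
        remapped[new_key] = value
    return remapped

# Toy example with placeholder values standing in for tensors:
dummy = {
    "blocks.0.cross_attn.k_img.lora_down.weight": [0.0],
    "blocks.0.cross_attn.k_img.lora_up.weight": [0.0],
}
print(sorted(remap_comfy_lora_keys(dummy)))
```

In practice you would load the checkpoint with safetensors, remap, and pass the resulting dict to pipe.load_lora_weights, but whether that works depends on diffusers' support for this format, per the referenced issue.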

For the previews on the model card, I generated them with ComfyUI_examples/wan/#text-to-video, loading the LoRA with the LoraLoaderModelOnly node.

Checking another Wan LoRA model card, the diffusers inference code sets adapter_name="wan-lora", so this might be what's missing. cc: @multimodalart

pipe.load_lora_weights("finetrainers/Wan2.1-T2V-1.3B-crush-smol-v0", adapter_name="wan-lora")