Diffusers example
This is almost certainly user error, but when I run the provided diffusers example it raises a KeyError for the LoRA weights:
KeyError: 'blocks.0.cross_attn.k_img.lora_A.weight'
Any thoughts?
Could it be from running the inference code against the 1.3B model? (The LoRA is trained on the 14B model)
Reference github issue: https://github.com/tdrussell/diffusion-pipe/issues/114
Good question! I don't think so, as I am using pretty simple inference code that references the 14B model:
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Available models: Wan-AI/Wan2.1-I2V-14B-480P-Diffusers, Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.load_lora_weights("motimalu/wan-flat-color-v2")
# enable_model_cpu_offload() handles device placement on its own, so an
# explicit pipe.to("cuda") beforehand is redundant.
pipe.enable_model_cpu_offload()
prompt = (
"flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco], 1girl, sakura miko, pink hair, cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry blossom, tree shade, wariza, looking up, falling petals, half-closed eyes, white sky, clouds, live2d animation, upper body, high quality cinematic video of a woman sitting under a sakura tree. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid."
)
negative_prompt = "ugly, low quality, JPEG compression residue, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured"
output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=480, width=720, num_frames=81, guidance_scale=5.0).frames[0]
export_to_video(output, "output_flat.mp4", fps=16)
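In case it helps with debugging, the key naming itself can be checked. A minimal sketch, assuming the diffusion-pipe/ComfyUI checkpoint prefixes its keys with diffusion_model. (that prefix is an assumption here; inspect the actual keys in the .safetensors file to confirm) and that diffusers expects keys shaped like the one in the KeyError:

```python
# Hedged sketch, not the confirmed cause: ComfyUI-format LoRA keys are assumed
# to carry a "diffusion_model." prefix, while the KeyError shows diffusers
# looking up an unprefixed key ("blocks.0.cross_attn.k_img.lora_A.weight").
def strip_prefix(state_dict, prefix="diffusion_model."):
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

# Toy dict standing in for the real safetensors state dict:
toy = {"diffusion_model.blocks.0.cross_attn.k_img.lora_A.weight": None}
print(list(strip_prefix(toy)))  # ['blocks.0.cross_attn.k_img.lora_A.weight']
```

Comparing the stripped key names against the layer names the pipeline's transformer actually has would show whether the mismatch is the prefix or a layer that simply does not exist in this model.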
I see, the LoRA is trained with diffusion-pipe, so loading it depends on diffusers supporting the ComfyUI-format LoRA it produces.
Reference issue: https://github.com/tdrussell/diffusion-pipe/issues/135#issuecomment-2704411120
The previews on the model card were generated with ComfyUI_examples/wan/#text-to-video, loading the LoRA with the LoraLoaderModelOnly node.
Checking another Wan LoRA model card, its diffusers inference code sets adapter_name="wan-lora", so that might be what's missing here? cc:
@multimodalart
pipe.load_lora_weights("finetrainers/Wan2.1-T2V-1.3B-crush-smol-v0", adapter_name="wan-lora")