Diffusers example

#2
by andrew-cartwheel - opened

This is almost certainly a user error, but when I run the provided diffusers example it produces a KeyError for the LoRA weights:

KeyError: 'blocks.0.cross_attn.k_img.lora_A.weight'

Any thoughts?

Could it be from running the inference code against the 1.3B model? (The LoRA is trained on the 14B model)
Reference github issue: https://github.com/tdrussell/diffusion-pipe/issues/114

Good question! I don't think so, as I am using a pretty simple inference script that references the 14B model:

import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video


# Available models: Wan-AI/Wan2.1-T2V-1.3B-Diffusers, Wan-AI/Wan2.1-T2V-14B-Diffusers
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.load_lora_weights("motimalu/wan-flat-color-v2")
pipe.to("cuda")

pipe.enable_model_cpu_offload()

prompt = (
    "flat color, no lineart, blending, negative space, artist:[john kafka|ponsuke kaikai|hara id 21|yoneyama mai|fuzichoco],  1girl, sakura miko, pink hair, cowboy shot, white shirt, floral print, off shoulder, outdoors, cherry blossom, tree shade, wariza, looking up, falling petals, half-closed eyes, white sky, clouds,  live2d animation, upper body, high quality cinematic video of a woman sitting under a sakura tree. The Camera is steady, This is a cowboy shot. The animation is smooth and fluid."
)
negative_prompt = "ugly, low quality, JPEG compression residue, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured"

output = pipe(prompt=prompt, negative_prompt=negative_prompt, height=480, width=720, num_frames=81, guidance_scale=5.0).frames[0]
export_to_video(output, "output_flat.mp4", fps=16)

I see, the LoRA is trained with diffusion-pipe, so loading it is dependent on diffusers supporting it as a ComfyUI-format LoRA.
Reference issue: https://github.com/tdrussell/diffusion-pipe/issues/135#issuecomment-2704411120
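
If it helps debugging, one common mismatch between ComfyUI/diffusion-pipe-style LoRAs and diffusers' PEFT backend is the key naming: older/ComfyUI-style checkpoints often use lora_down/lora_up where PEFT expects lora_A/lora_B. Here's a minimal sketch of inspecting and remapping the keys; the remap_comfy_lora_keys helper and the exact down/up-to-A/B mapping are my assumptions, not necessarily the scheme diffusion-pipe emits, so print the real keys first:

```python
# Hypothetical sketch: remap ComfyUI-style LoRA key names
# (lora_down / lora_up) to the PEFT-style names diffusers expects
# (lora_A / lora_B). The mapping here is an assumption -- inspect the
# actual checkpoint keys (e.g. via safetensors) before relying on it.

def remap_comfy_lora_keys(state_dict):
    """Rename lora_down -> lora_A and lora_up -> lora_B in every key."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key.replace("lora_down", "lora_A").replace("lora_up", "lora_B")
        remapped[new_key] = value
    return remapped

# Toy example with placeholder values standing in for tensors:
dummy = {
    "blocks.0.cross_attn.k_img.lora_down.weight": [0.0],
    "blocks.0.cross_attn.k_img.lora_up.weight": [0.0],
}
print(sorted(remap_comfy_lora_keys(dummy)))
```

In practice you would load the checkpoint with safetensors, remap, and pass the resulting dict to pipe.load_lora_weights, but whether that works depends on diffusers' support for this format, per the referenced issue.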

For the previews on the model card, I generated them with ComfyUI_examples/wan/#text-to-video, loading the LoRA with the LoraLoaderModelOnly node.

Checking another Wan LoRA model card, the diffusers inference code sets adapter_name="wan-lora", so this might be what's missing. cc: @multimodalart

pipe.load_lora_weights("finetrainers/Wan2.1-T2V-1.3B-crush-smol-v0", adapter_name="wan-lora")