
Add Diffusers weights

#18
by a-r-r-o-w - opened

Thanks for the awesome work! This PR adds the Diffusers-format weights to complete the integration.

Diffusers PR: https://github.com/huggingface/diffusers/pull/10136

Here's the minimal inference code for testing:

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "tencent/HunyuanVideo"

# Load the transformer in bfloat16; the rest of the pipeline runs in float16
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16)

# Tiled VAE decoding keeps memory usage manageable when decoding video frames
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)

Once the weights are merged, I will test everything on my end and merge the PR in diffusers if all is good. Please do test on your end as well and let me know if any changes are required. Happy to help with anything, and thank you so much again for empowering the community with the best open-source video model!

a-r-r-o-w changed pull request title from "Upload folder using huggingface_hub" to "Add Diffusers weights"

I tried this branch with the above code, and it runs out of CUDA memory on an RTX 6000 Ada GPU with 48GB of memory.

Is there a way to use Accelerate with this to spread the model across four 48GB GPUs? Thanks.

Thanks @ghunkins, this works (see the sketch below)!

Is there any way to use Accelerate with this to spread the model across four 48GB GPUs? Thanks.
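@ghunkins' suggestion isn't quoted in this thread, but judging from the quantization_config that appears in the snippets below, it was presumably 4-bit bitsandbytes quantization, roughly along these lines:

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig

# Hypothetical reconstruction: 4-bit NF4 quantization config for the transformer
quantization_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)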

@softwareweaver Yes, you can set device_map="balanced" on the pipeline (to shard all models across multiple GPUs), or device_map="auto" on the transformer (to shard just the transformer). It's also possible to pass a fine-grained dictionary specifying which layer resides on which GPU.
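A minimal sketch of the two options (assuming the same model_id and imports as above; with a device map, skip pipe.to("cuda") and let Accelerate place the modules):

# Option 1: shard every model in the pipeline across the available GPUs
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="balanced"
)

# Option 2: shard only the transformer, then pass it to the pipeline
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, device_map="auto"
)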

Here's the relevant documentation:

Thanks @a-r-r-o-w

Adding a device map to the transformer gave me this error:

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16,
    revision="refs/pr/18", quantization_config=quantization_config, device_map="auto"
)

NotImplementedError: Currently, device_map is automatically inferred for quantized bitsandbytes models. Support for providing device_map as an input will be added in the future.
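As the error message says, the device map for a bitsandbytes-quantized model is inferred automatically, so a sketch of the same load simply drops the argument (same names as above):

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16,
    revision="refs/pr/18", quantization_config=quantization_config,
)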

Adding device_map="balanced" to the pipeline loaded fine, but executing the pipeline gave the following error:

pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16,
    revision="refs/pr/18", device_map="balanced"
)

File "/home/ash/miniconda3/envs/diffusers/lib/python3.11/site-packages/bitsandbytes/functional.py", line 1989, in gemv_4bit
is_on_gpu([B, A, out, absmax, state.code])
File "/home/ash/miniconda3/envs/diffusers/lib/python3.11/site-packages/bitsandbytes/functional.py", line 469, in is_on_gpu
raise RuntimeError(
RuntimeError: Input tensors need to be on the same GPU, but found the following tensor and device combinations:
[(torch.Size([393216, 1]), device(type='cuda', index=0)), (torch.Size([1, 256]), device(type='cuda', index=3)), (torch.Size([1, 3072]), device(type='cuda', index=3)), (torch.Size([12288]), device(type='cuda', index=0)), (torch.Size([16]), device(type='cuda', index=0))]
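Not a fix confirmed in this thread, but a possible single-GPU alternative to multi-GPU sharding is Diffusers' CPU offloading (a sketch assuming enough CPU RAM and an unquantized transformer; the interaction with bitsandbytes-quantized weights is untested here):

pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, revision="refs/pr/18"
)
pipe.vae.enable_tiling()
# Each submodule is moved to the GPU only while it is in use
pipe.enable_model_cpu_offload()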

RuntimeError: Failed to import diffusers.pipelines.hunyuan_video.pipeline_hunyuan_video because of the following error (look up to see its traceback):
Failed to import diffusers.models.autoencoders.autoencoder_kl_hunyuan_video because of the following error (look up to see its traceback):
'NoneType' object has no attribute 'start'

@ahuang1900 You need to upgrade diffusers to version 0.32.0 or install from the main branch.
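For example:

pip install -U diffusers
# or, for the latest fixes, install from source:
pip install git+https://github.com/huggingface/diffusers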
