arxiv:2310.08465

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

Published on Oct 12, 2023
· Submitted by akhaliq on Oct 13, 2023

Abstract

Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse video generation. Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion. For example, one might generate a video of a car moving in a prescribed manner under specific camera movements to make a movie, or a video illustrating how a bear would lift weights to inspire creators. Adaptation methods have been developed for customizing appearance, such as subject or style, but remain unexplored for motion. Mainstream adaptation methods extend straightforwardly to motion customization, including full model tuning, parameter-efficient tuning of additional layers, and Low-Rank Adaptations (LoRAs). However, the motion concept learned by these methods is often coupled with the limited appearances in the training videos, making it difficult to generalize the customized motion to other appearances. To overcome this challenge, we propose MotionDirector, with a dual-path LoRA architecture to decouple the learning of appearance and motion. Further, we design a novel appearance-debiased temporal loss to mitigate the influence of appearance on the temporal training objective. Experimental results show that the proposed method can generate videos of diverse appearances for the customized motions. Our method also supports various downstream applications, such as mixing the appearance of one video with the motion of another, and animating a single image with customized motions. Our code and model weights will be released.
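The abstract names LoRA and a dual-path LoRA design but does not spell out the architecture, so the following is only a toy sketch under stated assumptions. The `LoRALinear` class implements the standard LoRA update (a frozen weight plus a scaled low-rank product `B @ A`, with `B` zero-initialized). Everything else is hypothetical: the names `dual_path_forward` and `trainable_params`, the assumption that a "spatial" path sees a single frame and trains only the spatial LoRA, and the assumption that a "temporal" path sees all frames and trains only the temporal LoRA while the spatial LoRA stays frozen. The diffusion model and the appearance-debiased loss are omitted entirely.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a low-rank update: y = x W^T + (alpha / r) * x A^T B^T."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                 # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))                   # trainable up-projection, zero-init
        self.scale = alpha / rank
        self.enabled = True                                  # toggle the LoRA branch on/off

    def __call__(self, x: np.ndarray) -> np.ndarray:
        y = x @ self.weight.T
        if self.enabled:
            y = y + self.scale * (x @ self.A.T) @ self.B.T
        return y


def dual_path_forward(video: np.ndarray, spatial: LoRALinear, temporal: LoRALinear, path: str) -> np.ndarray:
    """Hypothetical toy block. video has shape (frames, tokens, dim)."""
    if path == "spatial":
        # Appearance path (assumed): only the first frame, no temporal mixing.
        return spatial(video[:1])
    if path == "temporal":
        h = spatial(video)              # per-frame projection, (frames, tokens, dim)
        h = np.moveaxis(h, 0, -1)       # (tokens, dim, frames)
        h = temporal(h)                 # linear mixing along the frame axis
        return np.moveaxis(h, -1, 0)    # back to (frames, tokens, dim)
    raise ValueError(f"unknown path: {path!r}")


def trainable_params(path: str, spatial: LoRALinear, temporal: LoRALinear) -> list:
    """Which LoRA matrices each path would update (assumed split); base weights stay frozen."""
    if path == "spatial":
        return [spatial.A, spatial.B]
    if path == "temporal":
        return [temporal.A, temporal.B]  # spatial LoRA is reused but frozen in this path
    raise ValueError(f"unknown path: {path!r}")


if __name__ == "__main__":
    frames, tokens, dim = 4, 3, 8
    rng = np.random.default_rng(1)
    spatial = LoRALinear(rng.normal(size=(dim, dim)), rank=2)
    temporal = LoRALinear(np.eye(frames), rank=2)   # identity base: no frame mixing until trained
    video = rng.normal(size=(frames, tokens, dim))
    print(dual_path_forward(video, spatial, temporal, "spatial").shape)   # (1, 3, 8)
    print(dual_path_forward(video, spatial, temporal, "temporal").shape)  # (4, 3, 8)
```

Because `B` starts at zero, both paths initially reproduce the frozen base layers exactly; only training would move the outputs. In an actual text-to-video UNet the spatial and temporal layers would be attention and convolution modules rather than single linear maps.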

Community

[Screenshot: Figure 4 of the paper]
I had a question: in Figure 4 (above), what projection method is used? Is it t-SNE, PCA, or something else?

Hi Meher @MeherShashwat, thanks for your question. We used PCA to reduce the dimension of the latent codes to 2.

Thanks a lot for clarifying. I really like this work :)

Paper author
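The reply above confirms that Figure 4 uses PCA to reduce the latent codes to two dimensions. A minimal NumPy sketch of that kind of projection follows. The function name `pca_project` and the latent shape in the usage are hypothetical, and the thread does not say how the codes were flattened or whether any preprocessing beyond centering was applied. scikit-learn's `PCA(n_components=2)` would be an equivalent off-the-shelf choice.

```python
import numpy as np


def pca_project(latents: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project each sample onto its top principal components (PCA via SVD)."""
    X = latents.reshape(len(latents), -1).astype(float)  # flatten each latent code to a row vector
    X = X - X.mean(axis=0)                               # center every feature
    _, _, Vt = np.linalg.svd(X, full_matrices=False)     # rows of Vt: principal axes, by decreasing variance
    return X @ Vt[:n_components].T                       # coordinates, shape (n_samples, n_components)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(10, 4, 16, 16))  # hypothetical: 10 latent codes of shape (4, 16, 16)
    points = pca_project(latents)
    print(points.shape)  # (10, 2)
```

Each row of the result is one latent code placed in the 2-D plane spanned by the two directions of largest variance, which is what a scatter plot like Figure 4 would display.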

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 4

Collections including this paper 8