maxin-cn/Latte · Hugging Face

Latte: Latent Diffusion Transformer for Video Generation

This repo contains pre-trained weights on FaceForensics, SkyTimelapse, UCF101, and Taichi-HD for our paper on latent diffusion models with transformers (Latte). More visualizations are available on our project page. For pre-trained text-to-video generation weights, please refer to here.
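As a minimal sketch of how the weights in this repo could be fetched programmatically, the snippet below uses `snapshot_download` from the `huggingface_hub` library with the repo id from this page's title. The local directory name, the helper names `download_latte_weights` and `list_checkpoints`, and the set of file extensions treated as checkpoints are assumptions made for illustration; this page does not list the actual checkpoint file names.

```python
from pathlib import Path

# Repo id taken from this page's title.
REPO_ID = "maxin-cn/Latte"

# Assumption: extensions commonly used for model weights. The real file
# names in the repo are not stated on this page.
CHECKPOINT_SUFFIXES = {".pt", ".pth", ".ckpt", ".safetensors", ".bin"}


def download_latte_weights(local_dir="pretrained/Latte"):
    """Download a snapshot of the repo and return its local path.

    Hypothetical helper; `local_dir` is an arbitrary illustrative default.
    Requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the rest of the module works without the library.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def list_checkpoints(root):
    """Return checkpoint-like files found under `root`, sorted by path."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )
```

Calling `list_checkpoints(download_latte_weights())` would then show which weight files were retrieved, so the one matching a given dataset can be picked by inspection rather than by guessing a file name.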

News

(🔥 New) May. 23, 2024. 💥 Latte-1 for text-to-video generation is released! You can download the pre-trained model here. Latte-1 also supports text-to-image generation; to try it, run bash sample/t2i.sh.
(🔥 New) Mar. 20, 2024. 💥 An updated LatteT2V model is coming soon, stay tuned!
(🔥 New) Feb. 24, 2024. 💥 We are very grateful that researchers and developers like our work. We will continue to update our LatteT2V model and hope that our efforts help the community develop. A Latte Discord channel has been created for discussions. Coders are welcome to contribute.
(🔥 New) Jan. 9, 2024. 💥 An updated LatteT2V model initialized with PixArt-α is released; the checkpoint can be found here.
(🔥 New) Oct. 31, 2023. 💥 The training and inference code is released. All checkpoints (including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD) can be found here. In addition, the LatteT2V inference code is provided.

Contact Us

Yaohui Wang: wangyaohui@pjlab.org.cn
Xin Ma: xin.ma1@monash.edu

Citation

If you find this work useful for your research, please consider citing it.

@article{ma2024latte,
  title={Latte: Latent Diffusion Transformer for Video Generation},
  author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Liu, Ziwei and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
  journal={arXiv preprint arXiv:2401.03048},
  year={2024}
}

Acknowledgments

Latte has been greatly inspired by the following amazing works and teams: DiT and PixArt-α. We thank all the contributors for open-sourcing their work.
