Update README.md
README.md
CHANGED
@@ -1,3 +1,41 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
---
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
---
|
4 |
+
|
5 |
+
---
|
6 |
+
license: mit
|
7 |
+
---
|
8 |
+
|
9 |
+
## Latte: Latent Diffusion Transformer for Video Generation
|
10 |
+
|
11 |
+
This repo contains pre-trained text-to-video generation weights for our paper on latent diffusion models with transformers (Latte). You can find more visualizations on our [project page](https://maxin-cn.github.io/latte_project/).
|
12 |
+
|
13 |
+
## News
|
14 |
+
- (🔥 New) May 23, 2024. 🔥 The updated LatteT2V model is released [here](https://huggingface.co/maxin-cn/Latte/blob/main/t2v_v20240523.pt). To generate images directly with the updated model, set `video_length=1`, `enable_temporal_attentions=True`, and `enable_vae_temporal_decoder=False` in [t2v_sample.yaml](configs/t2v/t2v_sample.yaml).
|
15 |
+
|
16 |
+
- (🔥 New) Mar. 20, 2024. 🔥 An updated LatteT2V model is coming soon, stay tuned!
|
17 |
+
|
18 |
+
- (🔥 New) Feb. 24, 2024. 🔥 We are very grateful that researchers and developers like our work. We will continue to update our LatteT2V model and hope our efforts help the community. We have created a Latte [Discord](https://discord.gg/RguYqhVU92) channel for discussions. Contributions from coders are welcome.
|
19 |
+
|
20 |
+
- (🔥 New) Jan. 9, 2024. 🔥 An updated LatteT2V model initialized with [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha) is released; the checkpoint can be found [here](https://huggingface.co/maxin-cn/Latte/resolve/main/t2v.pt?download=true).
|
21 |
+
|
22 |
+
- (🔥 New) Oct. 31, 2023. 🔥 The training and inference code is released. All checkpoints (including FaceForensics, SkyTimelapse, UCF101, and Taichi-HD) can be found [here](https://huggingface.co/maxin-cn/Latte/tree/main). In addition, the LatteT2V inference code is provided.
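
As a rough illustration of the image-generation settings mentioned in the May 23, 2024 entry, the relevant lines of `configs/t2v/t2v_sample.yaml` might look like the excerpt below. The three key names and values come from that entry; their placement at the top level of the file is an assumption, and any other keys in the file are left unchanged.

```yaml
# Excerpt of configs/t2v/t2v_sample.yaml (sketch; top-level placement is assumed)
video_length: 1                     # one frame, so the sampler outputs an image
enable_temporal_attentions: True    # value given in the news entry
enable_vae_temporal_decoder: False  # value given in the news entry
```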
|
23 |
+
|
24 |
+
## Contact Us
|
25 |
+
**Yaohui Wang**: [wangyaohui@pjlab.org.cn](mailto:wangyaohui@pjlab.org.cn)
|
26 |
+
**Xin Ma**: [xin.ma1@monash.edu](mailto:xin.ma1@monash.edu)
|
27 |
+
|
28 |
+
## Citation
|
29 |
+
If you find this work useful for your research, please consider citing it.
|
30 |
+
```bibtex
|
31 |
+
@article{ma2024latte,
|
32 |
+
title={Latte: Latent Diffusion Transformer for Video Generation},
|
33 |
+
author={Ma, Xin and Wang, Yaohui and Jia, Gengyun and Chen, Xinyuan and Liu, Ziwei and Li, Yuan-Fang and Chen, Cunjian and Qiao, Yu},
|
34 |
+
journal={arXiv preprint arXiv:2401.03048},
|
35 |
+
year={2024}
|
36 |
+
}
|
37 |
+
```
|
38 |
+
|
39 |
+
|
40 |
+
## Acknowledgments
|
41 |
+
Latte has been greatly inspired by the following amazing works and teams: [DiT](https://github.com/facebookresearch/DiT) and [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). We thank all the contributors for open-sourcing their work.
|