# CascadeV | An Implementation of the Würstchen Architecture for High-Resolution Video Generation
## News
**[2024.07.17]** We release the [code](https://github.com/bytedance/CascadeV) and pretrained [weights](https://huggingface.co/ByteDance/CascadeV) of a DiT-based video VAE, which supports video reconstruction with a high compression factor (1x32x32=1024). The T2V model is still on the way.
## Introduction
CascadeV is a video generation pipeline built upon the [Würstchen](https://openreview.net/forum?id=gU58d5QeGv) architecture. By using a highly compressed latent representation, we can generate longer videos with higher resolution.
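As a rough illustration of what a 1x32x32 compression factor means for latent sizes, the sketch below computes the latent grid for a given video shape. The helper `latent_shape` and the example input resolution are hypothetical and not part of the CascadeV codebase; the divisibility check is an assumption of this sketch, and channel counts are left out because they are not stated here.

```python
from math import prod


def latent_shape(frames, height, width, factor=(1, 32, 32)):
    """Return the (T, H, W) latent grid for a video under per-axis compression.

    `factor` is (temporal, height, width); the default mirrors the 1x32x32
    factor quoted above. This helper is illustrative only.
    """
    ft, fh, fw = factor
    if frames % ft or height % fh or width % fw:
        # Assumption of this sketch: inputs divide evenly by the factor.
        raise ValueError("video dimensions must be divisible by the compression factor")
    return (frames // ft, height // fh, width // fw)


factor = (1, 32, 32)
# Total positions folded into one latent position: 1 * 32 * 32
print(prod(factor))
# Hypothetical input: 8 frames at 1024x1024 pixels
print(latent_shape(8, 1024, 1024, factor))
```

Under these assumptions, an 8-frame 1024x1024 clip maps to an 8x32x32 latent grid, since only the spatial axes are compressed.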
## Video VAE
Comparison of Our Cascade Approach with Other VAEs (on Latent Space of Shape 8x32x32)
Video Reconstruction: Original (left) vs. Reconstructed (right)