license: apache-2.0
Kandinsky Video — a new text-to-video generation model
SoTA quality among open-source solutions
This repository is the official implementation of the Kandinsky Video model.
Paper | Project | Telegram-bot | Habr post
Kandinsky Video is a text-to-video generation model based on the FusionFrames architecture, which consists of two main stages: keyframe generation and interpolation. Our approach to temporal conditioning allows us to generate videos with high-quality appearance, smoothness, and dynamics.
Pipeline
The encoded text prompt is fed to the U-Net keyframe generation model, which includes temporal layers or blocks. The sampled latent keyframes are then passed to the latent interpolation model, which predicts three interpolation frames between each pair of adjacent keyframes. A temporal MoVQ-GAN decoder produces the final video.
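To make the frame arithmetic of the two-stage pipeline concrete, here is a minimal sketch that counts output frames when three interpolated frames are placed between each pair of adjacent keyframes. The helper names `total_frames` and `frame_layout` and the keyframe count of 4 are illustrative assumptions, not part of the repository or values taken from the model.

```python
def total_frames(num_keyframes: int, interpolated_per_gap: int = 3) -> int:
    """Frames after interpolation: keyframes plus the frames inserted in each gap."""
    if num_keyframes < 1:
        raise ValueError("need at least one keyframe")
    gaps = num_keyframes - 1  # one gap between each pair of adjacent keyframes
    return num_keyframes + interpolated_per_gap * gaps


def frame_layout(num_keyframes: int, interpolated_per_gap: int = 3) -> list[str]:
    """Label each output frame as a keyframe ('K') or an interpolated frame ('i')."""
    layout = []
    for k in range(num_keyframes):
        layout.append("K")
        if k < num_keyframes - 1:  # no interpolated frames after the last keyframe
            layout.extend(["i"] * interpolated_per_gap)
    return layout


# Hypothetical example: 4 keyframes (the real keyframe count is not stated here).
print(total_frames(4))             # 13
print("".join(frame_layout(4)))    # KiiiKiiiKiiiK
```

With K keyframes and three interpolated frames per gap, the sequence has 4K - 3 frames in total.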
Architecture details
- Text encoder (Flan-UL2) - 8.6B
- Latent Diffusion U-Net3D - 4.0B
- MoVQ encoder/decoder - 256M
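As a rough guide to what these sizes imply, the sketch below sums the listed parameter counts and estimates the memory needed for the weights alone at 2 bytes per parameter (fp16). This is only an assumption-laden estimate: it ignores activations, latents, and framework overhead, and it assumes every listed component is loaded in half precision. The dictionary keys and the helper `weights_gib` are illustrative names.

```python
# Parameter counts as listed above (Flan-UL2, U-Net3D, MoVQ).
PARAMS = {
    "text_encoder_flan_ul2": 8.6e9,
    "latent_diffusion_unet3d": 4.0e9,
    "movq_encoder_decoder": 256e6,
}


def weights_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights only, in GiB (2 bytes per parameter for fp16)."""
    return num_params * bytes_per_param / 2**30


total_params = sum(PARAMS.values())
print(f"total parameters: {total_params / 1e9:.2f}B")
for name, count in PARAMS.items():
    print(f"{name}: {weights_gib(count):.1f} GiB in fp16")
print(f"all weights: {weights_gib(total_params):.1f} GiB in fp16")
```

Treat the result as a lower bound on memory use, not as a hardware requirement stated by the authors.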
How to use
Check our Jupyter notebooks with examples in the ./examples folder.
1. text2video
from video_kandinsky3 import get_T2V_pipeline

# Load the text-to-video pipeline on the GPU in half precision
t2v_pipe = get_T2V_pipeline('cuda', fp16=True)

fps = 'medium'  # one of: 'low', 'medium', 'high'
video = t2v_pipe(
    'a red car is drifting on the mountain road, close view, fast movement',
    width=640, height=384, fps=fps
)
Results
"A car moving on the road from the sea to the mountains" | "A red car drifting, 4k video" | "Chemistry laboratory, chemical explosion, 4k" | "Erupting volcano raw power, molten lava, and the forces of the Earth" |
"Luminescent jellyfish swims underwater, neon, 4k" | "Majestic waterfalls in a lush rainforest power, mist, and biodiversity" | "White ghost flies through a night clearing, 4k" | "Wildlife migration herds on the move, crossing landscapes in harmony" |
"Majestic humpback whale breaching power, grace, and ocean spectacle" | "Evoke the sense of wonder in a time-lapse journey through changing seasons" | "Explore the fascinating world of underwater creatures in a visually stunning sequence" | "Polar ice caps the pristine wilderness of the Arctic and Antarctic" |
"Rolling waves on a sandy beach relaxation, rhythm, and coastal beauty" | "Sloth in slow motion deliberate movements, relaxation, and arboreal life" | "Time-lapse of a flower blooming growth, beauty, and the passage of time" | "Craft a heartwarming narrative showcasing the bond between a human and their loyal pet companion" |
Authors
- Vladimir Arkhipkin: Github, Google Scholar
- Zein Shaheen: Github, Google Scholar
- Viacheslav Vasilev: Github, Google Scholar
- Igor Pavlov: Github
- Elizaveta Dakhova: Github
- Anastasia Lysenko: Github
- Sergey Markov
- Denis Dimitrov: Github, Google Scholar
- Andrey Kuznetsov: Github, Google Scholar
BibTeX
If you use our work in your research, please cite our publication:
TBD