DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Paper • 2412.18597 • Published Dec 2024 • 18
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4, 2024 • 25
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Paper • 2411.04989 • Published Nov 7, 2024 • 14
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5, 2024 • 25
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25, 2024 • 23
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models Paper • 2410.18505 • Published Oct 24, 2024 • 10
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 22
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published Oct 8, 2024 • 14
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8, 2024 • 19
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments Paper • 2410.05254 • Published Oct 7, 2024 • 80
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10, 2024 • 24
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models Paper • 2410.07133 • Published Oct 9, 2024 • 19
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 50
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14, 2024 • 25
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 54