arxiv:2412.11100

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Published on Dec 15
· Submitted by BrandonLiu on Dec 17
Abstract

The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that performs efficient, synchronous, and coherent denoising of panoramic dynamic scenes with a fixed-resolution diffusion model through a seamlessly rotating window, ensuring smooth boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of output video resolution. Our project page is available at https://dynamic-scaler.pages.dev/.
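For readers wondering what an "offset shifting" scheme can look like in practice, here is a minimal, hypothetical sketch (not the authors' released code): a fixed-resolution denoiser is applied to tiles of a wider panoramic latent, and the tiling is circularly shifted between steps so that no fixed seam forms and the 360° left/right edge stays coherent. `denoiser`, `window_w`, and `offset` are placeholder names, not symbols from the paper.

```python
import torch

def offset_shifting_denoise_step(latent, denoiser, t, window_w, offset):
    """One denoising step over a panoramic video latent (C, T, H, W_pano).

    Rolling the panorama by a step-dependent `offset` (with wrap-around)
    moves the tile boundaries each step, so seams never accumulate at one
    position and the 360-degree edge stays consistent.
    Assumes W_pano is a multiple of window_w.
    """
    C, T, H, W = latent.shape
    rolled = torch.roll(latent, shifts=-offset, dims=-1)   # circular shift
    out = torch.empty_like(rolled)
    for x0 in range(0, W, window_w):                       # tile the panorama
        tile = rolled[..., x0:x0 + window_w]
        out[..., x0:x0 + window_w] = denoiser(tile, t)     # fixed-res model
    return torch.roll(out, shifts=offset, dims=-1)         # undo the shift
```

Under this reading, the model only ever sees `window_w`-wide tiles, so peak VRAM is set by the window size rather than the panorama width, which is consistent with the constant-VRAM claim in the abstract.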

Community

Paper submitter

This paper focuses on 360-degree panoramic video generation, a crucial part of spatial intelligence for immersive dynamic scenes. Challenges in this area include the difficulty of collecting complete 360-degree panoramic video data, the limited generalization of models trained on small datasets (e.g., 360DVD), the restricted motion range of inversion strategies (such as 4K4DGen), and the often-overlooked problem of achieving continuous, loopable scene-level dynamic effects.

To overcome these, we propose DynamicScaler, a novel framework capable of generating high-resolution dynamic effects at arbitrary spatial scale and creating 360-degree dynamic panoramas. It supports both text-conditioned and text-image-conditioned generation and can produce theoretically infinite-length or loopable motion effects without any training data. By integrating concepts from prior work, it offers a robust, training-free, and data-free solution to the data limitations and quality constraints of dynamic effect generation.

Our exploration led to unexpected and significant breakthroughs. Notably, our framework can generate 360-degree panoramas directly from text, eliminating the need for large datasets of field-of-view panoramic video, a major advancement given the difficulty of obtaining such data and the quality loss incurred in field-of-view transitions. Additionally, the shift mechanism enables near-infinite or loopable scene-level dynamic effects, enhancing immersiveness in AR/VR environments (see the sketch below). Overall, our framework sets a better stage for 360-degree panorama generation, showing great potential in applications such as 3D gaming and film design for creating more immersive 4D spatial experiences.
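On the loopable-dynamics point above: one plausible (unverified) reading of the shift mechanism is that the same circular shifting can be applied along the temporal axis, so the first and last frames are repeatedly denoised as neighbors and the clip wraps smoothly in time. The sketch below only illustrates this interpretation; `denoiser` and `shift` are hypothetical placeholders.

```python
import torch

def loopable_denoise(latent, denoiser, timesteps, shift=2):
    """Denoise a video latent (C, T, H, W), cycling frames between steps.

    The temporal roll wraps around, so frame T-1 and frame 0 are treated
    as adjacent throughout denoising, encouraging a seamless loop.
    """
    for t in timesteps:
        latent = torch.roll(latent, shifts=shift, dims=1)  # cycle frames (dim T)
        latent = denoiser(latent, t)                       # standard step
    return latent
```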

