Submitted by yulunliu · 48 · NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing · 6 authors · 2
Submitted by akhaliq · 46 · Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing · 7 authors · 3
Submitted by myownskyW7 · 39 · MotionClone: Training-Free Motion Cloning for Controllable Video Generation · 9 authors · 4
Submitted by yixinsong · 35 · PowerInfer-2: Fast Large Language Model Inference on a Smartphone · 6 authors · 5
Submitted by Liuff23 · 32 · Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion · 6 authors · 3
Submitted by lixin4ever · 30 · VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · 11 authors · 2
Submitted by jedyang97 · 27 · 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination · 7 authors · 2
Submitted by akhaliq · 23 · MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos · 14 authors
Submitted by yixinsong · 21 · Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters · 7 authors · 2
Submitted by GlyphByT5 · 17 · FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation · 8 authors
Submitted by akhaliq · 13 · Hierarchical Patch Diffusion Models for High-Resolution Video Generation · 4 authors
Submitted by akhaliq · 13 · AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation · 5 authors
Submitted by chrlu · 12 · Discovering Preference Optimization Algorithms with and for Large Language Models · 7 authors
Submitted by yifanzhang114 · 10 · Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models · 7 authors · 2
Submitted by AliBehrouz · 7 · Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models · 3 authors · 1
Submitted by chrisliu298 · 6 · Large Language Model Unlearning via Embedding-Corrupted Prompts · 4 authors