UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing Paper • 2411.16781 • Published 4 days ago • 7
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters Paper • 2411.18197 • Published 2 days ago • 9
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published 7 days ago • 9
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published 2 days ago • 70
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 1 day ago • 31
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation Paper • 2411.17945 • Published 2 days ago • 21
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality Paper • 2411.15241 • Published 7 days ago • 3
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published 3 days ago • 8
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE Paper • 2411.16856 • Published 4 days ago • 7
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models Paper • 2411.17451 • Published 3 days ago • 9
SketchAgent: Language-Driven Sequential Sketch Generation Paper • 2411.17673 • Published 3 days ago • 14
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs Paper • 2411.15296 • Published 7 days ago • 18
Star Attention: Efficient LLM Inference over Long Sequences Paper • 2411.17116 • Published 3 days ago • 38
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published 3 days ago • 58
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Paper • 2411.12814 • Published 10 days ago • 20