Flowing from Words to Pixels: A Framework for Cross-Modality Evolution • Paper • 2412.15213 • Published 3 days ago • 19
Autoregressive Video Generation without Vector Quantization • Paper • 2412.14169 • Published 4 days ago • 12
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • Published 5 days ago • 12
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers • Paper • 2412.12571 • Published 6 days ago • 7
Learning from Massive Human Videos for Universal Humanoid Pose Control • Paper • 2412.14172 • Published 4 days ago • 10
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer • Paper • 2412.13871 • Published 4 days ago • 17
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes • Paper • 2412.11100 • Published 8 days ago • 5
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator • Paper • 2412.12094 • Published 6 days ago • 9
Whisper-GPT: A Hybrid Representation Audio Large Language Model • Paper • 2412.11449 • Published 7 days ago • 4
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning • Paper • 2412.10447 • Published 11 days ago • 5
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition • Paper • 2412.09501 • Published 10 days ago • 43
Negative Token Merging: Image-based Adversarial Feature Guidance • Paper • 2412.01339 • Published 20 days ago • 21
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows • Paper • 2412.01169 • Published 21 days ago • 10
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis • Paper • 2412.04431 • Published 17 days ago • 16
Mimir: Improving Video Diffusion Models for Precise Text Understanding • Paper • 2412.03085 • Published 19 days ago • 12
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training • Paper • 2412.02030 • Published 20 days ago • 18
Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning • Paper • 2412.01408 • Published 20 days ago • 1