PaSa: An LLM Agent for Comprehensive Academic Paper Search Paper • 2501.10120 • Published 7 days ago • 37
On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis Paper • 2501.04377 • Published 16 days ago • 13
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published 18 days ago • 40
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published 25 days ago • 36
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 72
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning Paper • 2410.21845 • Published Oct 29, 2024 • 13
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24, 2024 • 36
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024 • 66
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 51
TokenCompose: Grounding Diffusion with Token-level Supervision Paper • 2312.03626 • Published Dec 6, 2023 • 5
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23, 2024 • 22
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17, 2024 • 22
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 32
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published Jun 12, 2024 • 24
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20, 2024 • 29
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 54