PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22 • 45
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 93
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 17 • 61
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8 • 39