Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing Paper • 2411.19460 • Published Nov 29, 2024
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis Paper • 2411.16173 • Published Nov 25, 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models Paper • 2406.01920 • Published Jun 4, 2024