VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 21
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark Paper • 2410.18976 • Published Oct 24, 2024 • 12