VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 48
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024 • 22
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 44
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework Paper • 2405.11143 • Published May 20, 2024 • 37