Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 13 days ago
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Paper • 2407.14482 • Published Jul 19, 2024
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn Paper • 2306.08640 • Published Jun 14, 2023