M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought Paper • 2405.16473 • Published May 26, 2024
Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement Paper • 2406.17233 • Published Jun 25, 2024
A Two-Stage Framework with Self-Supervised Distillation For Cross-Domain Text Classification Paper • 2304.09820 • Published Apr 18, 2023
Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding Paper • 2112.11953 • Published Dec 22, 2021
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Paper • 2412.05939 • Published Dec 8, 2024
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies Paper • 2407.13623 • Published Jul 18, 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024