Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Paper • 2412.03548 • Published Dec 4, 2024 • 17
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models Paper • 2412.03548 • Published Dec 4, 2024 • 17 • 2
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published about 1 month ago • 71
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 21 • 2
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Paper • 2402.04788 • Published Feb 7, 2024
The Best of Both Worlds: Toward an Honest and Helpful Large Language Model Paper • 2406.00380 • Published Jun 1, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models Paper • 2406.18966 • Published Jun 27, 2024
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 21
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 21
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 50