- Analyzing The Language of Visual Tokens
  Paper • 2411.05001 • Published • 22
- Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
  Paper • 2411.14982 • Published • 15
- Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration
  Paper • 2411.17686 • Published • 18
Jaehyun Jun
btjhjeon
AI & ML interests
Multimodal
Recent Activity
upvoted a paper 3 days ago
FastVLM: Efficient Vision Encoding for Vision Language Models
upvoted a paper 3 days ago
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
updated a collection 3 days ago
Multimodal LLM
Organizations
Collections (8)
- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
  Paper • 2410.17637 • Published • 34
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
  Paper • 2411.10442 • Published • 64
- Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
  Paper • 2411.18203 • Published • 30
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
  Paper • 2411.14432 • Published • 20
Models
None public yet
Datasets
None public yet