Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

AI & ML interests

Multimodal

Recent Activity

updated a collection about 20 hours ago

Multimodal Dataset

updated a collection about 20 hours ago

Multimodal LLM

upvoted a paper about 20 hours ago

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

btjhjeon's activity

upvoted a paper about 20 hours ago

URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics

Paper • 2501.04686 • Published 2 days ago • 40

upvoted 2 papers 2 days ago

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published 3 days ago • 38

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published 4 days ago • 37

upvoted a paper 3 days ago

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Paper • 2501.02976 • Published 4 days ago • 44

upvoted 2 papers 4 days ago

Virgo: A Preliminary Exploration on Reproducing o1-like MLLM

Paper • 2501.01904 • Published 7 days ago • 27

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published 7 days ago • 32

upvoted 2 papers 7 days ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published 11 days ago • 23

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 9 days ago • 91

upvoted 3 papers 10 days ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published 26 days ago • 51

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Paper • 2412.20070 • Published 13 days ago • 43

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 17 days ago • 65

upvoted 2 papers 14 days ago

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

Paper • 2412.18072 • Published 18 days ago • 16

Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

Paper • 2412.18176 • Published 18 days ago • 15

upvoted 2 papers 15 days ago

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Paper • 2412.18609 • Published 17 days ago • 15

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 17 days ago • 35

upvoted 3 papers 17 days ago

upvoted 2 papers 21 days ago

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 24 days ago • 13

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Paper • 2412.14233 • Published 23 days ago • 6