Bo Li

luodian

https://brianboli.com/

luodian

AI & ML interests

None yet

Recent Activity

liked a dataset 19 days ago

MAmmoTH-VL/MAmmoTH-VL-Instruct-12M

upvoted a paper 22 days ago

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

authored a paper 25 days ago

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

luodian's activity

upvoted a paper 22 days ago

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Paper • 2411.15296 • Published Nov 22, 2024 • 19

upvoted a paper 26 days ago

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

Paper • 2412.05237 • Published 28 days ago • 46

upvoted a paper about 1 month ago

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Paper • 2411.14982 • Published Nov 22, 2024 • 16

upvoted 7 papers 3 months ago

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17, 2024 • 75

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published Oct 2, 2024 • 41

Contrastive Localized Language-Image Pre-Training

Paper • 2410.02746 • Published Oct 3, 2024 • 33

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3, 2024 • 36

Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models

Paper • 2410.02740 • Published Oct 3, 2024 • 52

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 38

LLaVA-Critic: Learning to Evaluate Multimodal Models

Paper • 2410.02712 • Published Oct 3, 2024 • 35

upvoted a collection 3 months ago

LLaVA-OneVision

A model good at arbitrary types of visual input. • 15 items • Updated Oct 5, 2024 • 20

upvoted a paper 5 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 59

upvoted a paper 6 months ago

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 33

upvoted a collection 6 months ago

LLaVA-Next-Interleave

7 items • Updated Oct 4, 2024 • 16

upvoted a paper 6 months ago

Long Context Transfer from Language to Vision

Paper • 2406.16852 • Published Jun 24, 2024 • 32

upvoted a collection 8 months ago

LLaVA-NeXT

Some powerful image models. • 10 items • Updated Oct 14, 2024 • 2

upvoted 2 collections 9 months ago

LMMs-Eval

Dataset Collection of LMMs-Eval • 36 items • Updated Oct 4, 2024 • 25

LLaVA-Video

Models focused on video understanding (previously known as LLaVA-NeXT-Video). • 6 items • Updated Oct 5, 2024 • 56

upvoted 2 papers about 1 year ago

OtterHD: A High-Resolution Multi-modality Model

Paper • 2311.04219 • Published Nov 7, 2023 • 31

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 34