BK-Lee (Byung-Kwan Lee)

upvoted a paper about 6 hours ago

FlatQuant: Flatness Matters for LLM Quantization

Paper • 2410.09426 • Published 8 days ago • 10

upvoted 3 papers 9 days ago

upvoted 3 papers 20 days ago

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published 25 days ago • 25

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published 24 days ago • 47

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published 23 days ago • 83

upvoted a paper 23 days ago

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published 25 days ago • 46

upvoted a paper 25 days ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 25 days ago • 96

upvoted 2 papers 26 days ago

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published 26 days ago • 16

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published 27 days ago • 27

upvoted a paper 27 days ago

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 72

upvoted a paper 30 days ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17 • 69

upvoted a paper about 1 month ago

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published Sep 4 • 72

upvoted 2 papers about 2 months ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published Aug 29 • 52

SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models

Paper • 2408.12114 • Published Aug 22 • 11

upvoted 9 papers 2 months ago

ShortCircuit: AlphaZero-Driven Circuit Design

Paper • 2408.09858 • Published Aug 19 • 16

Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 20

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19 • 51

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 75

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

Paper • 2408.04840 • Published Aug 9 • 31

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46

Learning to Predict Program Execution by Modeling Dynamic Dependency on Code Graphs

Paper • 2408.02816 • Published Aug 5 • 4

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

upvoted 7 papers 3 months ago

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

Paper • 2408.00765 • Published Aug 1 • 12

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 105

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 155

MAVIS: Mathematical Visual Instruction Tuning

Paper • 2407.08739 • Published Jul 11 • 30

Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge

Paper • 2407.03958 • Published Jul 4 • 18

upvoted 7 papers 4 months ago

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Paper • 2406.16758 • Published Jun 24 • 19

TroL: Traversal of Layers for Large Language and Vision Models

Paper • 2406.12246 • Published Jun 18 • 34

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 91

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50

Phased Consistency Model

Paper • 2405.18407 • Published May 28 • 46

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5 • 17

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71

upvoted 6 papers 5 months ago

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Paper • 2405.07518 • Published May 13 • 24

SUTRA: Scalable Multilingual Language Model Architecture

Paper • 2405.06694 • Published May 7 • 37

Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory

Paper • 2405.08707 • Published May 14 • 27

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

upvoted 3 papers 6 months ago

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

Paper • 2404.08540 • Published Apr 12 • 10

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12 • 34

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27

upvoted 11 papers 7 months ago

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Paper • 2403.14624 • Published Mar 21 • 50

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 64

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2 • 104

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 25

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 44

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

Paper • 2403.14520 • Published Mar 21 • 32

Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Paper • 2403.12596 • Published Mar 19 • 9

Larimar: Large Language Models with Episodic Memory Control

Paper • 2403.11901 • Published Mar 18 • 31

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14 • 25

MoAI: Mixture of All Intelligence for Large Language and Vision Models

Paper • 2403.07508 • Published Mar 12 • 75

upvoted a paper 8 months ago

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Byung-Kwan Lee

AI & ML interests

Organizations

BK-Lee's activity