Perception Tokens Enhance Visual Reasoning in Multimodal Language Models • arXiv:2412.03548 • Published Dec 2024 • 16 upvotes
MiniPLM: Knowledge Distillation for Pre-Training Language Models • arXiv:2410.17215 • Published Oct 22, 2024 • 14 upvotes
Wolf: Captioning Everything with a World Summarization Framework • arXiv:2407.18908 • Published Jul 26, 2024 • 31 upvotes
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization • arXiv:2406.16008 • Published Jun 23, 2024 • 6 upvotes
Direct Preference Knowledge Distillation for Large Language Models • arXiv:2406.19774 • Published Jun 28, 2024 • 21 upvotes
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality • arXiv:2306.14610 • Published Jun 26, 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes • arXiv:2305.02301 • Published May 3, 2023 • 2 upvotes
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models • arXiv:2308.00675 • Published Aug 1, 2023 • 35 upvotes
DataComp-LM: In search of the next generation of training sets for language models • arXiv:2406.11794 • Published Jun 17, 2024 • 50 upvotes
Instruction Pre-Training: Language Models are Supervised Multitask Learners • arXiv:2406.14491 • Published Jun 20, 2024 • 86 upvotes
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing • arXiv:2005.14187 • Published May 28, 2020 • 2 upvotes
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models • arXiv:2402.13064 • Published Feb 20, 2024 • 47 upvotes
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models • arXiv:2401.06951 • Published Jan 13, 2024 • 25 upvotes