LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 51
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 9
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning Paper • 2007.11622 • Published Jul 22, 2020
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 57