LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning Paper • 2007.11622 • Published Jul 22, 2020
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware Paper • 1812.00332 • Published Dec 2, 2018
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023