CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data Paper • 2404.15653 • Published Apr 24 • 25
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20 • 44
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 26
LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models Paper • 2405.14477 • Published May 23 • 15
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published May 23 • 5
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Paper • 2405.17991 • Published May 28 • 9
Jina CLIP: Your CLIP Model Is Also Your Text Retriever Paper • 2405.20204 • Published 30 days ago • 27
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Paper • 2406.00392 • Published 28 days ago • 12
Block Transformer: Global-to-Local Language Modeling for Fast Inference Paper • 2406.02657 • Published 25 days ago • 35
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs Paper • 2406.02886 • Published 24 days ago • 6
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Paper • 2406.02900 • Published 24 days ago • 10
GenAI Arena: An Open Evaluation Platform for Generative Models Paper • 2406.04485 • Published 23 days ago • 19
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published 22 days ago • 23
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? Paper • 2406.04391 • Published 23 days ago • 6
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach Paper • 2406.04594 • Published 22 days ago • 4
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published 23 days ago • 46
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published 17 days ago • 19
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published 16 days ago • 18
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding Paper • 2406.09297 • Published 16 days ago • 4
A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression Paper • 2406.11430 • Published 12 days ago • 21
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 9 days ago • 27
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch Paper • 2406.14563 • Published 9 days ago • 30
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published 11 days ago • 34