- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 8
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 8
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 11
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111
Collections including paper arxiv:2405.09818
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 14
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 25
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 74
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 58
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 47
- The (R)Evolution of Multimodal Large Language Models: A Survey
  Paper • 2402.12451 • Published
- deepseek-ai/deepseek-vl-7b-base
  Updated • 150 • 37
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
  Paper • 2405.11273 • Published • 17
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 180
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Paper • 2402.03293 • Published • 4
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
  Paper • 2401.11316 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 44
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 68
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 40
- Attention Is All You Need
  Paper • 1706.03762 • Published • 37
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 6
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- Mistral 7B
  Paper • 2310.06825 • Published • 45
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 39
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111