- xLSTM: Extended Long Short-Term Memory
  Paper • 2405.04517 • Published • 8
- You Only Cache Once: Decoder-Decoder Architectures for Language Models
  Paper • 2405.05254 • Published • 8
- Understanding the performance gap between online and offline alignment algorithms
  Paper • 2405.08448 • Published • 11
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111
Collections including paper arxiv:2405.09818
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 14
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 25
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 102
- ReFT: Representation Finetuning for Language Models
  Paper • 2404.03592 • Published • 74
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 58
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 47
- The (R)Evolution of Multimodal Large Language Models: A Survey
  Paper • 2402.12451 • Published
- deepseek-ai/deepseek-vl-7b-base
  Updated • 150 • 37
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
  Paper • 2405.11273 • Published • 17
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 180
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Paper • 2402.03293 • Published • 4
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
  Paper • 2401.11316 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 44
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 68
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 40
- Attention Is All You Need
  Paper • 1706.03762 • Published • 37
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 6
- Mixtral of Experts
  Paper • 2401.04088 • Published • 154
- Mistral 7B
  Paper • 2310.06825 • Published • 45
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 39
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 111