LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19 • 42
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17 • 56
In-Context Editing: Learning Knowledge from Self-Induced Distributions Paper • 2406.11194 • Published Jun 17 • 15
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17 • 20
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform Paper • 2405.03003 • Published May 5 • 4
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition Paper • 2405.14259 • Published May 23 • 1
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Paper • 1909.11942 • Published Sep 26, 2019 • 2
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper • 1810.04805 • Published Oct 11, 2018 • 14
Layer-Condensed KV Cache for Efficient Inference of Large Language Models Paper • 2405.10637 • Published May 17 • 19
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning Paper • 2403.17919 • Published Mar 26 • 16
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 53
Divide-or-Conquer? Which Part Should You Distill Your LLM? Paper • 2402.15000 • Published Feb 22 • 22
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace • 68 items • Updated Feb 13 • 13
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25 • 46
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8 • 70
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5 • 40
YAYI 2: Multilingual Open-Source Large Language Models Paper • 2312.14862 • Published Dec 22, 2023 • 13
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 32
LLM-FP4: 4-Bit Floating-Point Quantized Transformers Paper • 2310.16836 • Published Oct 25, 2023 • 13