Collections including paper arxiv:2403.03507

- Can LLMs Follow Simple Rules?
  Paper • 2311.04235 • Published • 9
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (the projection idea is sketched after this list)
  Paper • 2403.03507 • Published • 180
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
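Since this page centers on arxiv:2403.03507, a minimal sketch of the gradient low-rank projection idea named in the GaLore title may be useful orientation. This is an illustrative reading of the title, not the paper's reference implementation: it assumes a PyTorch environment, a 2-D gradient, and a projector built from the gradient's top-r left singular vectors that is refreshed periodically; the function names `galore_project`/`galore_back` and the `rank`/`refresh` parameters are hypothetical.

```python
import torch

def galore_project(grad, rank, proj=None, refresh=False):
    # Recompute the rank-r projector from the gradient's top left singular
    # vectors, or reuse the cached projector between refreshes.
    if proj is None or refresh:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]                  # (m, r) orthonormal basis
    return proj.T @ grad, proj              # (r, n) low-rank gradient

def galore_back(low_rank_update, proj):
    # Map the optimizer's low-rank step back to the full weight shape.
    return proj @ low_rank_update           # (m, n)

# Toy step: optimizer state would live only in the small (r, n) subspace,
# which is where the memory saving in the title comes from.
W = torch.randn(1024, 512, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()
lr_grad, P = galore_project(W.grad, rank=8, refresh=True)
with torch.no_grad():
    W -= 1e-3 * galore_back(lr_grad, P)     # stand-in for an optimizer step
```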
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (the common LoRA layer these papers extend is sketched after this list)
  Paper • 2310.08659 • Published • 20
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2
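The papers in this collection all build on LoRA, so a minimal sketch of the shared building block (a frozen base layer plus a trainable low-rank update) may help. Assumes PyTorch; the class name and the `rank`/`alpha` defaults are illustrative, and the base layer here is a plain `nn.Linear` where the quantization papers above would substitute a quantized matmul.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update x @ A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False         # only the adapter weights train
        self.lora_A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scaling = alpha / rank         # B starts at zero, so the model
                                            # initially matches the base layer

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 768))            # (2, 768)
```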
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 16
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 35
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 36
- Tuning Language Models by Proxy (the logit-arithmetic idea is sketched after this list)
  Paper • 2401.08565 • Published • 19
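Tuning Language Models by Proxy steers a large base model at decoding time using the logit difference between a small tuned expert and its untuned counterpart. A minimal sketch of that logit arithmetic follows, assuming all three models share a vocabulary; the function name and `weight` parameter are illustrative, not the paper's API.

```python
import torch

def proxy_tuned_logits(base, expert, antiexpert, weight=1.0):
    # Shift the base model's next-token logits by the tuned-minus-untuned
    # difference of the small proxy pair.
    return base + weight * (expert - antiexpert)

# Toy decode step over a shared vocabulary of size V.
V = 32000
logits = proxy_tuned_logits(torch.randn(V), torch.randn(V), torch.randn(V))
next_token = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
```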