Collections including paper arxiv:2403.03507

- Can LLMs Follow Simple Rules?
  Paper • 2311.04235 • Published • 9
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 75
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (the projection idea is sketched after this list)
  Paper • 2403.03507 • Published • 180
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
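Since this page centers on arxiv:2403.03507, a minimal sketch of the gradient low-rank projection idea named in the GaLore title may be useful orientation. This is an illustrative reading of the title, not the paper's reference implementation: it assumes a PyTorch environment, a 2-D gradient, and a projector built from the gradient's top-r left singular vectors that is refreshed periodically; the function names `galore_project`/`galore_back` and the `rank`/`refresh` parameters are hypothetical.

```python
import torch

def galore_project(grad, rank, proj=None, refresh=False):
    # Recompute the rank-r projector from the gradient's top left singular
    # vectors, or reuse the cached projector between refreshes.
    if proj is None or refresh:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        proj = U[:, :rank]                  # (m, r) orthonormal basis
    return proj.T @ grad, proj              # (r, n) low-rank gradient

def galore_back(low_rank_update, proj):
    # Map the optimizer's low-rank step back to the full weight shape.
    return proj @ low_rank_update           # (m, n)

# Toy step: optimizer state would live only in the small (r, n) subspace,
# which is where the memory saving in the title comes from.
W = torch.randn(1024, 512, requires_grad=True)
loss = (W ** 2).sum()
loss.backward()
lr_grad, P = galore_project(W.grad, rank=8, refresh=True)
with torch.no_grad():
    W -= 1e-3 * galore_back(lr_grad, P)     # stand-in for an optimizer step
```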
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models (the common LoRA layer these papers extend is sketched after this list)
  Paper • 2310.08659 • Published • 20
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1
- LoRA ensembles for large language model fine-tuning
  Paper • 2310.00035 • Published • 2
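The papers in this collection all build on LoRA, so a minimal sketch of the shared building block (a frozen base layer plus a trainable low-rank update) may help. Assumes PyTorch; the class name and the `rank`/`alpha` defaults are illustrative, and the base layer here is a plain `nn.Linear` where the quantization papers above would substitute a quantized matmul.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update x @ A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False         # only the adapter weights train
        self.lora_A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scaling = alpha / rank         # B starts at zero, so the model
                                            # initially matches the base layer

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scaling

layer = LoRALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 768))            # (2, 768)
```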
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 16
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 35
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 36
- Tuning Language Models by Proxy (the logit-arithmetic idea is sketched after this list)
  Paper • 2401.08565 • Published • 19
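Tuning Language Models by Proxy steers a large base model at decoding time using the logit difference between a small tuned expert and its untuned counterpart. A minimal sketch of that logit arithmetic follows, assuming all three models share a vocabulary; the function name and `weight` parameter are illustrative, not the paper's API.

```python
import torch

def proxy_tuned_logits(base, expert, antiexpert, weight=1.0):
    # Shift the base model's next-token logits by the tuned-minus-untuned
    # difference of the small proxy pair.
    return base + weight * (expert - antiexpert)

# Toy decode step over a shared vocabulary of size V.
V = 32000
logits = proxy_tuned_logits(torch.randn(V), torch.randn(V), torch.randn(V))
next_token = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
```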