Norm Tweaking: High-performance Low-bit Quantization of Large Language Models • arXiv:2309.02784 • Published Sep 6, 2023
Extreme Compression of Large Language Models via Additive Quantization • arXiv:2401.06118 • Published Jan 11, 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • arXiv:2402.04291 • Published Feb 6, 2024
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models • arXiv:2402.14866 • Published Feb 21, 2024
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs • arXiv:2403.02775 • Published Mar 5, 2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization • arXiv:2402.15319 • Published Feb 23, 2024
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization • arXiv:2403.07134 • Published Mar 11, 2024
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models • arXiv:2306.02272 • Published Jun 4, 2023
QuantEase: Optimization-based Quantization for Language Models • arXiv:2309.01885 • Published Sep 5, 2023
SliceGPT: Compress Large Language Models by Deleting Rows and Columns • arXiv:2401.15024 • Published Jan 26, 2024
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points • arXiv:2404.12759 • Published Apr 19, 2024
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs • arXiv:2406.01721 • Published Jun 3, 2024
Attention-aware Post-training Quantization without Backpropagation • arXiv:2406.13474 • Published Jun 19, 2024
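A note on the common thread: every entry above is a post-training quantization (PTQ) method for LLM weights, and each can be read as an improvement over the basic round-to-nearest uniform quantization baseline. As shared background only (this is not the method of any single paper listed, and the function names are illustrative), here is a minimal round-to-nearest sketch in Python/NumPy:

```python
import numpy as np

def quantize_rtn(w: np.ndarray, n_bits: int = 4):
    """Round-to-nearest asymmetric uniform quantization, per output row.

    Returns integer codes plus the per-row scale and zero-point needed to
    dequantize: w_hat = (q - zero) * scale. This is the naive baseline that
    the PTQ papers above refine (outlier handling, vector codebooks,
    rotations, layer-wise reconstruction, etc.).
    """
    qmax = 2**n_bits - 1
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant rows
    zero = np.round(-w_min / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.uint8)
    return q, scale, zero

def dequantize(q: np.ndarray, scale: np.ndarray, zero: np.ndarray) -> np.ndarray:
    """Map integer codes back to approximate float weights."""
    return (q.astype(np.float32) - zero) * scale

# Usage: quantize a random "weight matrix" to 4 bits and measure the error.
w = np.random.randn(8, 128).astype(np.float32)
q, s, z = quantize_rtn(w, n_bits=4)
print("max abs error:", np.abs(w - dequantize(q, s, z)).max())
```

The per-row (per-output-channel) scaling shown here is the standard starting point; the papers in this collection differ mainly in how they go beyond it at 2-4 bits, where plain round-to-nearest degrades sharply.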