---
license: other
---
**SqueezeLLM** is a post-training quantization framework that introduces a new method called Dense-and-Sparse Quantization to enable efficient LLM serving.

**TLDR:** Deploying LLMs is difficult because of their large memory footprint, which reduced-precision quantization can address.
Naive quantization, however, hurts model quality. We address this with a new Dense-and-Sparse Quantization method.
Dense-and-Sparse Quantization splits each weight matrix into two components: a dense component that can be heavily quantized without affecting model performance,
and a sparse component that preserves the sensitive and outlier entries of the weight matrix. With this approach,
we are able to serve larger models with a smaller memory footprint and the same latency, yet with higher accuracy and quality.
For more details, please check out our [paper](https://arxiv.org/pdf/2306.07629.pdf).
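
To make the decomposition concrete, below is a minimal PyTorch sketch of an outlier-based dense/sparse split. The `outlier_frac` threshold and the purely magnitude-based selection are simplifying assumptions for illustration; the actual method also uses sensitivity-based criteria to choose the sparse entries (see the paper).

```python
import torch

def dense_and_sparse_split(W: torch.Tensor, outlier_frac: float = 0.005):
    """Split a weight matrix W into a dense part (to be quantized) and a
    sparse part that keeps the largest-magnitude outliers in full precision.

    Illustrative only: SqueezeLLM additionally uses sensitivity-based
    selection (see the paper) rather than magnitude alone.
    """
    k = max(1, int(outlier_frac * W.numel()))
    # Threshold at the k-th largest absolute value.
    threshold = W.abs().flatten().topk(k).values.min()
    outlier_mask = W.abs() >= threshold
    sparse = (W * outlier_mask).to_sparse()  # few full-precision outliers
    dense = W * ~outlier_mask                # the bulk, safe to quantize heavily
    return dense, sparse

W = torch.randn(4096, 4096)
dense, sparse = dense_and_sparse_split(W)
# W is recovered exactly as dense + sparse.to_dense(); only `dense` is quantized.
```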

## Model description

4-bit XGen-7B Base model with an 8K sequence length, quantized using SqueezeLLM.
More details on the quantization method can be found in the [paper](https://arxiv.org/pdf/2306.07629.pdf).
A more detailed description of the base model can be found at [Salesforce/xgen-7b-8k-base](https://huggingface.co/Salesforce/xgen-7b-8k-base).

* **Base Model:** [XGen-7B-8K-Base](https://huggingface.co/Salesforce/xgen-7b-8k-base) (by Salesforce AI Research)
* **Bitwidth:** 4-bit
* **Sparsity Level:** 0% (dense-only)
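
As a rough illustration of what "4-bit" means here, the sketch below quantizes a weight matrix to a 16-entry codebook (2^4 values) found by plain 1-D k-means, storing each weight as a 4-bit index. This is a simplification: SqueezeLLM weights the clustering by per-weight sensitivity (see the paper), and a 0% sparsity level means the whole matrix is handled this way, with no sparse outlier component.

```python
import torch

def quantize_4bit(W: torch.Tensor, iters: int = 20):
    """Quantize W to 4 bits with a non-uniform (k-means) codebook.

    Simplified sketch: SqueezeLLM uses sensitivity-weighted k-means
    (see the paper); here we use plain 1-D Lloyd iterations.
    """
    w = W.flatten()
    # Initialize 16 centroids (2**4) spread over the weight range.
    centroids = torch.linspace(w.min().item(), w.max().item(), 16)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = (w[:, None] - centroids[None, :]).abs().argmin(dim=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(16):
            if (idx == c).any():
                centroids[c] = w[idx == c].mean()
    # Final assignment with the converged codebook: 4-bit indices.
    idx = (w[:, None] - centroids[None, :]).abs().argmin(dim=1)
    return idx.reshape(W.shape).to(torch.uint8), centroids

W = torch.randn(256, 256)
indices, codebook = quantize_4bit(W)
W_hat = codebook[indices.long()]   # dequantized weights
print((W - W_hat).abs().mean())    # small quantization error
```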

## Links

* **Paper**: [https://arxiv.org/pdf/2306.07629.pdf](https://arxiv.org/pdf/2306.07629.pdf)
* **Code**: [https://github.com/SqueezeAILab/SqueezeLLM](https://github.com/SqueezeAILab/SqueezeLLM)
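
A minimal sketch for fetching the quantized checkpoint with `huggingface_hub`; the repo id below is a placeholder assumption, so substitute this model card's actual id. Inference itself is run with the scripts in the SqueezeLLM GitHub repository linked above.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id for illustration; use this model card's actual repo id.
ckpt_dir = snapshot_download(repo_id="squeeze-ai-lab/sq-xgen-7b-8k-base-w4-s0")
print(ckpt_dir)  # local directory containing the quantized checkpoint files
# Run inference with the scripts in https://github.com/SqueezeAILab/SqueezeLLM,
# which implement the custom dense-and-sparse 4-bit kernels.
```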