Thireus committed
Commit e0d32b5
1 Parent(s): 4784f92

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -6,6 +6,16 @@ tags:
  ---
  ![demo](https://thireus.com/AI/Thireus_Vicuna13B-v1.1-8bit-128g_08.png)
 
+ Q. Why quantize in 8bit instead of 4bit?
+ A. In theory, an 8bit quantized model should give slightly better (lower) perplexity than a 4bit quantized one, though the difference may not be noticeable and is still to be evaluated. If you have more than 15GB of GPU VRAM available, you may want to try it out.
+ Note that 8bit quantization is not the same as loading the model in 8bit precision. Loading a model in 8bit precision (--load-in-8bit) definitely comes with a non-linear quality (perplexity) degradation.
+
+ Refs:
+ - https://github.com/ggerganov/llama.cpp/pull/951
+ - https://news.ycombinator.com/item?id=35148542
+ - https://arxiv.org/abs/2105.03536
+ - https://github.com/IST-DASLab/gptq
+
  **This model is an 8bit quantization of Vicuna 13B.**
  - 13B parameters
  - Group size: 128
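
For readers who want to see the distinction from the Q&A in practice, here is a minimal sketch of the runtime 8bit path using the `transformers`/`bitsandbytes` stack. The base model name below is an assumption for illustration; this repo's GPTQ checkpoint would instead be loaded with a GPTQ-aware loader (see the GPTQ refs above).

```python
# Sketch: runtime 8bit precision, i.e. what --load-in-8bit toggles in
# text-generation-webui. Weights are converted on the fly by
# bitsandbytes (LLM.int8()). This is NOT the same as loading a
# checkpoint that was quantized offline with GPTQ, such as this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-13b-v1.1"  # hypothetical fp16 base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,  # bitsandbytes runtime 8bit path
    device_map="auto",  # requires accelerate
)
```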
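
And a toy sketch of what "Group size: 128" means for the stored weights: each contiguous group of 128 weights shares one scale/zero-point pair, so the integer grid adapts locally. This is illustrative only; GPTQ's actual rounding additionally compensates quantization error using second-order information.

```python
import numpy as np

# Toy group-wise 8bit quantizer: every group of 128 weights gets its
# own scale and zero point. This shows the storage format the group
# size refers to, not the GPTQ algorithm itself.
def quantize_groupwise(w, bits=8, group_size=128):
    qmax = (1 << bits) - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax              # per-group scale
    q = np.round((groups - lo) / scale)   # integers in 0..qmax
    return q.astype(np.uint8), scale, lo

w = np.random.randn(4096 * 128).astype(np.float32)
q, scale, zero = quantize_groupwise(w)
w_hat = q * scale + zero                  # dequantize
print("max abs reconstruction error:", np.abs(w.reshape(-1, 128) - w_hat).max())
```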