Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ tags:
 ![demo](https://thireus.com/AI/Thireus_Vicuna13B-v1.1-8bit-128g_08.png)
 
 Q. Why quantized in 8bit instead of 4bit?
-A. In theory, an 8bit quantized model should provide slightly better perplexity (maybe not noticeable - To Be Evaluated...) than a 4bit quantized version. If your available GPU VRAM is over 15GB you may want to try this out.
+A. For evaluation purposes. In theory, an 8bit quantized model should provide slightly better perplexity (maybe not noticeable - To Be Evaluated...) than a 4bit quantized version. If your available GPU VRAM is over 15GB you may want to try this out.
 Note that quantization in 8bit does not mean loading the model in 8bit precision. Loading your model in 8bit precision (--load-in-8bit) comes with noticeable quality (perplexity) degradation.
 
 Refs:
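
As a rough illustration of why an 8bit quantized model is expected to lose less information than a 4bit one, the sketch below round-trips some random weights through a simple symmetric round-to-nearest quantizer and compares the reconstruction error. Everything here is an assumption made for illustration: the quantizer itself (which is not necessarily the method used to produce this model), the Gaussian test weights, the group size of 128 (guessed from the "128g" in the model name), and the hypothetical helper names `quantize_roundtrip` and `mean_squared_error`.

```python
import random


def quantize_roundtrip(weights, bits, group_size=128):
    """Quantize each group to signed `bits`-bit integers, then dequantize.

    Toy symmetric round-to-nearest scheme (an assumption for illustration,
    not the quantization method used for the actual model).
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; fall back to 1.0 if the group is all zeros.
        scale = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            q = max(-qmax, min(qmax, round(w / scale)))
            restored.append(q * scale)
    return restored


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical stand-in for a slice of model weights.
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]

err4 = mean_squared_error(weights, quantize_roundtrip(weights, bits=4))
err8 = mean_squared_error(weights, quantize_roundtrip(weights, bits=8))
print(f"4-bit reconstruction MSE: {err4:.3e}")
print(f"8-bit reconstruction MSE: {err8:.3e}")
```

The 8-bit error comes out smaller because each group has 255 representable levels instead of 15. A lower reconstruction error does not by itself guarantee a noticeable perplexity improvement, which is why the README marks that point as still to be evaluated.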