Thireus/Vicuna13B-v1.1-8bit-128g

Thireus committed on Apr 16, 2023

Commit a2a7f2a • 1 Parent(s): fa7eec5

Update README.md

Files changed (1)

README.md +1 -1


@@ -7,7 +7,7 @@ tags:
 ![demo](https://thireus.com/AI/Thireus_Vicuna13B-v1.1-8bit-128g_08.png)
 Q. Why quantized in 8bit instead of 4bit?
-A. In theory, a 8bit quantized model should provide slightly better perplexity (maybe not noticeable - To Be Evaluated...) over a 4bit quatized version. If your available GPU VRAM is over 15GB you may want to try this out.
+A. For evaluation purpose. In theory, a 8bit quantized model should provide slightly better perplexity (maybe not noticeable - To Be Evaluated...) over a 4bit quatized version. If your available GPU VRAM is over 15GB you may want to try this out.
 Note that quatization in 8bit does not mean loading the model in 8bit precision. Loading your model in 8bit precision (--load-in-8bit) comes with noticeable quality (perplexity) degradation.
 Refs:
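
The README answer above ties the 8bit build to GPUs with more than 15GB of VRAM. A rough back-of-the-envelope sketch of where a figure in that range could come from is given here. It is not taken from the repository and makes several assumptions: that "128g" means a group size of 128, that each group stores one 16-bit scale plus one zero point at the weight bit width, and that the model has a nominal 13 billion quantized parameters. It counts weights only, so activations, the KV cache, and any layers left unquantized come on top. The function names are hypothetical.

```python
# Rough weight-memory estimate for group-wise quantization.
# Assumptions (not from the source): group size 128, one fp16 scale and one
# zero point (stored at the weight bit width) per group, 13e9 parameters.

GIB = 2 ** 30  # bytes per GiB


def effective_bits_per_weight(bits: int, group_size: int = 128,
                              scale_bits: int = 16) -> float:
    """Bits per weight including per-group scale and zero-point overhead."""
    overhead_bits = scale_bits + bits  # one scale + one zero point per group
    return bits + overhead_bits / group_size


def weight_memory_gib(n_params: float, bits: int, group_size: int = 128) -> float:
    """Approximate memory for the quantized weights alone, in GiB."""
    total_bits = n_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / GIB


if __name__ == "__main__":
    n_params = 13e9  # nominal parameter count, an assumption
    for bits in (4, 8):
        gib = weight_memory_gib(n_params, bits)
        print(f"{bits}bit, group size 128: about {gib:.1f} GiB for weights")
```

Under these assumptions the 8bit weights take roughly twice the memory of the 4bit weights, which is consistent with the README steering the 8bit build toward larger GPUs once runtime overhead is added.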