tsumeone / stable-vicuna-13B-4bit-128g-cuda


tsumeone committed on Apr 30, 2023

Commit 9fd74ab · 1 Parent(s): b8fa3e9

Create README.md

Files changed (1)

README.md +8 -0

README.md ADDED

	@@ -0,0 +1,8 @@

+Quantized version of this: https://huggingface.co/TheBloke/stable-vicuna-13B-HF
+Big thank you to TheBloke for uploading the HF version above. Unfortunately, his GPTQ quant doesn't run on 0cc4m's fork of KoboldAI/GPTQ, so I am uploading one that does.
+GPTQ quantization using https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.
+Command used to quantize:
+```python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors```
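
For readers unfamiliar with the flags in the command above: `--wbits 4` and `--groupsize 128` ask for 4-bit weights quantized in groups of 128, with each group keeping its own scale and offset. The sketch below illustrates only that storage idea using plain round-to-nearest; it is not the GPTQ algorithm (which additionally corrects quantization error using calibration data such as `c4`), and the function names and toy data are hypothetical, not taken from GPTQ-for-LLaMa.

```python
# Illustrative sketch only: round-to-nearest group quantization.
# This is NOT the GPTQ algorithm and NOT code from GPTQ-for-LLaMa.

def quantize_groupwise(weights, wbits=4, groupsize=128):
    """Quantize a flat list of floats in groups; each group gets its own scale and minimum."""
    levels = (1 << wbits) - 1  # 15 distinct steps for 4 bits
    groups = []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        lo, hi = min(group), max(group)
        # Guard against a constant group, where hi == lo.
        scale = (hi - lo) / levels if hi > lo else 1.0
        codes = [round((w - lo) / scale) for w in group]  # integers in 0..levels
        groups.append((scale, lo, codes))
    return groups


def dequantize_groupwise(groups):
    """Rebuild approximate floats from (scale, minimum, codes) triples."""
    restored = []
    for scale, lo, codes in groups:
        restored.extend(lo + scale * c for c in codes)
    return restored


# Toy data (hypothetical): 256 values in [-1, 1], i.e. two groups of 128.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(256)]
groups = quantize_groupwise(weights, wbits=4, groupsize=128)
restored = dequantize_groupwise(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

A smaller group size means more scale/offset pairs to store but a tighter fit per group, which is the trade-off the `128g` in the file name refers to.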