# vicuna-13b-4bit
Converted `vicuna-13b` to GPTQ 4-bit using `true-sequential` and `groupsize 128`, saved in `safetensors` format for the best possible model performance.
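A conversion like this is typically done with the GPTQ-for-LLaMa quantization script. The exact command used here is not recorded in this repo; the sketch below is a hypothetical reconstruction from the options named above (`true-sequential`, `groupsize 128`, `safetensors`), with a placeholder model path and output filename:

```sh
# Hypothetical reconstruction -- flags mirror the options named above;
# the actual conversion command is not part of this repo.
python llama.py /path/to/vicuna-13b c4 \
    --wbits 4 \
    --true-sequential \
    --groupsize 128 \
    --save_safetensors vicuna-13b-4bit-128g.safetensors
```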
This does **not** support `llama.cpp` or any other C++ implementation; it only works with the `cuda` or `triton` GPTQ backends, since the C++ implementations require a different model format.
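For reference, the `safetensors` container the GPU backends read is easy to inspect: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor bytes. A minimal stdlib-only sketch with a toy two-element tensor (not this model's weights):

```python
import json
import struct

# Build a minimal .safetensors payload in memory: one fp32 tensor of shape (2,).
data = struct.pack("<2f", 1.0, 2.0)  # 8 raw little-endian float bytes
header = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(data)]}}
hjson = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(hjson)) + hjson + data  # u64 LE header-length prefix

# Parse it back the way a loader would.
(hlen,) = struct.unpack_from("<Q", blob, 0)
parsed = json.loads(blob[8 : 8 + hlen])
start, end = parsed["weight"]["data_offsets"]
values = struct.unpack("<2f", blob[8 + hlen + start : 8 + hlen + end])
print(values)  # -> (1.0, 2.0)
```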
Vicuna is a high-coherence model based on Llama that is comparable to ChatGPT. Read more at https://vicuna.lmsys.org/