# vicuna-13b-4bit
Converted `vicuna-13b` to GPTQ 4-bit using `true-sequential` and `groupsize 128`, saved in `safetensors` format for the best possible model performance.
This does **not** support `llama.cpp` or any other C++ implementations; only `cuda` is supported. Those implementations require a different model format.
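
For reference, a conversion along these lines can be sketched as follows, assuming the `llama.py` quantization script from GPTQ-for-LLaMa; the model path, calibration dataset, and output filename here are illustrative:

```bash
# Sketch of the GPTQ conversion step; the paths and output name are illustrative.
# c4 is the calibration dataset sampled during quantization.
CUDA_VISIBLE_DEVICES=0 python llama.py ./vicuna-13b c4 \
    --wbits 4 \
    --true-sequential \
    --groupsize 128 \
    --save_safetensors vicuna-13b-4bit-128g.safetensors
```
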
Vicuna is a high-coherence model based on Llama that is comparable to ChatGPT. Read more here: https://vicuna.lmsys.org/

# Important - Update 2023-04-05
Recent GPTQ commits have introduced breaking changes to model loading, so you should use this fork for a stable experience: https://github.com/oobabooga/GPTQ-for-LLaMa

Currently, only `cuda` is supported.
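
Setting up that fork looks roughly like the following; this is a sketch that assumes the fork keeps the upstream `setup_cuda.py` build step for the CUDA kernel:

```bash
# Clone the recommended fork and build its CUDA quantization kernel.
# Assumes the fork retains upstream's setup_cuda.py build step.
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
```
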
# Usage
1. Run manually through GPTQ (see the sketch below)
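
A minimal sketch of that step, assuming the `llama_inference.py` script from GPTQ-for-LLaMa; the model directory, checkpoint name, and prompt are illustrative:

```bash
# Sketch of loading the 4-bit checkpoint for generation; names are illustrative.
# --wbits and --groupsize must match the values used at conversion time.
python llama_inference.py ./vicuna-13b \
    --wbits 4 \
    --groupsize 128 \
    --load vicuna-13b-4bit-128g.safetensors \
    --text "Tell me about alpacas."
```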