Update README.md

This repo contains 4bit GPTQ models for GPU inference, quantised using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

* [4bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizardLM-7B-GGML)
* [Unquantised model in HF format](https://huggingface.co/TheBloke/wizardLM-7B-HF)
## How to easily download and use this model in text-generation-webui

Load text-generation-webui as you normally do.

1. Click the **Model tab**.
2. Under **Download custom model or LoRA**, enter this repo name: `TheBloke/wizardLM-7B-GPTQ` (a command-line alternative is sketched after this list).
3. Click **Download**.
4. Wait until it says it's finished downloading.
5. As this is a GPTQ model, fill in the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 128`, `model_type = Llama`.
6. Now click the **Refresh** icon next to **Model** in the top left.
7. In the **Model drop-down**, choose this model: `wizardLM-7B-GPTQ`.
8. Click **Reload the Model** in the top right.
9. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
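
If you prefer to fetch the files from the command line instead of via the UI, a git-based download also works. This is a minimal sketch, not part of the original instructions: it assumes `git-lfs` is installed, and the `text-generation-webui/models/` path and `TheBloke_wizardLM-7B-GPTQ` folder name mirror text-generation-webui's defaults.

```
# Clone the HF repo with git-lfs so the large .safetensors files download in full
cd text-generation-webui/models
git lfs install
git clone https://huggingface.co/TheBloke/wizardLM-7B-GPTQ TheBloke_wizardLM-7B-GPTQ
```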

## GIBBERISH OUTPUT IN `text-generation-webui`?

Please read the Provided files section below. You should use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` unless you are able to use the latest GPTQ-for-LLaMa code.

If you're using a text-generation-webui one-click installer, you MUST use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`.
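
If you're not sure which file you have, a quick listing of the model folder (assuming the default download location used above) shows it:

```
# List the quantised model files; by default only the 'compat' file should be present
ls text-generation-webui/models/TheBloke_wizardLM-7B-GPTQ/*.safetensors
```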

## Provided files

Two files are provided. **The 'latest' file will not work unless you use a recent version of GPTQ-for-LLaMa.**

Specifically, the 'latest' file uses `--act-order` for maximum quantisation quality and will not work with oobabooga's fork of GPTQ-for-LLaMa. Therefore at this time it will also not work with `text-generation-webui` one-click installers.

The 'compat' file will be used by default in text-generation-webui, so you don't need to do anything special to use it. If you want to use the 'latest' file, please remove the 'compat' file - but only do this if you are able to use the latest GPTQ-for-LLaMa code.
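
For example, a sketch of switching to the 'latest' file (the directory assumes the default download location used above):

```
# Remove the 'compat' file so text-generation-webui picks up the 'latest' file instead.
# Only do this if your GPTQ-for-LLaMa is recent enough to handle --act-order models.
cd text-generation-webui/models/TheBloke_wizardLM-7B-GPTQ
rm wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```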

* `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`
  * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches

Unless you are able to use the latest GPTQ-for-LLaMa code, please use `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors`.

The 'latest' file was created with:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.act-order.safetensors
```
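
The 'compat' file was presumably made with the same command minus `--act-order`; this reconstruction is a guess based on the filenames, not a command taken from the original README:

```
CUDA_VISIBLE_DEVICES=0 python3 llama.py wizardLM-7B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors
```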

## How to install manually in `text-generation-webui` and update GPTQ-for-LLaMa if necessary

File `wizardLM-7B-GPTQ-4bit-128g.compat.no-act-order.safetensors` can be loaded the same as any other GPTQ file, without requiring any updates to [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui).
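
To use the 'latest' `--act-order` file you would need a newer GPTQ-for-LLaMa than the fork bundled with text-generation-webui. A sketch of one way to update it, assuming the standard text-generation-webui layout; the branch names and build step come from the upstream GPTQ-for-LLaMa repo, so check its README before running:

```
# Replace text-generation-webui's bundled GPTQ-for-LLaMa with upstream code
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda   # or: -b triton
cd GPTQ-for-LLaMa
pip install -r requirements.txt
python setup_cuda.py install   # CUDA branch only; the Triton branch has no build step
```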