TheBloke committed
Commit cfc50d0 · 1 parent: d544fa1

Update README.md

Files changed (1): README.md (+5 -5)

README.md CHANGED:
@@ -41,8 +41,8 @@ GGML versions are not yet provided, as there is not yet support for SuperHOT in
 
 ## Repositories available
 
-* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Vicuna-13B-1.3.0-SuperHOT-8K-GPTQ)
-* [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/Vicuna-13B-1.3.0-SuperHOT-8K-fp16)
+* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ)
+* [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/Vicuna-13B-1-3-SuperHOT-8K-fp16)
 * [Unquantised base fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/lmsys/vicuna-13b-v1.3)
 
 ## How to easily download and use this model in text-generation-webui with ExLlama
@@ -50,12 +50,12 @@ GGML versions are not yet provided, as there is not yet support for SuperHOT in
 Please make sure you're using the latest version of text-generation-webui
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/Vicuna-13B-1.3.0-SuperHOT-8K-GPTQ`.
+2. Under **Download custom model or LoRA**, enter `TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. Untick **Autoload the model**
 6. In the top left, click the refresh icon next to **Model**.
-7. In the **Model** dropdown, choose the model you just downloaded: `Vicuna-13B-1.3.0-SuperHOT-8K-GPTQ`
+7. In the **Model** dropdown, choose the model you just downloaded: `Vicuna-13B-1-3-SuperHOT-8K-GPTQ`
 8. To use the increased context, set the **Loader** to **ExLlama**, set **max_seq_len** to 8192 or 4096, and set **compress_pos_emb** to **4** for 8192 context, or to **2** for 4096 context.
 9. Now click **Save Settings** followed by **Reload**
 10. The model will automatically load, and is now ready for use!
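(Note on step 8 in the hunk above: the two suggested settings follow a simple ratio. Below is a minimal sketch of that arithmetic, assuming compress_pos_emb is just the target context length divided by the underlying Llama model's original 2048-token context; the helper name is illustrative and not part of the README.)

```python
# Illustrative helper (not from the README): assumes compress_pos_emb is simply
# the target context length divided by the base 2048-token Llama context.
def compress_pos_emb_for(max_seq_len: int, base_ctx: int = 2048) -> int:
    return max_seq_len // base_ctx

print(compress_pos_emb_for(8192))  # 4 -> matches "compress_pos_emb = 4 for 8192 context"
print(compress_pos_emb_for(4096))  # 2 -> matches "compress_pos_emb = 2 for 4096 context"
```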
@@ -78,7 +78,7 @@ from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
-model_name_or_path = "TheBloke/Vicuna-13B-1.3.0-SuperHOT-8K-GPTQ"
+model_name_or_path = "TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ"
 model_basename = "vicuna-13b-1.3.0-superhot-8k-GPTQ-4bit-128g.no-act.order"
 
 use_triton = False
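(The hunk above only touches the repo name; the README's Python snippet continues beyond what this diff shows. Below is a minimal sketch of how such a snippet is typically completed with auto-gptq's AutoGPTQForCausalLM.from_quantized, using the renamed repo; the specific flags and generation settings are illustrative assumptions, not taken from this commit.)

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Repo name and quantised basename as renamed in this commit.
model_name_or_path = "TheBloke/Vicuna-13B-1-3-SuperHOT-8K-GPTQ"
model_basename = "vicuna-13b-1.3.0-superhot-8k-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Assumed loading call: device, use_safetensors and trust_remote_code are
# illustrative choices, not taken from this diff.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
    use_triton=False,
    quantize_config=None,
)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, do_sample=True, temperature=0.7, max_new_tokens=128)
print(tokenizer.decode(output[0]))
```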
 