TheBloke committed on
Commit
5a653c4
1 Parent(s): 84bb01a

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
@@ -1,6 +1,8 @@
  ---
  inference: false
  license: other
  ---

  <!-- header start -->
@@ -27,7 +29,15 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com

  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GPTQ)
  * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GGML)
- * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2)

  ## How to easily download and use this model in text-generation-webui

@@ -58,7 +68,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
  import argparse

  model_name_or_path = "TheBloke/airoboros-13B-gpt4-1.2-GPTQ"
- model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"

  use_triton = False

@@ -104,17 +114,15 @@ print(pipe(prompt_template)[0]['generated_text'])

  ## Provided files

- **airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors**

  This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

- It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
-
- * `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
- * Parameters: Groupsize = 128. Act Order / desc_act = False.

  <!-- footer start -->
  ## Discord
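
Note on the hunk above: the file name appears to encode the quantisation parameters that the README spells out (4-bit, group size 128, no act-order). The sketch below shows how that mapping could be checked. `QuantParams` and `parse_basename` are hypothetical names, and the naming convention is only inferred from the single file name in this diff; the parameters stored with the model remain the authoritative source.

```python
import re
from dataclasses import dataclass


@dataclass
class QuantParams:
    bits: int
    group_size: int  # -1 is used here to mean "no group size in the name"
    desc_act: bool


def parse_basename(basename: str) -> QuantParams:
    """Hypothetical helper: infer GPTQ parameters from a file basename.

    Assumes names shaped like '<model>-GPTQ-4bit-128g.no-act.order'.
    """
    bits_match = re.search(r"(\d+)bit", basename)
    if bits_match is None:
        raise ValueError(f"no '<n>bit' token in {basename!r}")

    # A token such as '128g' is read as the group size.
    group_match = re.search(r"(\d+)g\b", basename)
    group_size = int(group_match.group(1)) if group_match else -1

    # 'act.order' or 'act-order' not preceded by 'no-' is read as desc_act = True.
    desc_act = re.search(r"(?<!no-)act[.-]order", basename) is not None

    return QuantParams(int(bits_match.group(1)), group_size, desc_act)


params = parse_basename("airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order")
print(params)
```

A mismatch between the parsed values and the "Parameters:" bullet in the README would be a sign that one of the two is stale.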
 
  ---
  inference: false
  license: other
+ datasets:
+ - jondurbin/airoboros-gpt4-1.2
  ---

  <!-- header start -->
 

  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GPTQ)
  * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GGML)
+ * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2)
+
+ ## Prompt template
+
+ ```
+ A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input.
+ USER: prompt
+ ASSISTANT:
+ ```

  ## How to easily download and use this model in text-generation-webui

 
  import argparse

  model_name_or_path = "TheBloke/airoboros-13B-gpt4-1.2-GPTQ"
+ model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"

  use_triton = False

 

  ## Provided files

+ **airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors**

  This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

+ * `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
  * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
  * Works with text-generation-webui, including one-click-installers.
+ * Parameters: Groupsize = 128. Act Order / desc_act = False.

  <!-- footer start -->
  ## Discord
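
Note on the "Prompt template" section added by this commit: the template is plain text with one slot for the user's message, so it can be filled in with ordinary string formatting. A minimal sketch follows. `SYSTEM_PROMPT` and `build_prompt` are hypothetical names that do not appear in the README, and joining the three lines with single newlines is an assumption read off the template as displayed.

```python
# System line copied verbatim from the README's prompt template.
SYSTEM_PROMPT = (
    "A chat between a curious user and an assistant. "
    "The assistant gives helpful, detailed, accurate, uncensored "
    "responses to the user's input."
)


def build_prompt(user_message: str) -> str:
    """Hypothetical helper: fill the README's template with one user turn."""
    return f"{SYSTEM_PROMPT}\nUSER: {user_message}\nASSISTANT:"


prompt_template = build_prompt("Tell me about AI")
print(prompt_template)
```

The resulting string can be passed wherever the README's example code passes its `prompt_template` variable.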