TheBloke/airoboros-13B-gpt4-1.2-GPTQ

@@ -1,6 +1,8 @@
 ---
 inference: false
 license: other
 ---
 <!-- header start -->
@@ -27,7 +29,15 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GGML)
-* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2)
 ## How to easily download and use this model in text-generation-webui
@@ -58,7 +68,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 model_name_or_path = "TheBloke/airoboros-13B-gpt4-1.2-GPTQ"
-model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"
 use_triton = False
@@ -104,17 +114,15 @@ print(pipe(prompt_template)[0]['generated_text'])
 ## Provided files
-**airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors**
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
-It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
-* `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
   * Works with GPTQ-for-LLaMa in CUDA mode.  May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
-  * Parameters: Groupsize = 128. Act Order / desc_act = False.
 <!-- footer start -->
 ## Discord

 ---
 inference: false
 license: other
+datasets:
+- jondurbin/airoboros-gpt4-1.2
 ---
 <!-- header start -->
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13B-gpt4-1.2-GGML)
+* [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/jondurbin/airoboros-13b-gpt4-1.2)
+## Prompt template
+```
+A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input.
+USER: prompt
+ASSISTANT:
+```
 ## How to easily download and use this model in text-generation-webui
 import argparse
 model_name_or_path = "TheBloke/airoboros-13B-gpt4-1.2-GPTQ"
+model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"
 use_triton = False
 ## Provided files
+**airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors**
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
+* `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
   * Works with GPTQ-for-LLaMa in CUDA mode.  May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
+  * Parameters: Groupsize = 128. Act Order / desc_act = False.
 <!-- footer start -->
 ## Discord