Initial GPTQ model commit
README.md
CHANGED
@@ -58,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/airoboros-13B-gpt4-1.2-GPTQ"
-model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.act.order"
+model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"
 
 use_triton = False
 
@@ -104,17 +104,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
-**airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.act.order.safetensors**
+**airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
+It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
 
-
-* `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.act.order.safetensors`
+* `airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order.safetensors`
 * Works with AutoGPTQ in CUDA or Triton modes.
 * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
 * Works with text-generation-webui, including one-click-installers.
-* Parameters: Groupsize = 128. Act Order / desc_act =
+* Parameters: Groupsize = 128. Act Order / desc_act = False.
 
 <!-- footer start -->
 ## Discord
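For reference, the parameters the commit settles on (4-bit, groupsize 128, desc_act = False) and the renamed `model_basename` can be sketched together as a small quantize-config dict plus the weights filename that basename resolves to. The dict's field names mirror AutoGPTQ's `BaseQuantizeConfig`; representing them this way is an assumption for illustration, since the commit itself only shows the README text.

```python
# Quantization parameters described in the updated README.
# Field names follow AutoGPTQ's BaseQuantizeConfig (assumed, not from the commit).
quantize_config = {
    "bits": 4,           # 4-bit GPTQ quantization
    "group_size": 128,   # groupsize 128, chosen for inference accuracy
    "desc_act": False,   # no --act-order: better compatibility and speed
}

# The new basename from the commit; AutoGPTQ loads "<basename>.safetensors".
model_basename = "airoboros-13b-gpt4-1.2-GPTQ-4bit-128g.no-act.order"
weights_file = model_basename + ".safetensors"
print(weights_file)
```

This matches the `**...no-act.order.safetensors**` filename the README's "Provided files" section now advertises.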