TheBloke committed on
Commit
57a555e
1 Parent(s): 88776ec

Update README.md

Files changed (1)
  1. README.md +40 -24
README.md CHANGED
@@ -53,11 +53,42 @@ quantized_by: TheBloke
53
  - Model creator: [Disco Research](https://huggingface.co/DiscoResearch)
54
  - Original model: [Discolm Mixtral 8X7B v2](https://huggingface.co/DiscoResearch/DiscoLM-mixtral-8x7b-v2)
55

56
  <!-- description start -->
57
  # Description
58
 
59
  This repo contains GPTQ model files for [Disco Research's Discolm Mixtral 8X7B v2](https://huggingface.co/DiscoResearch/DiscoLM-mixtral-8x7b-v2).
60

61
  Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
62
 
63
  <!-- description end -->
@@ -83,22 +114,6 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
83
  <!-- prompt-template end -->
84
 
85
 
86
-
87
- <!-- README_GPTQ.md-compatible clients start -->
88
- ## Known compatible clients / servers
89
-
90
- GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.
91
-
92
- These GPTQ models are known to work in the following inference servers/webuis.
93
-
94
- - [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
95
- - [KoboldAI United](https://github.com/henk717/koboldai)
96
- - [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
97
- - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
98
-
99
- This may not be a complete list; if you know of others, please let me know!
100
- <!-- README_GPTQ.md-compatible clients end -->
101
-
102
  <!-- README_GPTQ.md-provided-files start -->
103
  ## Provided files, and GPTQ parameters
104
 
@@ -124,8 +139,8 @@ Most GPTQ files are made with AutoGPTQ. Mistral models are currently made with T
124
  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
125
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
126
  | [main](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.97 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
127
- | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
128
- | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
129
  | [gptq-3bit--1g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit--1g-actorder_True) | 3 | None | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.98 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
130
  | [gptq-3bit-128g-actorder_true](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit-128g-actorder_true) | 3 | 128 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False. |
131
  | [gptq-3bit-32g-actorder_true](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit-32g-actorder_true) | 3 | 32 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.99 GB | No | 3-bit, with group size 32g and act-order. Highest quality 3-bit option. |
@@ -204,6 +219,8 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
204
  <!-- README_GPTQ.md-text-generation-webui start -->
205
  ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
206
 
 
 
207
  Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
208
 
209
  It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to do a manual install.
@@ -230,6 +247,8 @@ It is strongly recommended to use the text-generation-webui one-click-installers
230
  <!-- README_GPTQ.md-use-from-tgi start -->
231
  ## Serving this model from Text Generation Inference (TGI)
232
 
 
 
233
  It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
234
 
235
  Example Docker parameters:
@@ -272,6 +291,8 @@ print(f"Model output: {response}")
272
  <!-- README_GPTQ.md-use-from-python start -->
273
  ## Python code example: inference from this GPTQ model
274
 
 
 
275
  ### Install the necessary packages
276
 
277
  Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
@@ -345,11 +366,8 @@ print(pipe(prompt_template)[0]['generated_text'])
345
  <!-- README_GPTQ.md-compatibility start -->
346
  ## Compatibility
347
 
348
- The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.
349
-
350
- [ExLlama](https://github.com/turboderp/exllama) is compatible with Llama and Mistral models in 4-bit. Please see the Provided Files table above for per-file compatibility.
351
 
352
- For a list of clients/servers, please see "Known compatible clients / servers", above.
353
  <!-- README_GPTQ.md-compatibility end -->
354
 
355
  <!-- footer start -->
@@ -362,8 +380,6 @@ For further support, and discussions on these models and AI in general, join us
362
 
363
  ## Thanks, and how to contribute
364
 
365
- Thanks to the [chirper.ai](https://chirper.ai) team!
366
-
367
  Thanks to Clay from [gpus.llm-utils.org](https://gpus.llm-utils.org)!
368
 
369
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
 
53
  - Model creator: [Disco Research](https://huggingface.co/DiscoResearch)
54
  - Original model: [Discolm Mixtral 8X7B v2](https://huggingface.co/DiscoResearch/DiscoLM-mixtral-8x7b-v2)
55
 
56
+ # WARNING - I CAN'T GET THESE GPTQ QUANTS TO WORK
57
+
58
+ Unfortunately, after 10 hours of quantising at not insignificant cost, they don't actually appear to work.
59
+
60
+ I will leave them up in case a solution presents itself soon. But for now, I get errors like this:
61
+
62
+ ```
63
+ File "/workspace/venv/pytorch2/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py", line 239, in forward
64
+ zeros = zeros.reshape(-1, 1, zeros.shape[1] * zeros.shape[2])
65
+ RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
66
+
67
+ File "/workspace/venv/pytorch2/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda.py", line 245, in forward
68
+ zeros = zeros.reshape(self.scales.shape)
69
+ RuntimeError: shape '[32, 8]' is invalid for input of size 0
70
+ ```
71
+
72
  <!-- description start -->
73
  # Description
74
 
75
  This repo contains GPTQ model files for [Disco Research's Discolm Mixtral 8X7B v2](https://huggingface.co/DiscoResearch/DiscoLM-mixtral-8x7b-v2).
76
 
77
+ **Experimental model**
78
+
79
+ This is an experimental GPTQ of MistralAI's Mixtral 7B 8Expert.
80
+
81
+ This is a quantisation of an unofficial implementation of Mixtral 7B 8Expert, created and hosted by DiscoResearch at: [DiscoResearch/mixtral-7b-8expert](https://huggingface.co/DiscoResearch/mixtral-7b-8expert).
82
+
83
+ To use it, you will need:
84
+ * Latest Transformers, installed from GitHub:
85
+ ```
86
+ pip3 install git+https://github.com/huggingface/transformers.git
87
+ ```
88
+ * `trust_remote_code=True`
89
+
90
+ Note that I have not yet tested the model myself; I will update this section once I know the VRAM requirements.
91
+
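+ For reference, here is a minimal loading sketch based on the two requirements above (untested here, given the errors described at the top of this README). The `revision` value is illustrative; any branch listed under Provided Files below can be used.
+
+ ```
+ # Minimal, untested sketch: load this GPTQ via Transformers with remote code enabled.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name_or_path = "TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name_or_path,
+     device_map="auto",
+     trust_remote_code=True,  # required by the unofficial Mixtral implementation
+     revision="main",         # or e.g. "gptq-4bit-32g-actorder_True"
+ )
+ ```
+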
92
  Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
93
 
94
  <!-- description end -->
 
114
  <!-- prompt-template end -->
115
 
116

117
  <!-- README_GPTQ.md-provided-files start -->
118
  ## Provided files, and GPTQ parameters
119
 
 
139
  | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
140
  | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
141
  | [main](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/main) | 4 | None | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.97 GB | Yes | 4-bit, with Act Order. No group size, to lower VRAM requirements. |
142
+ | [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | No | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
143
+ | [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | No | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
144
  | [gptq-3bit--1g-actorder_True](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit--1g-actorder_True) | 3 | None | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.98 GB | No | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
145
  | [gptq-3bit-128g-actorder_true](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit-128g-actorder_true) | 3 | 128 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 5.00 GB | No | 3-bit, with group size 128g and act-order. Higher quality than 128g-False. |
146
  | [gptq-3bit-32g-actorder_true](https://huggingface.co/TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ/tree/gptq-3bit-32g-actorder_true) | 3 | 32 | Yes | 0.1 | [VMware Open Instruct](https://huggingface.co/datasets/VMware/open-instruct/viewer/) | 4096 | 4.99 GB | No | 3-bit, with group size 32g and act-order. Highest quality 3-bit option. |
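
As one way to fetch a specific quant from the table above, here is a small download sketch using the `huggingface_hub` library (the branch name is illustrative; substitute any branch from the table):

```
# Sketch: download a single branch of this repo with huggingface_hub's snapshot_download.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/DiscoLM-mixtral-8x7b-v2-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # any branch from the table above
    local_dir="DiscoLM-mixtral-8x7b-v2-GPTQ",
)
```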
 
219
  <!-- README_GPTQ.md-text-generation-webui start -->
220
  ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
221
 
222
+ **NOTE**: This likely doesn't work at the moment.
223
+
224
  Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
225
 
226
  It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to do a manual install.
 
247
  <!-- README_GPTQ.md-use-from-tgi start -->
248
  ## Serving this model from Text Generation Inference (TGI)
249
 
250
+ **NOTE**: This likely doesn't work at the moment.
251
+
252
  It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
253
 
254
  Example Docker parameters:
 
291
  <!-- README_GPTQ.md-use-from-python start -->
292
  ## Python code example: inference from this GPTQ model
293
 
294
+ **NOTE**: I can't get this working yet.
295
+
296
  ### Install the necessary packages
297
 
298
  Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
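
If you are not sure what you already have installed, here is a quick version-check sketch (the names are the PyPI distribution names for the three packages above):

```
# Sketch: report installed versions of the packages this section requires.
from importlib.metadata import PackageNotFoundError, version

for pkg, minimum in (("transformers", "4.33.0"), ("optimum", "1.12.0"), ("auto-gptq", "0.4.2")):
    try:
        print(f"{pkg}: {version(pkg)} installed (need >= {minimum})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed (need >= {minimum})")
```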
 
366
  <!-- README_GPTQ.md-compatibility start -->
367
  ## Compatibility
368
 
369
+ These GPTQs are not yet working.
 
 
370
 
 
371
  <!-- README_GPTQ.md-compatibility end -->
372
 
373
  <!-- footer start -->
 
380
 
381
  ## Thanks, and how to contribute
382
 
 
 
383
  Thanks to Clay from [gpus.llm-utils.org](https://gpus.llm-utils.org)!
384
 
385
  I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.