This repo contains AWQ model files for [Mistral AI's Mixtral 8X7B v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1).

**MIXTRAL AWQ**

This is a Mixtral AWQ model.

For AutoAWQ inference, please install AutoAWQ from source.
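
A minimal sketch of a from-source install followed by basic AutoAWQ inference; the repo id and generation settings below are illustrative assumptions, not confirmed specifics:

```python
# Install AutoAWQ from source first; a released wheel may not yet include
# Mixtral support:
#   git clone https://github.com/casper-hansen/AutoAWQ
#   cd AutoAWQ && pip install -e .

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/Mixtral-8x7B-v0.1-AWQ"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoAWQForCausalLM.from_quantized(
    model_name_or_path,
    fuse_layers=True,   # fuse layers for faster inference
    safetensors=True,   # the model files in this repo are safetensors
)

tokens = tokenizer("The meaning of life is", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
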
Support via Transformers is coming via this PR: https://github.com/huggingface/transformers/pull/27950, which should be merged to Transformers `main` very soon.
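
Once that PR is merged, loading should work through the standard Transformers AWQ path. A sketch, assuming the same repo id as above and that `autoawq` is installed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mixtral-8x7B-v0.1-AWQ"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the AWQ quantization_config from the model's
# config.json, so no extra quantization arguments are needed here.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
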
Support via vLLM and TGI has not yet been confirmed.

### About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference with equivalent or better quality than the most commonly used GPTQ settings.
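
To make the 4-bit setting concrete, here is a minimal sketch of quantizing a model with AutoAWQ; the configuration values and output path are illustrative assumptions, not the exact settings used to produce the files in this repo:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-v0.1"
quant_path = "mixtral-8x7b-awq"  # hypothetical output directory

# Illustrative AWQ settings: 4-bit weights, group size 128, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model, calibrate on AutoAWQ's default dataset,
# and rewrite the weights to 4-bit.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
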
AWQ models are currently supported on Linux and Windows, with NVIDIA GPUs only. macOS users: please use GGUF models instead.

AWQ models are supported by (note that not all of these may support Mixtral models yet):

- [Text Generation Webui](https://github.com/oobabooga/text-generation-webui) - using Loader: AutoAWQ
- [vLLM](https://github.com/vllm-project/vllm) - version 0.2.2 or later is required for support of all model types (see the sketch after this list).
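
As noted above, vLLM support for Mixtral AWQ has not yet been confirmed, but for AWQ models that vLLM does support, loading generally looks like this minimal sketch (the repo id is again an assumption):

```python
from vllm import LLM, SamplingParams

# quantization="awq" selects vLLM's AWQ kernels for this model.
llm = LLM(model="TheBloke/Mixtral-8x7B-v0.1-AWQ", quantization="awq")

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["The meaning of life is"], sampling_params)
print(outputs[0].outputs[0].text)
```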