neuralmagic
/

Mistral-7B-Instruct-v0.3-GPTQ-4bit

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

mgoin commited on May 23

Commit

ca4faba

•

1 Parent(s): 2ca503f

Update README.md

Files changed (1) hide show

README.md +7 -1

README.md CHANGED Viewed

@@ -24,4 +24,10 @@ base_model: mistralai/Mistral-7B-Instruct-v0.3
 This model is ready for optimized inference using the Marlin mixed-precision kernels in vLLM: https://github.com/vllm-project/vllm
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/3bX2Hqj4LaJxFhPHRucAn.png)

 This model is ready for optimized inference using the Marlin mixed-precision kernels in vLLM: https://github.com/vllm-project/vllm
+Simply start this model as an inference server with:
+```bash
+python -m vllm.entrypoints.openai.api_server --model neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit
+```
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/SC_tYXjoS3yIoOYtfqZ2E.png)