defenseunicorns
/

Hermes-2-Pro-Mistral-7B-4bit-32g-GPTQ

Text Generation

function calling

4-bit precision

Model card Files Files and versions Community

justinthelaw commited on Aug 2, 2024

Commit

92e11c4

·

verified ·

1 Parent(s): 1727868

Update README.md

Files changed (1) hide show

README.md +5 -3

README.md CHANGED Viewed

@@ -39,9 +39,11 @@ This repo contains GPTQ 4-bit, 32g Group Size, quantized model files from the No
 Models are released as sharded safetensors files.
-| Bits | GS | GPTQ Dataset | Seq Len | Size |
-| ---- | -- | ----------- | ------- | ---- |
-| 4 | 32 | [VMWare Open Instruct](https://huggingface.co/datasets/vmware/open-instruct) | 1,024  | 4.57 GB
 <!-- README_GPTQ.md-provided-files end -->

 Models are released as sharded safetensors files.
+| Bits | GS | GPTQ Dataset | Max Seq Len | Size | VRAM |
+| ---- | -- | ----------- | ------- | ---- | ---- |
+| 4 | 32 | [VMWare Open Instruct](https://huggingface.co/datasets/vmware/open-instruct) | 32,768  | 4.57 GB | 19-23 Gb*
+* Depends on maximum sequence length parameter (KV cache utilization) used with vLLM or Transformers
 <!-- README_GPTQ.md-provided-files end -->