mobiuslabsgmbh/Llama-2-7b-hf-4bit_g64-HQQ


mobicham committed on Nov 20, 2023

Commit f00739d • 1 Parent(s): 4978959

Update README.md

Files changed (1)

README.md +29 -1


	@@ -1 +1,29 @@
-hello

+---
+license: llama2
+train: false
+inference: false
+pipeline_tag: text-generation
+---
+## Llama-2-7b-hf-4bit_g64-HQQ
+This is a version of the Llama-2-7B model quantized to 4-bit via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq/
+To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and load the model as follows:
+``` Python
+from hqq.models.llama import LlamaHQQ
+import transformers
+model_id = 'mobiuslabsgmbh/Llama-2-7b-hf-4bit_g64-HQQ'
+# Load the tokenizer
+tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
+# Load the quantized model
+model = LlamaHQQ.from_quantized(model_id)
+```
+You can then use the model for text generation or to reproduce the benchmark numbers.
+*Note*: this model is not compatible with HuggingFace's PEFT.
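
The README says the loaded model can be used for text generation but does not show how. The following is a minimal sketch, not part of the commit. It assumes that the object returned by `LlamaHQQ.from_quantized` exposes the usual `transformers` `generate` method and that the tokenizer behaves like a standard `AutoTokenizer`; neither is confirmed by the README. The helper name `generate_text` and its parameters are hypothetical.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64, device=None):
    """Generate a continuation of `prompt` (hypothetical helper, not from the README)."""
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Move the inputs to the model's device when one is given (e.g. "cuda").
    if device is not None:
        inputs = inputs.to(device)
    # Assumption: the quantized model supports the standard `generate` call.
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Decode the first (and only) sequence back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Hypothetical usage, with `model` and `tokenizer` loaded as in the README:
# print(generate_text(model, tokenizer, "The capital of France is", device="cuda"))
```

Passing `device` explicitly avoids guessing where the quantized weights live; if the model sits on a GPU, the inputs need to be on the same device before calling `generate`.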