mobicham committed on
Commit
f00739d
1 Parent(s): 4978959

Update README.md

README.md CHANGED
@@ -1 +1,29 @@
- hello
+ ---
+ license: llama2
+ train: false
+ inference: false
+ pipeline_tag: text-generation
+ ---
+
+ ## Llama-2-7b-hf-4bit_g64-HQQ
+ This is a version of the Llama2-7B model quantized to 4-bit via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq/
+
+ To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and load it as follows:
+ ``` Python
+ from hqq.models.llama import LlamaHQQ
+ import transformers
+
+ model_id = 'mobiuslabsgmbh/Llama-2-7b-hf-4bit_g64-HQQ'
+ # Load the tokenizer
+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
+ # Load the model
+ model = LlamaHQQ.from_quantized(model_id)
+ ```
+
+ You can then use the model for text generation or to reproduce the benchmark numbers.
+
+ *Note*: this model is not compatible with HuggingFace's PEFT.
+
+
+
+
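
The README added in this commit says the loaded model can be used for text generation but does not show how. The following is a minimal sketch, not part of the commit, assuming the object returned by `LlamaHQQ.from_quantized` exposes the usual Hugging Face `generate` method. The helper name `generate_text` and its parameters are hypothetical.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64, device=None):
    """Generate a continuation of `prompt` with a Hugging Face-style model/tokenizer pair."""
    # Tokenize the prompt into PyTorch tensors (input_ids, attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    # Optionally move the input tensors to the device that holds the model weights.
    if device is not None:
        inputs = {name: tensor.to(device) for name, tensor in inputs.items()}
    # Assumption: the quantized model exposes `generate` like a transformers model.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch back into text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the `model` and `tokenizer` from the README snippet, a call might look like `generate_text(model, tokenizer, "Quantization is", device="cuda")`. The `device` argument is only needed when the model weights live on a different device than the tokenized inputs.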