mobicham committed
Commit 620d042
1 parent: 70eff32

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -8,7 +8,7 @@ pipeline_tag: text-generation
 ## Llama-2-7b-hf-4bit_g64-HQQ
 This is a version of the Llama2-7B model quantized to 4-bit via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq/
 
- To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and load it as follows:
+ To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and use it as follows:
 ``` Python
 from hqq.models.llama import LlamaHQQ
 import transformers
@@ -20,8 +20,6 @@ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
 model = LlamaHQQ.from_quantized(model_id)
 ```
 
- You can then use the model for text generation or to reproduce the benchmark numbers.
-
 *Limitations*: <br>
 -Only supports a single GPU runtime.<br>
 -Not compatible with HuggingFace's PEFT.<br>
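The line removed by this commit noted that the loaded model can be used for text generation or to reproduce the benchmark numbers. As a minimal generation sketch, assuming the HQQ-patched model exposes the standard transformers `generate()` API on a single CUDA GPU (per the limitations above), with `model` and `tokenizer` taken from the loading snippet in the diff:

```Python
import torch

# Prompt is illustrative; model and tokenizer are the objects created by the
# loading snippet shown in the README diff above.
prompt = "Explain Half-Quadratic Quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Greedy decoding with a small token budget keeps the example fast; the
# max_new_tokens value is an arbitrary choice, not taken from the README.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The prompt and `max_new_tokens` value here are illustrative, not part of the commit.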