mobiuslabsgmbh/Llama-2-7b-hf-4bit_g64-HQQ

Text Generation

mobicham committed on Nov 20, 2023

Commit 620d042 • 1 Parent(s): 70eff32

Update README.md

Files changed (1)

README.md +1 -3

@@ -8,7 +8,7 @@ pipeline_tag: text-generation
 ## Llama-2-7b-hf-4bit_g64-HQQ
 This a version of the LLama2-7B model quantized to 4-bit via Half-Quadratic Quantization (HQQ): https://mobiusml.github.io/hqq/
-To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and load it as follows:
+To run the model, install the HQQ library from https://github.com/mobiusml/hqq/tree/main/code and use it as follows:
 ``` Python
 from hqq.models.llama import LlamaHQQ
 import transformers
@@ -20,8 +20,6 @@ tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
 model = LlamaHQQ.from_quantized(model_id)
 ```
-You can then use the model for text generation or to reproduce the benchmark numbers.
 *Limitations*: <br>
 -Only supports a single GPU runtime.<br>
 -Not compatible with HuggingFace's PEFT.<br>
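
For readers unfamiliar with the naming, the sketch below illustrates what "4-bit" with per-group parameters means in practice. It assumes that `g64` in the model name denotes a group size of 64 weights, which the README does not state explicitly. It uses plain round-to-nearest quantization in pure Python; this is not the HQQ algorithm and does not use the HQQ library. All function names are hypothetical and chosen for illustration only.

```python
# Illustrative only: plain round-to-nearest group quantization, NOT HQQ.
# Assumption: "g64" means each group of 64 weights shares one scale and zero-point.

def quantize_group(weights, nbits=4):
    """Map one group of floats to integer codes in [0, 2**nbits - 1]."""
    qmax = 2 ** nbits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights in the group are equal.
    scale = (w_max - w_min) / qmax or 1.0
    codes = [min(qmax, max(0, round((w - w_min) / scale))) for w in weights]
    return codes, scale, w_min


def quantize(weights, group_size=64, nbits=4):
    """Split a flat weight list into groups and quantize each one independently."""
    return [
        quantize_group(weights[start:start + group_size], nbits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate floats from (codes, scale, zero) triples."""
    restored = []
    for codes, scale, zero in groups:
        restored.extend(code * scale + zero for code in codes)
    return restored


# Synthetic weights in roughly [-1, 1]; two groups of 64.
weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(128)]
groups = quantize(weights)
restored = dequantize(groups)
```

With this scheme each weight is stored as a 4-bit code, plus one scale and one zero-point per group, and the per-weight reconstruction error is at most half of the group's scale.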