mobiuslabsgmbh/Llama-2-7b-chat-hf_1bitgs8_hqq

Text Generation


appoose committed on Mar 27, 2024

Commit 20e9915 · verified · 1 parent: 4b20f21

adding mascot

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -7,6 +7,8 @@ pipeline_tag: text-generation
 This is an experimental <a href="https://github.com/mobiusml/hqq/">HQQ</a> 1-bit quantized (<b>binary weights</b>) <a href="https://huggingface.co/meta-llama/Llama-2-7b-chat-hf"> Llama2-7B-chat model </a> using a LoRA adapter to improve the performance (referred to as HQQ+).
+![image/gif](https://cdn-uploads.huggingface.co/production/uploads/636b945ef575d3705149e982/3fOfrg-5WtJwC5cpcVDub.gif)
 Quantizing small models at extremely low bit-widths is a challenging task. The purpose of this model is to show the community what to expect when fine-tuning such models.
 We notice that 1-bit quantization doesn't work well when applied directly to small models such as Llama2-7B. However, after fine-tuning, the model's output improves significantly. In fact, the 1-bit base model outperforms Quip# 2-bit after fine-tuning on ~2.9K samples.
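To make "binary weights plus a LoRA adapter" concrete, here is a minimal sketch, not the hqq library's API. It assumes that the `1bitgs8` suffix in the repo name means 1-bit weights quantized in groups of 8 (our reading of the name, not stated in the README). Each group stores 1-bit codes plus its own scale `s` and zero-point `z`, so W ≈ s·(W_q − z). Here `s` and `z` come from simple min-max statistics. HQQ itself instead optimizes the zero-point with a half-quadratic solver. The HQQ+ idea is shown as an additive low-rank correction `B @ (A @ x)` on top of the dequantized weights. All function names are hypothetical.

```python
from typing import List, Tuple


def quantize_1bit_groupwise(
    weights: List[float], group_size: int = 8
) -> Tuple[List[int], List[float], List[float]]:
    """Binarize a flat weight list group by group (min-max affine scheme, an assumption)."""
    codes: List[int] = []
    scales: List[float] = []
    zeros: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 1 bit gives 2 levels, so the step is (hi - lo) / (2**1 - 1); guard constant groups.
        scale = (hi - lo) or 1.0
        zero = -lo / scale  # chosen so code 0 maps back to lo and code 1 to hi
        for w in group:
            q = round(w / scale + zero)
            codes.append(min(max(q, 0), 1))  # clamp to the binary range {0, 1}
        scales.append(scale)
        zeros.append(zero)
    return codes, scales, zeros


def dequantize_1bit_groupwise(
    codes: List[int], scales: List[float], zeros: List[float], group_size: int = 8
) -> List[float]:
    """Reconstruct W_hat = scale * (code - zero) using each element's group parameters."""
    return [
        (q - zeros[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(codes)
    ]


def matvec(m: List[List[float]], x: List[float]) -> List[float]:
    """Dense matrix-vector product for a row-major list of rows."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in m]


def hqq_plus_forward(
    w_hat: List[List[float]],
    lora_a: List[List[float]],
    lora_b: List[List[float]],
    x: List[float],
) -> List[float]:
    """y = W_hat x + B (A x): quantized base layer plus a trainable low-rank adapter.

    Shapes: w_hat is out x in, lora_a is r x in, lora_b is out x r.
    """
    base = matvec(w_hat, x)
    delta = matvec(lora_b, matvec(lora_a, x))
    return [b + d for b, d in zip(base, delta)]
```

With only two levels per group, every weight collapses to its group's min or max, which illustrates why the README reports poor output before fine-tuning: only the small adapter matrices `A` and `B` are trained to compensate.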