Quantized Qwen2.5-1.5B-Instruct
This repository contains 8-bit and 4-bit GPTQ-quantized versions of the Qwen2.5-1.5B-Instruct model. Quantization significantly reduces the model's size and memory footprint, enabling deployment and faster inference on resource-constrained devices while maintaining reasonable output quality.
Model Description
Qwen2.5-1.5B-Instruct is an instruction-tuned language model developed by the Qwen team. These quantized versions offer a more memory-efficient way to deploy and use the model.
Quantization Details
- Quantization Method: GPTQ, a one-shot post-training weight quantization method for generative pre-trained transformers.
- Quantization Bits: 8-bit and 4-bit versions available.
- Dataset: The model was quantized using a subset of the "fka/awesome-chatgpt-prompts" dataset.
Usage
To use the quantized models, first install the required dependencies:

```bash
pip install transformers accelerate bitsandbytes auto-gptq optimum
```
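Then load a quantized checkpoint with `transformers`. The repository id below is a placeholder; substitute the actual id of the 8-bit or 4-bit checkpoint from this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual quantized checkpoint.
MODEL_ID = "your-username/Qwen2.5-1.5B-Instruct-GPTQ-Int4"


def build_messages(user_prompt: str) -> list[dict]:
    """Build a chat in the format expected by tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    messages = build_messages("Give me a short introduction to large language models.")
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    reply = tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    print(reply)
```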
Performance
The quantized models offer a significant reduction in size and memory usage compared to the original model. While there may be a slight decrease in output quality, the trade-off is often worthwhile for deployment on devices with limited resources.
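As a rough back-of-the-envelope check on the size reduction (weights only; the 1.54 B parameter count is approximate, and real GPTQ checkpoints carry some extra overhead for quantization scales and zero-points):

```python
def approx_weight_size_gib(n_params: float, bits: int) -> float:
    """Approximate in-memory size of the weights alone, in GiB."""
    return n_params * bits / 8 / 1024**3


N_PARAMS = 1.54e9  # approximate parameter count of Qwen2.5-1.5B

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{approx_weight_size_gib(N_PARAMS, bits):.2f} GiB")
# → 16-bit: ~2.87 GiB
# → 8-bit: ~1.43 GiB
# → 4-bit: ~0.72 GiB
```

So the 4-bit weights take roughly a quarter of the fp16 footprint, which is what makes the model practical on small GPUs and CPUs.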
Disclaimer
These quantized models are provided for research and experimentation purposes. We do not guarantee their performance or suitability for specific applications.
Acknowledgements
- Qwen: For developing the original Qwen2.5-1.5B-Instruct model.
- Hugging Face: For providing the platform and tools for model sharing and quantization.
- GPTQ Authors: For developing the GPTQ quantization method.