---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

# Quantized Qwen2.5-1.5B-Instruct

This repository contains 8-bit and 4-bit quantized versions of the Qwen2.5-1.5B-Instruct model, produced with GPTQ. Quantization significantly reduces the model's size and memory footprint, enabling faster inference on resource-constrained devices while maintaining reasonable output quality.

## Model Description

Qwen2.5-1.5B-Instruct is an instruction-tuned language model developed by the Qwen team. The quantized versions in this repository offer a more efficient way to deploy and use it.

## Quantization Details

* **Quantization Method:** GPTQ (Generative Pre-trained Transformer Quantization)
* **Quantization Bits:** 8-bit and 4-bit variants are available.
* **Calibration Dataset:** A subset of the "fka/awesome-chatgpt-prompts" dataset was used for calibration (see the sketch after this list).
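
For reference, the snippet below is a minimal sketch of how a comparable GPTQ run can be set up with the `GPTQConfig` API in `transformers`. The subset size (128 prompts) and the `prompt` column name are illustrative assumptions, not the exact settings used for this repository.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed calibration setup: the first 128 entries of the "prompt" column.
calibration_texts = load_dataset("fka/awesome-chatgpt-prompts", split="train")["prompt"][:128]

# bits=4 produces the 4-bit variant; set bits=8 for the 8-bit one.
quant_config = GPTQConfig(bits=4, dataset=calibration_texts, tokenizer=tokenizer)

# Quantization runs during loading and requires a GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
model.save_pretrained("Qwen2.5-1.5B-Instruct-GPTQ-4bit")
tokenizer.save_pretrained("Qwen2.5-1.5B-Instruct-GPTQ-4bit")
```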

## Usage

To use the quantized models, follow these steps:

**Install Dependencies:**

```bash
pip install transformers accelerate auto-gptq optimum
```
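
**Load the Model:** The following is a minimal loading-and-generation sketch; the repository id is a placeholder, so point it at the actual 8-bit or 4-bit checkpoint path in this repo.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: replace with the real path of the 8-bit or 4-bit checkpoint.
model_id = "<this-repo>/Qwen2.5-1.5B-Instruct-GPTQ-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Explain GPTQ quantization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```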

## Performance

The quantized models offer a significant reduction in size and memory usage compared to the original model: as a rough weights-only estimate, 1.5B parameters occupy about 3 GB at 16-bit precision, about 1.5 GB at 8-bit, and about 0.8 GB at 4-bit, before runtime overhead such as the KV cache. Output quality may degrade slightly, but the trade-off is often worthwhile for deployment on devices with limited resources.
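
To check the footprint on your own hardware, `transformers` provides a helper on every loaded model; continuing from the `model` object in the Usage example above:

```python
# Report the in-memory size of the model's weights and buffers in GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```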

## Disclaimer

These quantized models are provided for research and experimentation purposes. We do not guarantee their performance or suitability for specific applications.

## Acknowledgements

* **Qwen:** For developing the original Qwen2.5-1.5B-Instruct model.
* **Hugging Face:** For providing the platform and tools for model sharing and quantization.
* **GPTQ Authors:** For developing the GPTQ quantization method.