spedrox-sac
/

Qwen2.5-1.5B_quantized_models

Text Generation

Model card Files Files and versions Community

Qwen2.5-1.5B_quantized_models / README.md

spedrox-sac's picture

Create README.md

60c2bf3 verified 18 days ago

|

history blame contribute delete

1.8 kB

	---
	license: mit
	datasets:
	- fka/awesome-chatgpt-prompts
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-1.5B-Instruct
	pipeline_tag: text-generation
	---
	# Quantized Qwen2.5-1.5B-Instruct

	This repository contains 8-bit and 4-bit quantized versions of the Qwen2.5-1.5B-Instruct model using GPTQ. Quantization significantly reduces the model's size and memory footprint, enabling faster inference on resource-constrained devices while maintaining reasonable performance.


	## Model Description

	The Qwen2.5-1.5B-Instruct is a powerful language model developed by Qwen for instructional tasks. These quantized versions offer a more efficient way to deploy and utilize this model.


	## Quantization Details

	* Quantization Method: GPTQ (Generative Pretrained Transformer Quantization)
	* Quantization Bits: 8-bit and 4-bit versions available.
	* Dataset: The model was quantized using a subset of the "fka/awesome-chatgpt-prompts" dataset.


	## Usage

	To use the quantized models, follow these steps:

	Install Dependencies:
	```bash
	pip install transformers accelerate bitsandbytes auto-gptq optimum
	```
	## Performance

	The quantized models offer a significant reduction in size and memory usage compared to the original model. While there might be a slight decrease in performance, the trade-off is often beneficial for deployment on devices with limited resources.


	## Disclaimer

	These quantized models are provided for research and experimentation purposes. We do not guarantee their performance or suitability for specific applications.


	## Acknowledgements

	* Qwen: For developing the original Qwen2.5-1.5B-Instruct model.
	* Hugging Face: For providing the platform and tools for model sharing and quantization.
	* GPTQ Authors: For developing the GPTQ quantization method.