Quantized DeepSeek-R1-Distill-Qwen-1.5B
This is an 8-bit quantized version of the DeepSeek-R1-Distill-Qwen-1.5B model, created with bitsandbytes quantization.
Model Details
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Quantization: 8-bit (INT8)
- Library: bitsandbytes
- Framework: transformers
- Use Case: Text generation, chatbot applications, and other NLP tasks.
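For reference, below is a minimal sketch of how an 8-bit checkpoint like this one can be produced with bitsandbytes. The output directory name is illustrative, and saving serialized 8-bit weights assumes a recent transformers/bitsandbytes version.

```python
# Minimal sketch (assumption: a recent transformers + bitsandbytes release that
# supports serializing 8-bit weights). The output directory name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

# Load the base model directly into 8-bit precision
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Save the quantized weights and tokenizer to a local folder
model.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
```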
How to Load the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit"

# INT8 config
bnb_config_8bit = BitsAndBytesConfig(
    load_in_8bit=True,
)

# Load the quantized weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config_8bit)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a text-generation pipeline with sampling parameters
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1024,
    truncation=True,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
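The pipeline returns a list of results. A short sketch of reading the reply (the exact structure of `generated_text` can differ between transformers versions):

```python
# Sketch: reading the generated reply from the pipeline output.
# Assumption: the "generated_text" field layout may vary by transformers version.
result = pipe(messages)
print(result[0]["generated_text"])
```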
Or load everything through the pipeline directly:
```python
from transformers import pipeline

# The pipeline can also load the quantized checkpoint directly by name
pipe = pipeline("text-generation", model="Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
```
Model Performance
Quantizing the model significantly reduces memory usage while maintaining good performance. Here are the memory footprints:
| Model Version | Memory Usage |
|---|---|
| Base Model | ~3.6 GB |
| 8-bit Quantized | ~2.25 GB |
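As a rough check, the footprint can be measured after loading; this is a sketch, and the exact numbers depend on hardware and library versions:

```python
# Sketch: measuring the in-memory footprint of the quantized model.
# Values will vary with hardware and transformers/bitsandbytes versions.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
print(f"Memory footprint: {model_8bit.get_memory_footprint() / 1e9:.2f} GB")
```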
License
This model is released under the Apache-2.0 license.
Acknowledgments
- DeepSeek-AI for the original model.
- BitsAndBytes for quantization support.