Quantized DeepSeek-R1-Distill-Qwen-1.5B

Model Preview

This is an 8-bit quantized version of the DeepSeek-R1-Distill-Qwen-1.5B model, produced with bitsandbytes quantization.

Model Details

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Quantization: 8-bit (LLM.int8()); a reproduction sketch follows this list
  • Library: bitsandbytes
  • Framework: transformers
  • Use Case: Text generation, chatbot applications, and other NLP tasks.

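The card lists bitsandbytes 8-bit (LLM.int8()) as the quantization method. The sketch below shows one way such a checkpoint can be produced and saved; the output directory name and the assumption of a recent transformers/bitsandbytes stack with 8-bit serialization support are mine, not details from this card.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Quantize to 8-bit (LLM.int8()) while loading the base weights
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Serialize the quantized weights so they can be reloaded directly
# (requires a bitsandbytes release with 8-bit serialization support)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
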
How to Load the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit"

# INT8 Config
bnb_config_8bit = BitsAndBytesConfig(
    load_in_8bit=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config_8bit,
    device_map="auto",  # place the 8-bit weights on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1024,
    truncation=True,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)

Alternatively, load the quantized checkpoint directly through the pipeline API:

from transformers import pipeline

pipe = pipeline("text-generation", model="Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
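
Either path requires bitsandbytes to be installed, since the checkpoint's saved config tells transformers to load the weights in 8-bit. The pipeline returns a list with a generated_text field; with chat-style message input the assistant's reply is appended as the last message. A minimal sketch of reading it, assuming a recent transformers release with chat support in the pipeline:

outputs = pipe(messages)
# "generated_text" holds the continued conversation;
# the assistant's reply is the last message in that list.
print(outputs[0]["generated_text"][-1]["content"])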

Model Performance

Quantizing to 8-bit substantially reduces memory usage while largely preserving generation quality. Approximate memory footprints:

Model Version       Memory Usage
Base Model          ~3.6GB
8-bit Quantized     ~2.25GB
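
The figures above can be sanity-checked with transformers' built-in accounting; the snippet below is a rough sketch, and exact numbers vary with hardware and library versions.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the 8-bit checkpoint and report the in-memory size of its weights
model_8bit = AutoModelForCausalLM.from_pretrained(
    "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(f"8-bit footprint: {model_8bit.get_memory_footprint() / 1e9:.2f} GB")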

License

This model is released under the Apache-2.0 license.

Acknowledgments

Thanks to DeepSeek-AI for the base DeepSeek-R1-Distill-Qwen-1.5B model, and to the bitsandbytes and transformers projects for the quantization tooling.
