Quantized DeepSeek-R1-Distill-Qwen-1.5B

Model Preview

This is an 8-bit quantized version of the DeepSeek-R1-Distill-Qwen-1.5B model, produced with bitsandbytes quantization.

Model Details

  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Quantization: 8-bit (LLM.int8()); a reproduction sketch follows this list
  • Library: bitsandbytes
  • Framework: transformers
  • Use Case: Text generation, chatbot applications, and other NLP tasks.

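The card lists bitsandbytes 8-bit (LLM.int8()) as the quantization method. The sketch below shows one way such a checkpoint can be produced and saved; the output directory name and the assumption of a recent transformers/bitsandbytes stack with 8-bit serialization support are mine, not details from this card.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Quantize to 8-bit (LLM.int8()) while loading the base weights
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Serialize the quantized weights so they can be reloaded directly
# (requires a bitsandbytes release with 8-bit serialization support)
model.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
tokenizer.save_pretrained("DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")
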
How to Load the Model

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit"

# INT8 Config
bnb_config_8bit = BitsAndBytesConfig(
    load_in_8bit=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config_8bit,
    device_map="auto",  # place the 8-bit weights on the available GPU(s)
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=1024,
    truncation=True,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)

Alternatively, load the quantized checkpoint directly through the pipeline API:

from transformers import pipeline

pipe = pipeline("text-generation", model="Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit")

messages = [
    {"role": "user", "content": "suggest me top movies in 2021? <think>\n"},
]
pipe(messages)
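
Either path requires bitsandbytes to be installed, since the checkpoint's saved config tells transformers to load the weights in 8-bit. The pipeline returns a list with a generated_text field; with chat-style message input the assistant's reply is appended as the last message. A minimal sketch of reading it, assuming a recent transformers release with chat support in the pipeline:

outputs = pipe(messages)
# "generated_text" holds the continued conversation;
# the assistant's reply is the last message in that list.
print(outputs[0]["generated_text"][-1]["content"])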

Model Performance

Quantizing to 8-bit substantially reduces memory usage while largely preserving generation quality. Approximate memory footprints:

Model Version       Memory Usage
Base Model          ~3.6GB
8-bit Quantized     ~2.25GB
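
The figures above can be sanity-checked with transformers' built-in accounting; the snippet below is a rough sketch, and exact numbers vary with hardware and library versions.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the 8-bit checkpoint and report the in-memory size of its weights
model_8bit = AutoModelForCausalLM.from_pretrained(
    "Deepak7376/DeepSeek-R1-Distill-Qwen-1.5B-bnb-8bit",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(f"8-bit footprint: {model_8bit.get_memory_footprint() / 1e9:.2f} GB")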

License

This model is released under the Apache-2.0 license.

Acknowledgments

Thanks to DeepSeek-AI for the base DeepSeek-R1-Distill-Qwen-1.5B model, and to the bitsandbytes and transformers projects for the quantization tooling.
