Mixtral 8x7B Instruct-v0.1 - bitsandbytes 4-bit

This repository contains the bitsandbytes 4-bit quantized version of mistralai/Mixtral-8x7B-Instruct-v0.1. To use it, make sure to have the latest version of bitsandbytes and transformers installed from source:

Loading this model as such: will directly load the quantized model in 4-bit precision.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(model_id)

Note you need a CUDA-compatible GPU device to run low-bit precision models with bitsandbytes

Downloads last month
74
Safetensors
Model size
24.2B params
Tensor type
F32
FP16
U8
Inference Examples
Inference API (serverless) has been turned off for this model.