---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- mistral
- mixtral
- moe
model_name: Mixtral 8X7B - bnb 4-bit
inference: false
model_type: mixtral
pipeline_tag: text-generation
quantized_by: ybelkada
---

# Mixtral 8x7B Instruct-v0.1 - `bitsandbytes` 4-bit 

This repository contains the `bitsandbytes` 4-bit quantized version of [`mistralai/Mixtral-8x7B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1). To use it, make sure you have the latest versions of `bitsandbytes` and `transformers` installed (installing `transformers` from source if your release does not yet include Mixtral support):
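
For example (a suggested set of commands, not pinned requirements):

```bash
pip install -U bitsandbytes
pip install git+https://github.com/huggingface/transformers.git
```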

Loading the model as shown below will directly load the quantized weights in 4-bit precision:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 4-bit quantization config is stored with the checkpoint, so the quantized weights load directly.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
Note that you need a CUDA-compatible GPU to run low-bit precision models with `bitsandbytes`.
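
Building on the snippet above, here is a minimal generation sketch; the prompt and generation settings are illustrative assumptions, not recommendations from this card:

```python
# Mixtral Instruct uses the [INST] ... [/INST] chat format; apply_chat_template applies it for us.
messages = [{"role": "user", "content": "Explain what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding with a modest token budget; adjust as needed.
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```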