This is a quantized version of Mistral-7B-Instruct-v0.3, produced with GPTQ (a post-training quantization method developed at IST Austria) using the following configuration (sketched in code below):

- Bits: 4
- Act order: True
- Group size: 128

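For orientation, the sketch below shows how these settings map onto `transformers`' `GPTQConfig`. It is illustrative only: the calibration dataset and the exact script used to produce this checkpoint are not documented here and are assumptions.

```python
# Illustrative sketch only; not the publisher's quantization script.
# Maps the card's settings (4-bit, act order, group size 128) onto GPTQConfig.
from transformers import GPTQConfig

quant_config = GPTQConfig(
    bits=4,          # 4-bit weight quantization
    group_size=128,  # quantization group size
    desc_act=True,   # "act order": quantize weights in order of decreasing activation magnitude
    dataset="c4",    # calibration dataset: an assumption, not stated in this card
)
```
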
Usage

Install vLLM and start the OpenAI-compatible server:

```shell
pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b
```

Access the model:

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
        "prompt": "San Francisco is a"
    }'
```
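
Since the server exposes the OpenAI API, the same request can be made from Python with the official `openai` client. The sketch below assumes the default local endpoint and a placeholder API key (vLLM ignores the key unless one is configured).

```python
# Query the vLLM OpenAI-compatible server started above with the openai client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
    prompt="San Francisco is a",
    max_tokens=64,  # illustrative choice, not from the card
)
print(completion.choices[0].text)
```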

Evaluations

| English | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 67.65 | 67.72 | 66.95 |
| ARC | 64.2 | 64.1 | 62.1 |
| Hellaswag | 75.6 | 75.6 | 76.0 |
| MMLU | 63.16 | 63.47 | 62.75 |

| French | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 56.4 | 56.17 | 54.77 |
| ARC_fr | 51.9 | 51.4 | 50.0 |
| Hellaswag_fr | 65.8 | 65.8 | 63.8 |
| MMLU_fr | 51.5 | 51.3 | 50.5 |

| German | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 51.83 | 51.73 | 51.7 |
| ARC_de | 47.6 | 47.5 | 47.3 |
| Hellaswag_de | 58.9 | 59.0 | 57.3 |
| MMLU_de | 49.0 | 48.7 | 50.5 |

| Italian | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 54.93 | 54.8 | 52.83 |
| ARC_it | 51.6 | 51.6 | 49.3 |
| Hellaswag_it | 63.5 | 63.8 | 61.0 |
| MMLU_it | 49.7 | 49.0 | 48.2 |

| Safety | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 60.32 | 60.54 | 64.8 |
| RealToxicityPrompts | 89.7 | 90.0 | 90.7 |
| TruthfulQA | 59.71 | 59.48 | 58.32 |
| CrowS | 31.54 | 32.14 | 45.38 |

| Spanish | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-8b | Mistral-7B-Instruct-v0.3-GPTQ-4b |
|---|---|---|---|
| Avg. | 57.9 | 57.97 | 56.1 |
| ARC_es | 53.5 | 53.5 | 51 |
| Hellaswag_es | 68.5 | 68.5 | 66.2 |
| MMLU_es | 51.7 | 51.9 | 51.1 |

We did not check for data contamination. Evaluation was done with the Eval Harness (lm-evaluation-harness) using limit=1000.
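
For orientation only, a comparable run with the harness's Python API might look like the sketch below. The harness version and task names are assumptions; only limit=1000 is documented above, and the multilingual and safety tasks are not included here.

```python
# Rough sketch of a comparable English-language evaluation with
# lm-evaluation-harness (v0.4-style API). Task selection is an assumption;
# only limit=1000 is taken from the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    limit=1000,
)
print(results["results"])
```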

Performance

| Hardware | requests/s | tokens/s |
|---|---|---|
| NVIDIA L4x1 | 3.75 | 1867.13 |
| NVIDIA L4x2 | 5.03 | 2503.83 |
| NVIDIA L4x4 | 5.86 | 2916.3 |
Performance measured on cortecs inference.