This is a quantized version of Meta-Llama-3-8B-Instruct, produced with GPTQ (developed by IST Austria) using the following configuration:

  • Bits: 8
  • Act order: True
  • Group size: 128
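
To make the bit width and group size concrete, the sketch below quantizes a weight row in groups of 128 using 8-bit codes. It is plain round-to-nearest quantization, not GPTQ's error-compensating procedure, and it does not model act order. The weight row is random, hypothetical data, and all function names are illustrative.

```python
import random

BITS = 8          # from the configuration above
GROUP_SIZE = 128  # from the configuration above


def quantize_group(weights, bits=BITS):
    """Asymmetric round-to-nearest quantization of one group (not GPTQ itself)."""
    levels = 2 ** bits - 1                 # 255 steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [lo + c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize each group of a row independently, then dequantize it."""
    restored = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        restored.extend(dequantize_group(*quantize_group(group)))
    return restored


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]  # hypothetical weight row
restored = quantize_row(row)
num_groups = len(row) // GROUP_SIZE
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each group carries its own scale and offset, so a smaller group size tracks local weight ranges more closely at the cost of storing more scales.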

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b",
        "prompt": "San Francisco is a"
    }'
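
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server started above is listening on localhost:8000 and that the response follows the OpenAI completions schema (a `choices` list with a `text` field). The helper names and the `max_tokens` default are illustrative and not part of this model card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # address used in the curl example
MODEL = "cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b"


def build_request(prompt, max_tokens=32):
    """Build the POST request for the completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` should print the model's continuation of the prompt.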

Evaluations

| English | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 66.97 | 67.0 | 63.52 |
| ARC | 62.5 | 62.5 | 54.6 |
| Hellaswag | 70.3 | 70.3 | 69.5 |
| MMLU | 68.11 | 68.21 | 66.46 |

| French | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 57.73 | 57.7 | 53.33 |
| Hellaswag_fr | 61.7 | 62.2 | 59.3 |
| ARC_fr | 53.3 | 53.1 | 46.4 |
| MMLU_fr | 58.2 | 57.8 | 54.3 |

| German | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 53.47 | 53.67 | 49.0 |
| ARC_de | 49.1 | 49.0 | 41.6 |
| Hellaswag_de | 55.0 | 55.2 | 53.3 |
| MMLU_de | 56.3 | 56.8 | 52.1 |

| Italian | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 56.73 | 56.67 | 51.3 |
| Hellaswag_it | 61.3 | 61.3 | 58.4 |
| MMLU_it | 57.3 | 57.0 | 53.0 |
| ARC_it | 51.6 | 51.7 | 42.5 |

| Safety | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 61.42 | 61.42 | 61.53 |
| RealToxicityPrompts | 97.2 | 97.2 | 97.2 |
| TruthfulQA | 51.65 | 51.58 | 51.98 |
| CrowS | 35.42 | 35.48 | 35.42 |

| Spanish | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct-GPTQ-8b | Meta-Llama-3-8B-Instruct-GPTQ |
| --- | --- | --- | --- |
| Avg. | 59 | 58.63 | 54.6 |
| ARC_es | 54.1 | 53.8 | 46.9 |
| Hellaswag_es | 63.8 | 63.3 | 60.3 |
| MMLU_es | 59.1 | 58.8 | 56.6 |

We did not check for data contamination. Evaluation was run with the Eval Harness using limit=1000.
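
The Avg. rows appear to be the unweighted mean of the three task scores in each group. The short check below recomputes the English averages from the reported ARC, Hellaswag, and MMLU values. The averaging rule is inferred from the numbers, not stated in this card, and the variable names are illustrative.

```python
# English task scores copied from the evaluation results above.
scores = {
    "Meta-Llama-3-8B-Instruct": {"ARC": 62.5, "Hellaswag": 70.3, "MMLU": 68.11},
    "Meta-Llama-3-8B-Instruct-GPTQ-8b": {"ARC": 62.5, "Hellaswag": 70.3, "MMLU": 68.21},
    "Meta-Llama-3-8B-Instruct-GPTQ": {"ARC": 54.6, "Hellaswag": 69.5, "MMLU": 66.46},
}


def average(task_scores):
    """Unweighted mean over tasks, rounded to two decimals."""
    return round(sum(task_scores.values()) / len(task_scores), 2)


averages = {model: average(tasks) for model, tasks in scores.items()}

# Difference of each model's average from the unquantized baseline.
baseline = averages["Meta-Llama-3-8B-Instruct"]
deltas = {model: round(avg - baseline, 2) for model, avg in averages.items()}
```

Under this assumption the recomputed values match the reported 66.97, 67.0, and 63.52.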

Performance

| | requests/s | tokens/s |
| --- | --- | --- |
| NVIDIA L4x1 | 2.75 | 1312.26 |
| NVIDIA L4x2 | 4.36 | 2080.17 |
| NVIDIA L4x4 | 5.33 | 2539.76 |

Performance measured on cortecs inference.
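
To read the throughput numbers as scaling behavior, the sketch below derives the speedup over a single GPU, the per-GPU efficiency, and the average tokens per request. These derived quantities are not reported in this card; they are simple ratios of the figures above, and the names are illustrative.

```python
# GPU count -> (requests/s, tokens/s), copied from the figures above.
throughput = {
    1: (2.75, 1312.26),
    2: (4.36, 2080.17),
    4: (5.33, 2539.76),
}

base_rps = throughput[1][0]  # single-GPU requests/s as the reference point

results = {}
for gpus, (rps, tps) in throughput.items():
    speedup = rps / base_rps
    results[gpus] = {
        "speedup": speedup,               # throughput relative to one GPU
        "efficiency": speedup / gpus,     # 1.0 would be perfect linear scaling
        "tokens_per_request": tps / rps,  # average tokens handled per request
    }
```

On these figures, two GPUs give roughly 1.59x and four GPUs roughly 1.94x the single-GPU request rate, so scaling is clearly sublinear, while tokens per request stays near 477 in all three setups.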