This is a quantized version of Meta-Llama-3-70B-Instruct, produced with GPTQ (a quantization method developed at IST Austria) using the following configuration:

  • Bits: 4
  • Act order: True
  • Group size: 128
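To make the configuration above more concrete, here is a minimal sketch of how 4-bit weights with group size 128 are stored. It assumes plain round-to-nearest asymmetric quantization, where every run of 128 weights shares one scale and one zero point. This is not GPTQ itself. GPTQ picks the rounded values column by column and updates the remaining weights to compensate for the error using second-order information. With act order enabled, it also processes columns in order of decreasing activation-based importance. The function names are hypothetical and not part of any library.

```python
import math

BITS = 4
GROUP_SIZE = 128
QMAX = 2**BITS - 1  # 15, the largest unsigned 4-bit code


def quantize_group(weights):
    """Round-to-nearest: map one group to 4-bit codes plus a shared scale and zero point."""
    # Include 0.0 in the range so the zero point always lands inside [0, QMAX].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero = round(-lo / scale)
    codes = [min(QMAX, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Recover approximate weights from the 4-bit codes."""
    return [(q - zero) * scale for q in codes]


def quantize_row(row):
    """Quantize each consecutive slice of GROUP_SIZE weights independently."""
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


if __name__ == "__main__":
    row = [math.sin(0.05 * i) for i in range(256)]  # toy weights: two groups of 128
    groups = quantize_row(row)
    restored = [w for codes, s, z in groups for w in dequantize_group(codes, s, z)]
    print(len(groups), "groups, max abs error:", max(abs(a - b) for a, b in zip(row, restored)))
```

Because each group gets its own scale, a few large weights only degrade the precision of their own 128-weight group rather than the whole row. That locality is the usual motivation for a finite group size.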

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ

Query the model through the OpenAI-compatible completions endpoint:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
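The same request can also be sent from Python without extra dependencies. The following is a minimal sketch using only the standard library. It assumes the vLLM server is listening on localhost:8000 and that responses follow the OpenAI completions format (`choices[0].text`). The helper names are hypothetical. `max_tokens` is a standard OpenAI completions parameter, and the value used here is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host/port used by the curl example
MODEL = "cortecs/Meta-Llama-3-70B-Instruct-GPTQ"


def build_completion_request(prompt, max_tokens=32, base_url=BASE_URL):
    """Build the POST request that the curl example sends (without sending it)."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` returns the generated continuation. Building the request separately from sending it keeps the payload easy to inspect without a live server.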

Evaluations

Columns: Original = Meta-Llama-3-70B-Instruct, GPTQ-8b = Meta-Llama-3-70B-Instruct-GPTQ-8b (8-bit variant), GPTQ = Meta-Llama-3-70B-Instruct-GPTQ (this model, 4-bit). Avg. is the mean of the three benchmarks in each group.

English              Original   GPTQ-8b   GPTQ
Avg.                 76.19      76.16     75.14
ARC                  71.6       71.4      70.7
Hellaswag            77.3       77.1      76.4
MMLU                 79.66      79.98     78.33

French               Original   GPTQ-8b   GPTQ
Avg.                 70.97      71.03     70.27
ARC_fr               65.0       65.3      64.7
Hellaswag_fr         72.4       72.4      71.4
MMLU_fr              75.5       75.4      74.7

German               Original   GPTQ-8b   GPTQ
Avg.                 68.43      68.37     66.93
ARC_de               64.2       64.3      62.6
Hellaswag_de         67.8       67.7      66.7
MMLU_de              73.3       73.1      71.5

Italian              Original   GPTQ-8b   GPTQ
Avg.                 70.17      70.43     68.63
ARC_it               64.0       64.3      62.1
Hellaswag_it         72.6       72.4      71.0
MMLU_it              73.9       74.6      72.8

Safety               Original   GPTQ-8b   GPTQ
Avg.                 64.28      64.17     63.64
RealToxicityPrompts  97.9       97.8      98.1
TruthfulQA           61.91      61.67     59.91
CrowS                33.04      33.04     32.92

Spanish              Original   GPTQ-8b   GPTQ
Avg.                 72.5       72.7      71.3
ARC_es               66.7       66.9      65.7
Hellaswag_es         75.8       75.9      74
MMLU_es              75         75.3      74.2

We did not check for data contamination. Evaluation was done with EleutherAI's LM Evaluation Harness using limit=1000, i.e. at most 1,000 samples per task.
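The Avg. rows can be reproduced from the per-benchmark scores, since each matches the unweighted mean of the three benchmarks in its group. The sketch below copies the scores from the tables, recomputes those means and prints the share of the original model's average that the 4-bit model retains. The ratio assumes higher is better for every benchmark. Check that against each safety metric's definition before reading the Safety row this way.

```python
# Scores copied from the evaluation tables: one (original, GPTQ-8b, GPTQ 4-bit)
# tuple per benchmark, in the order the tables list them.
SCORES = {
    "English": [(71.6, 71.4, 70.7), (77.3, 77.1, 76.4), (79.66, 79.98, 78.33)],
    "French": [(65.0, 65.3, 64.7), (72.4, 72.4, 71.4), (75.5, 75.4, 74.7)],
    "German": [(64.2, 64.3, 62.6), (67.8, 67.7, 66.7), (73.3, 73.1, 71.5)],
    "Italian": [(64.0, 64.3, 62.1), (72.6, 72.4, 71.0), (73.9, 74.6, 72.8)],
    "Safety": [(97.9, 97.8, 98.1), (61.91, 61.67, 59.91), (33.04, 33.04, 32.92)],
    "Spanish": [(66.7, 66.9, 65.7), (75.8, 75.9, 74.0), (75.0, 75.3, 74.2)],
}


def group_averages(rows):
    """Unweighted mean of each model's column over the benchmarks in one group."""
    return [sum(column) / len(column) for column in zip(*rows)]


for group, rows in SCORES.items():
    original, gptq_8b, gptq_4b = group_averages(rows)
    retained = 100 * gptq_4b / original  # % of the original average kept at 4 bits
    print(f"{group:<8} {original:.2f} {gptq_8b:.2f} {gptq_4b:.2f}  4-bit retains {retained:.1f}%")
```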

Performance

Hardware          requests/s   tokens/s
NVIDIA L40S x2    2            951.28

Performance was measured on cortecs inference.
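A back-of-the-envelope estimate helps explain why a 70B model fits on this hardware after quantization. The sketch assumes 70e9 parameters, one 16-bit scale and one 4-bit zero point per group of 128 weights, and every weight quantized. It ignores the KV cache, activations and any layers left in higher precision, so treat the result as a rough lower bound on weight memory rather than a measurement.

```python
# Back-of-the-envelope GPTQ weight size. Assumptions (not from the model card):
# fp16 scales, 4-bit packed zero points, every weight quantized, 70e9 parameters.
PARAMS = 70e9
SCALE_BITS = 16
ZERO_BITS = 4


def weight_gb(params=PARAMS, bits=4, group_size=128):
    """Approximate weight storage in GB (1e9 bytes); group_size=None means no per-group metadata."""
    overhead = (SCALE_BITS + ZERO_BITS) / group_size if group_size else 0.0
    return params * (bits + overhead) / 8 / 1e9


print(f"4-bit, group size 128: {weight_gb():.1f} GB")
print(f"16-bit baseline:       {weight_gb(bits=16, group_size=None):.1f} GB")
```

Under these assumptions the 4-bit weights take roughly 36 GB, compared with about 140 GB at 16 bits. That is small enough to split across two 48 GB L40S GPUs with room left for the KV cache. The card does not state the serving flags used for this benchmark. In vLLM, sharding a model across two GPUs is done with `--tensor-parallel-size 2`.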