cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b

This is a quantized version of Llama-3-SauerkrautLM-70b-Instruct. It was quantized with GPTQ, a method developed by IST Austria, using the following configuration:

Bits: 8
Act order: True
Group size: 128
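
For readers who want these settings in machine-readable form, here is a minimal sketch. The key names (`bits`, `group_size`, `desc_act`) are an assumption based on the `quantize_config.json` convention used by AutoGPTQ-style tooling. This card does not say which tool or file layout was actually used.

```python
import json

# Values come from the configuration listed above.
# Key names are an ASSUMPTION (AutoGPTQ-style quantize_config.json),
# not something this model card confirms.
quantize_config = {
    "bits": 8,          # 8-bit weights
    "group_size": 128,  # one set of quantization parameters per 128 weights
    "desc_act": True,   # "act order": process columns by decreasing activation size
}

print(json.dumps(quantize_config, indent=2))
```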

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b",
        "prompt": "San Francisco is a"
    }'
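
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical, the `max_tokens` value is an arbitrary choice, and the URL assumes the server is running locally on its default port as in the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same address as in the curl example
MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b"


def build_completion_request(prompt, max_tokens=32):
    """Build a POST request for the OpenAI-compatible /completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` should return the generated continuation as a string.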

Evaluations

English	Llama-3-SauerkrautLM-70b-Instruct	Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b	Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.	78.17	78.1	76.72
ARC	74.5	74.4	73.0
Hellaswag	79.2	79.2	78.0
MMLU	80.8	80.7	79.15

German	Llama-3-SauerkrautLM-70b-Instruct	Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b	Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.	70.83	70.47	69.13
ARC_de	66.7	66.2	65.9
Hellaswag_de	70.8	71.0	68.8
MMLU_de	75.0	74.2	72.7

Safety	Llama-3-SauerkrautLM-70b-Instruct	Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b	Llama-3-SauerkrautLM-70b-Instruct-GPTQ
Avg.	65.86	65.94	65.94
RealToxicityPrompts	97.6	97.8	98.4
TruthfulQA	67.07	66.92	65.56
CrowS	32.92	33.09	33.87

We did not check for data contamination. Evaluations were run with the Eval. Harness using limit=1000, so each task was scored on at most 1000 examples.
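
The Avg. rows are consistent with a plain arithmetic mean of the three task scores in each table. The sketch below recomputes them from the numbers above. The helper name `column_means` is hypothetical, and rounding to two decimals is an assumption about how the averages were reported.

```python
from statistics import mean

# Scores copied from the tables above.
# Column order: original model, GPTQ-8b, GPTQ.
scores = {
    "English": {
        "ARC": (74.5, 74.4, 73.0),
        "Hellaswag": (79.2, 79.2, 78.0),
        "MMLU": (80.8, 80.7, 79.15),
    },
    "German": {
        "ARC_de": (66.7, 66.2, 65.9),
        "Hellaswag_de": (70.8, 71.0, 68.8),
        "MMLU_de": (75.0, 74.2, 72.7),
    },
    "Safety": {
        "RealToxicityPrompts": (97.6, 97.8, 98.4),
        "TruthfulQA": (67.07, 66.92, 65.56),
        "CrowS": (32.92, 33.09, 33.87),
    },
}

# Avg. rows as reported in the tables.
reported_avg = {
    "English": (78.17, 78.1, 76.72),
    "German": (70.83, 70.47, 69.13),
    "Safety": (65.86, 65.94, 65.94),
}


def column_means(tasks):
    """Average each model column over all tasks, rounded to two decimals."""
    return tuple(round(mean(column), 2) for column in zip(*tasks.values()))


for group, tasks in scores.items():
    print(group, "computed:", column_means(tasks), "reported:", reported_avg[group])
```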

Performance

	requests/s	tokens/s
NVIDIA L4x4	0.27	128.98
NVIDIA L4x8	1.31	625.65
Performance measured on cortecs inference.
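
Two derived quantities help when reading this table: tokens generated per request, and the speedup from 4 to 8 GPUs. The sketch below computes both. It assumes that both columns of a row come from the same benchmark run, which the card does not state. The helper names are hypothetical.

```python
# Throughput figures copied from the table above.
throughput = {
    "NVIDIA L4x4": {"requests_per_s": 0.27, "tokens_per_s": 128.98},
    "NVIDIA L4x8": {"requests_per_s": 1.31, "tokens_per_s": 625.65},
}


def tokens_per_request(row):
    """Tokens per request, ASSUMING both rates describe the same run."""
    return row["tokens_per_s"] / row["requests_per_s"]


def speedup(metric):
    """Ratio of the 8-GPU figure to the 4-GPU figure for one metric."""
    return throughput["NVIDIA L4x8"][metric] / throughput["NVIDIA L4x4"][metric]


for name, row in throughput.items():
    print(f"{name}: {tokens_per_request(row):.1f} tokens per request")

print(f"tokens/s speedup, 4 to 8 GPUs: {speedup('tokens_per_s'):.2f}x")
```

Under that assumption, both setups work out to roughly 478 tokens per request, and doubling the GPU count gives about 4.85 times the throughput.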
