# cortecs/Meta-Llama-3-70B-Instruct-GPTQ

This is a quantized version of [Llama-3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), produced with GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/), using the following configuration:
- 4bit (8bit will follow)
- Act order: True
- Group size: 128
- Seq. length: 4096
- Dataset: [Wikitext2](https://huggingface.co/datasets/wikitext)

## Usage
Install **vLLM** and run the [server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):
```
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
```
Access the model:
```
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
        "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nTell me a joke<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    }'
```
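
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server is running on the default `localhost:8000` as in the curl example. The helper names (`build_prompt`, `build_payload`, `complete`) and the `max_tokens` default are illustrative and not part of vLLM or this repository.

```python
import json
import urllib.request

# Assumption: default host and port of the vLLM OpenAI-compatible server.
API_URL = "http://localhost:8000/v1/completions"


def build_prompt(user_message: str) -> str:
    """Wrap a single user turn in the Llama 3 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def build_payload(model: str, user_message: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for the /v1/completions endpoint."""
    return {
        "model": model,  # must match the value passed to --model
        "prompt": build_prompt(user_message),
        "max_tokens": max_tokens,  # illustrative default
    }


def complete(model: str, user_message: str, max_tokens: int = 128) -> str:
    """Send one completion request and return the generated text."""
    payload = build_payload(model, user_message, max_tokens)
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, calling `complete(model_id, "Tell me a joke")` returns the generated text, where `model_id` is the same model ID that was passed to `--model`.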
## Evaluations
| __English__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
|:--------------|:---------------------------|:-----------------------|:--------------------------|
| Avg.          | 76.19                      | 75.14                  | 66.97                     |
| ARC           | 71.6                       | 70.7                   | 62.5                      |
| Hellaswag     | 77.3                       | 76.4                   | 70.3                      |
| MMLU          | 79.66                      | 78.33                  | 68.11                     |
|               |                            |                        |                           |
| __French__    | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
| Avg.          | 70.97                      | 70.27                  | 57.73                     |
| ARC_fr        | 65.0                       | 64.7                   | 53.3                      |
| Hellaswag_fr  | 72.4                       | 71.4                   | 61.7                      |
| MMLU_fr       | 75.5                       | 74.7                   | 58.2                      |
|               |                            |                        |                           |
| __German__    | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
| Avg.          | 68.43                      | 66.93                  | 53.47                     |
| ARC_de        | 64.2                       | 62.6                   | 49.1                      |
| Hellaswag_de  | 67.8                       | 66.7                   | 55.0                      |
| MMLU_de       | 73.3                       | 71.5                   | 56.3                      |
|               |                            |                        |                           |
| __Italian__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
| Avg.          | 70.17                      | 68.63                  | 56.73                     |
| ARC_it        | 64.0                       | 62.1                   | 51.6                      |
| Hellaswag_it  | 72.6                       | 71.0                   | 61.3                      |
| MMLU_it       | 73.9                       | 72.8                   | 57.3                      |
|               |                            |                        |                           |
| __Safety__          | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
| Avg.                | 64.28                      | 63.64                  | 61.42                     |
| RealToxicityPrompts | 97.9                       | 98.1                   | 97.2                      |
| TruthfulQA          | 61.91                      | 59.91                  | 51.65                     |
| CrowS               | 33.04                      | 32.92                  | 35.42                     |
|                     |                            |                        |                           |
| __Spanish__   | __Llama-3 70B Instruct__   | __Llama 3 70B GPTQ__   | __Llama-3 8B Instruct__   |
| Avg.          | 72.5                       | 71.3                   | 59                        |
| ARC_es        | 66.7                       | 65.7                   | 54.1                      |
| Hellaswag_es  | 75.8                       | 74                     | 63.8                      |
| MMLU_es       | 75                         | 74.2                   | 59.1                      |

Take these results with caution: we did not check for data contamination.
Evaluation was done using [Eval. Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000` for large datasets.
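
To see how much quality the quantized model gives up, the `Avg.` rows can be compared directly. The sketch below copies the averages for Llama-3 70B Instruct and Llama 3 70B GPTQ from the evaluation table and computes the absolute drop per category; the function name `quantization_gap` is illustrative.

```python
# Avg. rows copied from the evaluation table:
# (Llama-3 70B Instruct, Llama 3 70B GPTQ)
AVG_SCORES = {
    "English": (76.19, 75.14),
    "French": (70.97, 70.27),
    "German": (68.43, 66.93),
    "Italian": (70.17, 68.63),
    "Spanish": (72.5, 71.3),
    "Safety": (64.28, 63.64),
}


def quantization_gap(scores):
    """Return the absolute score drop (original minus GPTQ) per category."""
    return {name: round(full - gptq, 2) for name, (full, gptq) in scores.items()}


gaps = quantization_gap(AVG_SCORES)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points lower with GPTQ")
```

On these averages the quantized model stays within roughly 0.6 to 1.6 points of the original in every category.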
## Performance
| __Llama-3 70B Instruct__   | __requests/s__   | __tokens/s__   |
|:---------------------------|:-----------------|:---------------|
| NVIDIA L40Sx4              | 2.38             | 1135.41        |
|                            |                  |                |
| __Llama 3 70B GPTQ__       | __requests/s__   | __tokens/s__   |
| NVIDIA L40Sx2              | 1.58             | 750.89         |
|                            |                  |                |
| __Llama-3 8B Instruct__    | __requests/s__   | __tokens/s__   |
| NVIDIA L40Sx1              | 11.64            | 5548.63        |
| NVIDIA L4x1                | 2.76             | 1315.25        |
| NVIDIA L4x2                | 4.79             | 2283.53        |

Performance was measured on [cortecs.ai](https://cortecs.ai).
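
Because the setups use different numbers of GPUs, it can help to normalize throughput per GPU. The sketch below does this with the tokens/s figures from the performance table. It assumes that a suffix such as `x4` means four GPUs serving a single model instance; the labels and the function name `tokens_per_gpu` are illustrative.

```python
# (label, GPU count, tokens/s) copied from the performance table.
# Assumption: "L40Sx4" means four L40S GPUs serving one model instance.
RUNS = [
    ("Llama-3 70B Instruct, L40S", 4, 1135.41),
    ("Llama 3 70B GPTQ, L40S", 2, 750.89),
    ("Llama-3 8B Instruct, L40S", 1, 5548.63),
    ("Llama-3 8B Instruct, L4 (x1)", 1, 1315.25),
    ("Llama-3 8B Instruct, L4 (x2)", 2, 2283.53),
]


def tokens_per_gpu(runs):
    """Divide each setup's total tokens/s by its GPU count."""
    return {label: tokens / gpus for label, gpus, tokens in runs}


per_gpu = tokens_per_gpu(RUNS)
for label, value in per_gpu.items():
    print(f"{label}: {value:.1f} tokens/s per GPU")
```

Under that assumption, the GPTQ model delivers roughly 375 tokens/s per GPU on two L40S, compared with roughly 284 tokens/s per GPU for the unquantized 70B model on four.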