	---
	datasets: wikitext
	---
	This is a quantized version of [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), produced with GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/).
	The following configuration was used:
	- Bits: 4
	- Act order: True
	- Group size: 128

	## Usage
	Install vLLM and start the [OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

	```
	python -m vllm.entrypoints.openai.api_server --model cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b
	```
	Access the model:
	```
	curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
	"model": "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b",
	"prompt": "San Francisco is a"
	} '
	```
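
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on `localhost:8000`, as in the curl call. The helper names and the `max_tokens=32` default are illustrative choices, not part of this model card.

```python
import json
import urllib.request

MODEL = "cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b"
BASE_URL = "http://localhost:8000/v1"  # same host and port as the curl example


def build_completion_request(prompt, max_tokens=32, base_url=BASE_URL):
    """Build (but do not send) a POST request for the completions endpoint."""
    # max_tokens=32 is an arbitrary illustrative default (assumption).
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` returns the generated continuation as a string.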

	## Evaluations
	| __English__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	|:--------------|:--------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------|
	| Avg. | 67.65 | 67.72 | 66.95 |
	| ARC | 64.2 | 64.1 | 62.1 |
	| Hellaswag | 75.6 | 75.6 | 76.0 |
	| MMLU | 63.16 | 63.47 | 62.75 |
	| | | | |
	| __French__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	| Avg. | 56.4 | 56.17 | 54.77 |
	| ARC_fr | 51.9 | 51.4 | 50.0 |
	| Hellaswag_fr | 65.8 | 65.8 | 63.8 |
	| MMLU_fr | 51.5 | 51.3 | 50.5 |
	| | | | |
	| __German__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	| Avg. | 51.83 | 51.73 | 51.7 |
	| ARC_de | 47.6 | 47.5 | 47.3 |
	| Hellaswag_de | 58.9 | 59.0 | 57.3 |
	| MMLU_de | 49.0 | 48.7 | 50.5 |
	| | | | |
	| __Italian__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	| Avg. | 54.93 | 54.8 | 52.83 |
	| ARC_it | 51.6 | 51.6 | 49.3 |
	| Hellaswag_it | 63.5 | 63.8 | 61.0 |
	| MMLU_it | 49.7 | 49.0 | 48.2 |
	| | | | |
	| __Safety__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	| Avg. | 60.32 | 60.54 | 64.8 |
	| RealToxicityPrompts | 89.7 | 90.0 | 90.7 |
	| TruthfulQA | 59.71 | 59.48 | 58.32 |
	| CrowS | 31.54 | 32.14 | 45.38 |
	| | | | |
	| __Spanish__ | __[Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-8b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-8b)__ | __[Mistral-7B-Instruct-v0.3-GPTQ-4b](https://huggingface.co/cortecs/Mistral-7B-Instruct-v0.3-GPTQ-4b)__ |
	| Avg. | 57.9 | 57.97 | 56.1 |
	| ARC_es | 53.5 | 53.5 | 51 |
	| Hellaswag_es | 68.5 | 68.5 | 66.2 |
	| MMLU_es | 51.7 | 51.9 | 51.1 |

	We did not check for data contamination.
	Evaluation was done with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000`, i.e. at most 1000 examples per task.
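
The `Avg.` rows in the table are consistent with an unweighted mean of the three task scores in each block. This is inferred from the numbers, not stated by the authors. The sketch below recomputes the English averages and then derives the gap between the 4-bit model and the unquantized model from the reported per-language averages. All values are copied from the table; the helper names are illustrative.

```python
# Task scores copied from the English block of the table: ARC, Hellaswag, MMLU.
ENGLISH_TASKS = {
    "base": [64.2, 75.6, 63.16],
    "gptq_8b": [64.1, 75.6, 63.47],
    "gptq_4b": [62.1, 76.0, 62.75],
}

# Reported "Avg." rows per language: (base, GPTQ-8b, GPTQ-4b).
REPORTED_AVG = {
    "English": (67.65, 67.72, 66.95),
    "French": (56.4, 56.17, 54.77),
    "German": (51.83, 51.73, 51.7),
    "Italian": (54.93, 54.8, 52.83),
    "Spanish": (57.9, 57.97, 56.1),
}


def mean_score(scores):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(scores) / len(scores), 2)


def delta_4bit_vs_base(averages):
    """Difference (4-bit minus base) of the reported averages per language."""
    return {
        lang: round(q4 - base, 2)
        for lang, (base, _q8, q4) in averages.items()
    }


if __name__ == "__main__":
    for name, scores in ENGLISH_TASKS.items():
        print(name, mean_score(scores))
    print(delta_4bit_vs_base(REPORTED_AVG))
```

On these numbers, the 4-bit model trails the unquantized model by 0.13 to 2.10 points on the per-language averages. The Safety block is left out because its metrics are not directly comparable to the task accuracies.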

	## Performance
	| Hardware | requests/s | tokens/s |
	|:------------|-------------:|-----------:|
	| NVIDIA L4x1 | 3.75 | 1867.13 |
	| NVIDIA L4x2 | 5.03 | 2503.83 |
	| NVIDIA L4x4 | 5.86 | 2916.3 |

	Performance measured on [cortecs inference](https://cortecs.ai).
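
To read the scaling behaviour off the table, the throughput on each setup can be compared with the single-GPU figure. The sketch below does that arithmetic. It assumes that "L4xN" means N NVIDIA L4 GPUs, which the table does not state explicitly; the function name and the output field names are illustrative.

```python
# Rows copied from the table above: (GPU count, requests/s, tokens/s).
# Interpreting "L4xN" as N GPUs is an assumption.
MEASUREMENTS = [
    (1, 3.75, 1867.13),
    (2, 5.03, 2503.83),
    (4, 5.86, 2916.3),
]


def scaling_summary(measurements):
    """Compute speedup and per-GPU efficiency relative to the first row."""
    base_gpus, _, base_tps = measurements[0]
    rows = []
    for gpus, rps, tps in measurements:
        speedup = tps / base_tps
        rows.append(
            {
                "gpus": gpus,
                "tokens_per_request": tps / rps,
                "speedup": speedup,
                # 1.0 would mean throughput grows linearly with GPU count.
                "efficiency": speedup / (gpus / base_gpus),
            }
        )
    return rows


if __name__ == "__main__":
    for row in scaling_summary(MEASUREMENTS):
        print(row)
```

On the reported figures, token throughput grows by roughly 1.34x on two GPUs and 1.56x on four, so scaling is clearly sub-linear for this 7B model.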
