metadata

inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus

WizardLM 70B V1.0 – EXL2

Model creator: WizardLM
Original model: WizardLM 70B V1.0
Model used for quantization: WizardLM 70B V1.0-HF – float16 of WizardLM 70B V1.0

Models available in this repository

Link	BITS (-b)	HEAD BITS (-hb)	MEASUREMENT LENGTH (-ml)	LENGTH (-l)	CAL DATASET (-c)	Size	ExLlama	Max Context Length
here	4.0	6	2048	2048	0000.parquet*	35GB	v2	4096
here	5.0	6	2048	2048	0000.parquet*	44GB	v2	4096
coming soon...	6.0	6	2048	2048	0000.parquet*	...GB	v2	4096

* 0000.parquet is taken from the wikitext-2-raw-v1 dataset.

Description:

This repository contains EXL2 model files for WizardLM's WizardLM 70B V1.0.

EXL2 is a new format used by ExLlamaV2 – https://github.com/turboderp/exllamav2. EXL2 is based on the same optimization method as GPTQ. The format allows for mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
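
As a rough sanity check on the sizes listed in the table above, the average bitrate translates into file size by simple arithmetic. The sketch below assumes a nominal parameter count of 70 billion (the real count differs slightly) and ignores the separate head bitrate and any file overhead; `estimate_size_gb` and `NOMINAL_PARAMS` are illustrative names, not part of ExLlamaV2.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at an average bitrate."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal "70B" parameter count, not the exact figure.
NOMINAL_PARAMS = 70e9

for bpw in (4.0, 5.0):
    size = estimate_size_gb(NOMINAL_PARAMS, bpw)
    print(f"{bpw} bpw -> about {size:.2f} GB")
```

Under these assumptions, 4.0 bpw works out to 35 GB and 5.0 bpw to 43.75 GB, in line with the 35GB and 44GB listed above.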

Prompt template (official):

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:

Prompt template (suggested):

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:
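
The two templates above differ only in where the line breaks go, which a small helper can make explicit. This is a minimal sketch: `SYSTEM_PROMPT` and `build_prompt` are hypothetical names, and the absence of a trailing newline after `ASSISTANT:` is an assumption, since the card does not specify one.

```python
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(prompt: str, suggested: bool = True) -> str:
    """Fill the {prompt} slot of the suggested or the official template."""
    if suggested:
        # Suggested layout: system text, USER:, the prompt, and ASSISTANT: on separate lines.
        return f"{SYSTEM_PROMPT}\nUSER:\n{prompt}\nASSISTANT:"
    # Official layout: everything on a single line.
    return f"{SYSTEM_PROMPT} USER: {prompt} ASSISTANT:"


print(build_prompt("What is EXL2?"))
```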

Quantization process:

Original Model	→	(optional but recommended) Float16 Model*	→	Safetensor Model**	→	EXL2 Model
WizardLM 70B V1.0	→	WizardLM 70B V1.0-HF*	→	Safetensor**	→	EXL2

Example to convert WizardLM-70B-V1.0-HF to EXL2 4.0 bpw with 6-bit head:

mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
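
To produce several bitrates from the same float16 source, the command above can be generated in a loop. The sketch below only prints the commands rather than running them; it uses the flags shown above (`-i`, `-o`, `-c`, `-b`, `-hb`) and nothing else. `build_convert_cmd` is a hypothetical helper, and the `_5bit` output directory name is an assumption that extends the `_4bit` naming from the example.

```python
import shlex
from pathlib import Path


def build_convert_cmd(model_dir, out_dir, cal_file, bits, head_bits=6):
    """Build the convert.py argument list using only the flags shown in this card."""
    return [
        "python", "convert.py",
        "-i", str(model_dir),
        "-o", str(out_dir),
        "-c", str(cal_file),
        "-b", str(bits),
        "-hb", str(head_bits),
    ]


home = Path.home()
model_dir = home / "float16_safetensored" / "WizardLM-70B-V1.0-HF"
cal_file = home / "EXL2" / "0000.parquet"

for bits in (4.0, 5.0):
    # Assumption: output directories follow the "_4bit" naming used in the example.
    out_dir = home / "EXL2" / f"WizardLM-70B-V1.0-HF_{int(bits)}bit"
    print(f"mkdir -p {shlex.quote(str(out_dir))}")
    print(shlex.join(build_convert_cmd(model_dir, out_dir, cal_file, bits)))
```

Printing the commands first makes it easy to review the paths before committing to a long conversion run.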

* Use the following script to convert your local pytorch_model .bin files to float16 (or bfloat16) safetensors in a single step:

https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py (best for sharding and float16/FP16 or bfloat16/BF16 conversion)

Example to convert WizardLM 70B V1.0 directly to float16 safetensors in 10GB shards:

python convert-to-safetensors.py ~/original/WizardLM-70B-V1.0 --output ~/float16_safetensored/WizardLM-70B-V1.0 --max-shard-size 10GB

Use --bf16 if you'd like to try bfloat16 instead, but note that there are concerns about quantization quality – https://github.com/turboderp/exllamav2/issues/30#issuecomment-1719009289

** Use any one of the following scripts to convert your local pytorch_model bin files to safetensors:
