Create README.md
README.md
ADDED
@@ -0,0 +1,84 @@
---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---

# WizardLM 70B V1.0 - EXL2
- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Model used for quantization: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF), a float16 conversion of [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)

## Models available in this repository

| Link | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | ExLlama | Max Context Length |
| ------ | --------- | --------------- | ------------------------ | ----------- | ---------------- | ---- | ------- | ------------------ |
| [here](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/) | 4.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | 35GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |
| [here](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-5.0bpw-h6-exl2/) | 5.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |
| _coming soon..._ | 6.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |

\* wikitext-2-raw-v1

## Description:

_This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)._

EXL2 is a new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). EXL2 is based on the same optimization method as GPTQ. The format allows for mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
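
As a rough illustration of what an average bitrate means for file size, the sketch below multiplies a parameter count by the bits per weight. The nominal 70e9 parameter count is an assumption, and the estimate ignores the higher-precision head layer and any file overhead, so treat the result as a ballpark figure only.

```python
def estimate_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes.

    Ignores the head layer (quantized at a different bitrate) and any
    metadata stored alongside the weights.
    """
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Assumption: a nominal "70B" parameter count; the exact count differs slightly.
NUM_PARAMS = 70e9

size_gb = estimate_weight_size_gb(NUM_PARAMS, 4.0)
print(f"4.0 bpw -> roughly {size_gb:.0f} GB")
```

For 4.0 bpw this lands close to the 35GB listed in the table above; the same function can be applied to other bitrates.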

## Prompt template (official):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
```

## Prompt template (suggested):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:


```
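
The two templates can be filled in programmatically. The helper names below are hypothetical, and the single trailing newline after `ASSISTANT:` in the suggested variant is an assumption about how the trailing blank lines in the template are meant to be used; adjust it to match your frontend.

```python
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_official_prompt(user_message: str) -> str:
    """Single-line layout, as in the official template."""
    return f"{SYSTEM_PROMPT} USER: {user_message} ASSISTANT:"


def build_suggested_prompt(user_message: str) -> str:
    """Multi-line layout, as in the suggested template.

    Assumption: one trailing newline after "ASSISTANT:".
    """
    return f"{SYSTEM_PROMPT}\nUSER:\n{user_message}\nASSISTANT:\n"


print(build_suggested_prompt("What is EXL2?"))
```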

## Quantization process:

| Original Model | → | (optional but recommended) Float16 Model\* | → | Safetensor Model\*\* | → | EXL2 Model |
| -------------- | --- | ------------- | --- | ---------------- | --- | ---------- |
| [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | → | [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF)\* | → | Safetensor\*\* | → | EXL2 |

Example to convert WizardLM-70B-V1.0-HF to EXL2 4.0 bpw with 6-bit head:

```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
```
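
To produce several bitrates in one pass, the same command can be generated per target. The following is a minimal sketch that only builds and prints the command lines; it uses the flags shown in the example above (`-i`, `-o`, `-c`, `-b`, `-hb`), while the helper name and the `_{bits}bpw` output directory naming are hypothetical.

```python
import shlex
from pathlib import Path


def build_convert_command(input_dir, output_dir, calibration_file, bits, head_bits=6):
    """Return the convert.py invocation as an argument list."""
    return [
        "python", "convert.py",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "-c", str(calibration_file),
        "-b", str(bits),
        "-hb", str(head_bits),
    ]


home = Path.home()
model_dir = home / "float16_safetensored" / "WizardLM-70B-V1.0-HF"
calibration = home / "EXL2" / "0000.parquet"

for bits in (4.0, 5.0, 6.0):
    # Hypothetical naming scheme for the per-bitrate output directories.
    out_dir = home / "EXL2" / f"WizardLM-70B-V1.0-HF_{bits}bpw"
    cmd = build_convert_command(model_dir, out_dir, calibration, bits)
    print(shlex.join(cmd))
```

Each printed line can be run after creating the corresponding output directory, or the argument list can be passed to `subprocess.run`.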

\* Use the following script to convert your local pytorch_model bin files to float16 (or bfloat16) safetensors in one step:

- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py (best for sharding and float16/FP16 or bfloat16/BF16 conversion)

Example to convert [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) directly to float16 safetensors in 10GB shards:

```
python convert-to-safetensors.py ~/original/WizardLM-70B-V1.0 --output ~/float16_safetensored/WizardLM-70B-V1.0 --max-shard-size 10GB
```

Use `--bf16` if you'd like to try bfloat16 instead, but note that there are concerns about quantization quality: https://github.com/turboderp/exllamav2/issues/30#issuecomment-1719009289

\*\* Use any one of the following scripts to convert your local pytorch_model bin files to safetensors:

- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py (official ExLlamaV2)
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py (recommended)
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e#file-2safetensors-py

## Further reading:

- https://mlabonne.github.io/blog/posts/Introduction_to_Weight_Quantization.html