---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---

# WizardLM 70B V1.0 – EXL2

- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Model used for quantization: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) – a float16 conversion of [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)

## Models available in this repository

| Link | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | ExLlama | Max Context Length |
| ------ | --------- | --------------- | ------------------------ | ----------- | ---------------- | ---- | ------- | ------------------ |
| [here](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/) | 4.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | 35GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |
| [here](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-5.0bpw-h6-exl2/) | 5.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |
| _coming soon..._ | 6.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train)* | ...GB | [v2](https://github.com/turboderp/exllamav2) | 4096 |

\* wikitext-2-raw-v1
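
The Size column follows directly from the bitrate: parameter count × bits per weight / 8 bits per byte. A quick sanity check in Python (the 70e9 parameter count is an approximation used only for this estimate):

```python
# Rough EXL2 file-size estimate: parameters x bits per weight / 8 bits per byte.
# The 70e9 parameter count is an approximation, not an exact figure for this model.
def exl2_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

print(exl2_size_gb(70e9, 4.0))  # 35.0 – in line with the 4.0 bpw row above
```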

## Description:

_This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)._

EXL2 is a new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ, and it allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
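The mixed-level idea can be illustrated with a toy calculation; all layer names and figures below are hypothetical, chosen purely for illustration:

```python
# Toy illustration of EXL2's mixed quantization: each layer group gets its own
# bit width, and the parameter-weighted average is the model's overall bpw.
# All names and numbers here are hypothetical, not taken from this model.
layer_bits   = {"attn": 4.5, "mlp": 3.8, "head": 6.0}       # bits per weight
layer_params = {"attn": 200e6, "mlp": 500e6, "head": 50e6}  # parameter counts

avg_bpw = sum(layer_bits[k] * layer_params[k] for k in layer_bits) / sum(layer_params.values())
print(round(avg_bpw, 2))  # 4.13 – between 2 and 8 bpw, as the format allows
```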

## Prompt template (official):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
```

## Prompt template (suggested):

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:


```
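
Applying the official template is a plain string substitution; a minimal sketch (the `build_prompt` helper name is my own, not part of any library):

```python
# Fill the official single-line template; TEMPLATE is copied verbatim from above.
TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {prompt} ASSISTANT:"
)

def build_prompt(user_message: str) -> str:
    return TEMPLATE.format(prompt=user_message)

print(build_prompt("What is the EXL2 format?"))
```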

## Quantization process:

| Original Model | → | Float16 Model* (optional but recommended) | → | Safetensor Model** | → | EXL2 Model |
| -------------- | --- | ------------- | --- | ---------------- | --- | ---------- |
| [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | → | [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF)* | → | Safetensor** | → | EXL2 |

Example to convert WizardLM-70B-V1.0-HF to EXL2 4.0 bpw with 6-bit head:

```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6
```
62
+
63
+ \* Use the following script to convert your local pytorch_model bin files to float16 (you can also choose bfloat16) + safetensors all in one go:
64
+
65
+ - https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py
66
+ (best for sharding and float16/FP16 or bfloat16/BF16 conversion)
67
+

Example to convert [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) directly to float16 safetensors in 10GB shards:

```
python convert-to-safetensors.py ~/original/WizardLM-70B-V1.0 --output ~/float16_safetensored/WizardLM-70B-V1.0 --max-shard-size 10GB
```
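
As a sanity check on the shard count: float16 stores 2 bytes per weight, so a roughly 70B-parameter model is about 140 GB and splits into about 14 shards of 10 GB (the 70e9 count is an approximation):

```python
import math

# Estimate the number of shards produced by --max-shard-size 10GB.
# Assumes 2 bytes per parameter (float16); 70e9 parameters is an approximation.
def shard_count(n_params: float, bytes_per_param: int, max_shard_gb: float) -> int:
    total_gb = n_params * bytes_per_param / 1e9
    return math.ceil(total_gb / max_shard_gb)

print(shard_count(70e9, 2, 10))  # 14 shards for ~140 GB of float16 weights
```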

Use `--bf16` if you'd like to try bfloat16 instead, but note that there are concerns about quantization quality – https://github.com/turboderp/exllamav2/issues/30#issuecomment-1719009289

\*\* Use any one of the following scripts to convert your local pytorch_model bin files to safetensors:

- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py (official ExLlamaV2)
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py (recommended)
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e#file-2safetensors-py

## Further reading:

- https://mlabonne.github.io/blog/posts/Introduction_to_Weight_Quantization.html