---
language:
- en
author: froggeric
---

# Input files for generating the Importance Matrix

## Which file to use for generating the importance matrix

Not all importance matrices are equal. The best results are obtained when using a source file similar to the
training data. Size also matters: the bigger the model (eg: 70b vs 13b) and the higher the quant (eg: q6_k vs iq3_xs),
the bigger the source file needs to be to make an impact. Multiple input files can be combined if needed;
for example:
```
cat technical.txt multilingual.txt wiki.txt > custom.matrix
```
Note on **context size** when generating the matrix: in general, a small context size such as 512 is recommended, and community
tests have shown it usually performs better than a larger one such as 4096. However, I would argue this is highly dependent on the
source data you are using: with random tokens or short text a small context makes sense; but when using larger texts, a larger
context matching the size of the texts might be a better choice. Remember that the size is in tokens, which roughly translates
to the number of words, not characters.
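
If you want a rough idea of how much data a combined file contains before running `imatrix`, a word count is a reasonable proxy, since for plain English text the token count is of the same order as the word count. A minimal check, reusing the `custom.matrix` file from the example above:
```
# approximate token count: for plain text, tokens roughly track the word count
wc -w custom.matrix
```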

You will find below descriptions of the various input files provided, to help you choose the correct one.

## Community provided files

**groups_merged**\
_"Here is a decent general purpose imatrix calibration dataset. It should be more diverse than wikitext at ~30k tokens, as it is excerpts of a larger dataset which includes coding examples (which seems quite important!)
This means it's generally higher entropy data compared to wikitext, and it's real data rather than pseudo-randomly generated data.
I get lower KL div than wikitext for the same length and the outputs seem qualitatively better."_ (kalomaze)\
https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

**group_10_merged**\
(superseded by groups_merged)\
_"This is about ~50k pseudo-random tokens.
I am getting the best balance between the maximum divergence and the other divergence statistics using this file when quantizing 7b"_ (kalomaze)\
https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8349233

**20k_random_data**\
(superseded by group_10_merged)\
https://github.com/ggerganov/llama.cpp/discussions/5006#discussioncomment-8163190

**8k_random_data**\
(superseded by 20k_random_data)\
https://github.com/ggerganov/llama.cpp/discussions/5006#discussion-6087829

**badwords**\
402 English words that can be considered dirty, naughty, obscene, or otherwise bad words.
This could be useful to remove guard rails.
Compiled from the [Shutterstock github repo](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master)

**badwords_multilingual**\
2580 words that can be considered dirty, naughty, obscene, or otherwise bad words. Includes 26 languages.
This could be useful to remove guard rails.
Compiled from the [Shutterstock github repo](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master)

**ptb.train**\
Penn Treebank (PTB) is a widely used, preprocessed large dataset designed for language model training. Casing,
punctuation and numbers have been removed from the training data. It has recently been largely superseded
by WikiText, which does not have these removals, features a larger vocabulary and full articles (better
suited for models that can take advantage of long term dependencies). However, for importance matrix training,
PTB is still a valid dataset; it has the advantage of being manually curated, and of being similar to WikiText
without being WikiText, which can help against bias.


**WikiText**\
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of
verified Good and Featured articles on Wikipedia. Compared to PTB, WikiText-2 is over 2 times larger and
WikiText-103 is over 110 times larger. As it is composed of full articles, the dataset is well suited for models
that can take advantage of long term dependencies.\
https://huggingface.co/datasets/wikitext

**WikiText_FR**\
70 million tokens extracted from the set of French Wikipedia articles that are classified as "quality articles"
or "good articles".\
https://huggingface.co/datasets/asi/wikitext_fr

**c4**\
The C4 dataset is a collection of text sourced from the public Common Crawl web scrape.
It includes heuristics to extract only natural language (as opposed to boilerplate and other gibberish),
in addition to extensive deduplication. The C4 dataset was explicitly designed to be English only:
any page that was not given a probability of at least 99% of being English by langdetect was discarded.

**code** (exllamav2)\
Programming code.

**multilingual** (exllamav2)\
English, Arabic, Chinese, French, German, Japanese, Polish, Russian, Spanish, Swedish, Turkish, Hebrew,
Macedonian, Norwegian, Lithuanian, Greek, Italian, Afrikaans, Dutch, Danish.

**technical** (exllamav2)\
Technical writing.

**tiny**\
Very short stories. Be mindful of the prevalence of _"Once upon a time"_ and _"<|endoftext|>"_.
Extracted from the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories)

**wiki** (exllamav2)\
Small Wikipedia dump. Unclean, contains many unwanted tags.

The exllamav2 calibration data is taken from:\
https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data
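
If you would rather pull the exllamav2 calibration files straight from their source repository instead of using the copies provided here, a shallow clone is enough. A minimal sketch, using the repository and directory linked above:
```
# fetch only the latest revision of the exllamav2 repository
git clone --depth 1 https://github.com/turboderp/exllamav2
ls exllamav2/conversion/standard_cal_data
```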

## How to quantize using an imatrix with llama.cpp

1. Get one of the input files collected here, or elsewhere.
2. Convert or download the model you want to quantise, in fp16 GGUF format (a conversion sketch is given after these steps).
3. Generate an imatrix file specific to the model you want to quantise:
```
cd <llama.cpp directory>
./imatrix -m <model_path>/ggml-model-f16.gguf -f <plain_text_matrix_file> -o <output.matrix> -t 12 -ngl 144 --chunks 100 -b 512 -c 512

# -ngl : number of layers offloaded to the GPU (recommended: the number of layers the model contains)
# -t 12 : number of threads (should probably match the number of CPU cores)
# -c 512 : context size; testing seems to show 512 is recommended (default=512, 0=loaded from model)
# -b 512 : batch size (default=512)
# --chunks 100 : maximum number of chunks to process (recommended)
# --mlock : keep the model in RAM (only use it if you have sufficient RAM for the whole fp16 model)
```
4. Use the generated matrix file to quantise the model:
```
./quantize --imatrix <output.matrix> <model_path>/ggml-model-f16.gguf <quantisation_level, eg:IQ4_XS>
```
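
Step 2 above assumes an fp16 GGUF file is already available. If you are starting from a Hugging Face checkpoint instead, the conversion script shipped with llama.cpp can produce one. A minimal sketch; the script name and flags vary between llama.cpp versions (newer checkouts use convert-hf-to-gguf.py), so check `--help` for your checkout:
```
cd <llama.cpp directory>
# convert a Hugging Face model directory to an fp16 GGUF file
python3 convert.py <model_path> --outtype f16 --outfile <model_path>/ggml-model-f16.gguf
```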

Note: normal quantisation also benefits from using a matrix file. It also seems that a bigger input file is
better for higher quantisation levels.
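
Since the same imatrix file can be reused for every quantisation level of a given model, it is convenient to produce several quants in one go. A minimal sketch, assuming the same placeholder paths as in step 4; the output file name `ggml-model-$q.gguf` is only an illustrative choice:
```
# reuse one imatrix to produce several quantisation levels
for q in IQ3_XS IQ4_XS Q5_K_M; do
  ./quantize --imatrix <output.matrix> <model_path>/ggml-model-f16.gguf <model_path>/ggml-model-$q.gguf $q
done
```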