---
language:
- en
author: froggeric
---

# Input files for generating the Importance Matrix

## Which file to use for generating the importance matrix

Not all importance matrices are equal. The best results are obtained when using a source file similar to the
training data. Size also matters: the bigger the model (e.g. 70b vs 13b) and the higher the quant (e.g. q6_k vs iq3_xs),
the bigger the source file needs to be to make an impact. Multiple input files can be combined if needed;
for example:
```
cat technical.txt multilingual.txt wiki.txt >custom.matrix
```
Note on **context size** when generating the matrix: in general, a small context size such as 512 is recommended, and community
tests have shown it usually performs better than a larger one such as 4096. However, I would argue this is highly dependent on the
source data you are using: with random tokens or short texts a small context makes sense, but when using larger texts, a larger
context matching the size of the texts might be a better choice. Remember that the size is in tokens, which roughly translates
to the number of words, not characters.

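As a quick sanity check on how much data a source file contains, a word count gives a rough approximation of its
token count (the exact count depends on the model's tokenizer); for example, for the combined file created above:
```
# word count roughly approximates token count
wc -w custom.matrix
```
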
Below you will find descriptions of the various input files provided, to help you choose the correct one.

## Community provided files

**groups_merged**\
_"Here is a decent general purpose imatrix calibration dataset. It should be more diverse than wikitext at ~30k tokens, as it is excerpts of a larger dataset which includes coding examples (which seems quite important!)
This means it's generally higher entropy data compared to wikitext, and it's real data rather than pseudo-randomly generated data.
I get lower KL div than wikitext for the same length and the outputs seem qualitatively better."_ (kalomaze)\
https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384

**group_10_merged**\
(superseded by groups_merged)\
_"This is about ~50k pseudo-random tokens.
I am getting the best balance between the maximum divergence and the other divergence statistics using this file when quantizing 7b"_ (kalomaze)\
https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8349233

**20k_random_data**\
(superseded by group_10_merged)\
https://github.com/ggerganov/llama.cpp/discussions/5006#discussioncomment-8163190

**8k_random_data**\
(superseded by 20k_random_data)\
https://github.com/ggerganov/llama.cpp/discussions/5006#discussion-6087829

**badwords**\
402 English words that can be considered dirty, naughty, obscene, or otherwise bad words.
This could be useful to remove guard rails.
Compiled from the [Shutterstock GitHub repository](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master)

**badwords_multilingual**\
2580 words that can be considered dirty, naughty, obscene, or otherwise bad words. Includes 26 languages.
This could be useful to remove guard rails.
Compiled from the [Shutterstock GitHub repository](https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words/tree/master)

**ptb.train**\
Penn Treebank (PTB) is a widely used, preprocessed large dataset designed for language model training. Casing,
punctuation and numbers have been removed from the training data. It has more recently been largely superseded
by WikiText, which does not apply these removals, features a larger vocabulary and consists of full articles
(better suited for models that can take advantage of long-term dependencies). However, for importance matrix generation,
PTB is still a valid dataset; it has the advantage of being manually curated, and it is similar to WikiText
without being WikiText, which can help against bias.

**WikiText**\
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of
verified Good and Featured articles on Wikipedia. Compared to PTB, WikiText-2 is over 2 times larger and
WikiText-103 is over 110 times larger. As it is composed of full articles, the dataset is well suited for models
that can take advantage of long-term dependencies.\
https://huggingface.co/datasets/wikitext

**WikiText_FR**\
70 million tokens extracted from the set of French Wikipedia articles that are classified as "quality articles"
or "good articles".\
https://huggingface.co/datasets/asi/wikitext_fr

**c4**\
The C4 dataset is a collection of text sourced from the public Common Crawl web scrape.
It includes heuristics to extract only natural language (as opposed to boilerplate and other gibberish),
in addition to extensive deduplication. The C4 dataset was explicitly designed to be English-only:
any page that was not given a probability of at least 99% of being English by langdetect was discarded.

**code** (exllamav2)\
Programming code.

**multilingual** (exllamav2)\
English, Arabic, Chinese, French, German, Japanese, Polish, Russian, Spanish, Swedish, Turkish, Hebrew,
Macedonian, Norwegian, Lithuanian, Greek, Italian, Afrikaans, Dutch, Danish.

**technical** (exllamav2)\
Technical writing.

**tiny**\
Very short stories. Be mindful of the prevalence of _"Once upon a time"_ and _"<|endoftext|>"_.
Extracted from the [TinyStories dataset](https://huggingface.co/datasets/roneneldan/TinyStories)

**wiki** (exllamav2)\
Small Wikipedia dump. Unclean, contains many unwanted tags.

The exllamav2 calibration data was taken from:\
https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data

## How to quantize using an imatrix, with llama.cpp

1. Get one of the input files collected here, or elsewhere.
2. Convert or download the model you want to quantise, in fp16 GGUF format.
3. Generate an imatrix file specific to the model you want to quantise:
```
cd <llama.cpp directory>
./imatrix -m <model_path>/ggml-model-f16.gguf -f <plain_text_matrix_file> -o <output.matrix> -t 12 -ngl 144 --chunks 100 -b 512 -c 512

# -ngl : number of layers offloaded to the GPU (recommended: use the number of layers the model contains)
# -t 12 : number of threads (should match the number of physical CPU cores)
# -c 512 : context size; testing seems to show 512 is recommended (default=512, 0=loaded from the model)
# -b 512 : batch size (default=512)
# --chunks 100 : maximum number of chunks to process (recommended)
# --mlock : keep the model in RAM (only use if you have sufficient RAM for the whole fp16 model)
```
4. Use the generated matrix file to quantise the model (a complete end-to-end sketch is given below):
```
./quantize --imatrix <output.matrix> <model_path>/ggml-model-f16.gguf <quantisation_level, e.g. IQ4_XS>
```
Note: normal quantisation also benefits from using a matrix file. It also seems that a bigger input matrix is
better for higher quantisation levels.
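
Putting the steps together, here is a minimal end-to-end sketch. The model name, paths and calibration file are
hypothetical, the exact binary and conversion script names depend on your llama.cpp version (e.g. convert.py vs
convert_hf_to_gguf.py, quantize vs llama-quantize), and -t / -ngl should be adjusted to your hardware:
```
# 1. convert the original model to fp16 GGUF (conversion script name depends on the llama.cpp version)
python convert.py ./models/my-model-7b --outtype f16

# 2. generate the importance matrix from a calibration file (here: groups_merged.txt)
./imatrix -m ./models/my-model-7b/ggml-model-f16.gguf -f groups_merged.txt \
  -o my-model-7b.matrix -t 8 -ngl 33 --chunks 100 -b 512 -c 512

# 3. quantise using the generated matrix
./quantize --imatrix my-model-7b.matrix ./models/my-model-7b/ggml-model-f16.gguf IQ4_XS
```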