nisten committed on
Commit
13c5efc
1 Parent(s): 2f24f9e

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -1,3 +1,29 @@
---
license: apache-2.0
---

# Dolphin-2.8-Mistral-7B-v2 iMatrix Quantizations

This repository contains iMatrix quantizations of the [dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02) model. The original model was trained with 16k-long-context data on top of a newer Mistral-7B base, enabling it to work well with up to 32k context.
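
For anyone who just wants to load one of these files, here is a minimal sketch using the `llama-cpp-python` bindings (an assumed tool, not something this repo ships); the GGUF filename below is a placeholder, so substitute whichever quant you actually download.

```python
# Minimal sketch: load an iMatrix GGUF quant at extended context with
# llama-cpp-python. The filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.8-mistral-7b-v02-iq2_s.gguf",  # placeholder filename
    n_ctx=32768,      # the model works with up to 32k context
    n_gpu_layers=-1,  # offload all layers if your build has GPU support
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is an importance matrix?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```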

The iMatrix file was generated using the `wiki.train.raw` dataset, which took a few hours to process. We have also included the `wiki.test.raw` file for perplexity testing.
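
For reference, a rough sketch of how an importance matrix like this can be produced with llama.cpp's `imatrix` tool is shown below; the binary name, flags, and file paths are assumptions based on llama.cpp builds from around this time and may differ in newer versions.

```python
# Rough sketch: regenerate an importance matrix with llama.cpp's `imatrix`
# tool. Paths and the output name are placeholders; flags may vary by version.
import subprocess

subprocess.run(
    [
        "./imatrix",
        "-m", "dolphin-2.8-mistral-7b-v02-f16.gguf",  # full-precision model (placeholder path)
        "-f", "wiki.train.raw",                       # calibration text used for this repo
        "-o", "dolphin-imatrix.dat",                  # resulting importance matrix (placeholder name)
    ],
    check=True,
)
```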

## Quantization Benefits

You'll notice that these quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2-bit mixed `2s` quants are very usable thanks to this custom quantization and lose little perplexity compared to the full f16 model.
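
To check the perplexity claim yourself, a sketch along the following lines with llama.cpp's `perplexity` tool and the bundled `wiki.test.raw` should work; the model filenames are placeholders and flags may vary by llama.cpp version.

```python
# Sketch: compare perplexity of a quant against the f16 reference using
# llama.cpp's `perplexity` tool. The final PPL value is printed to stdout.
import subprocess

for model in (
    "dolphin-2.8-mistral-7b-v02-f16.gguf",      # reference model (placeholder)
    "dolphin-2.8-mistral-7b-v02-iq2_xxs.gguf",  # one of the quants (placeholder)
):
    subprocess.run(["./perplexity", "-m", model, "-f", "wiki.test.raw"], check=True)
```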

## TODO: Benchmarks

The 1-bit quant gives garbage, but everything else, including the 2xxs quants, is surprisingly coherent.

## Notes

- The 8-bit quant is **not** iMatrix quantized (although it wouldn't make a significant difference). It can be used as a reference perplexity measurement along with `dolphinf16`.
- All other models, including the 4k variants, have been quantized with the iMatrix and should exhibit better perplexity than regular K-quants.
- iMatrix quantization can be applied to all K-quants, not just the IQ ones (see the sketch after this list).
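
As a concrete illustration of that last point, here is a sketch of producing a regular K-quant (Q4_K_M) while supplying an importance matrix to llama.cpp's `quantize` tool; all filenames are placeholders and the flag layout reflects llama.cpp builds from around the time of this upload.

```python
# Sketch: apply an importance matrix while producing a regular K-quant
# (Q4_K_M) with llama.cpp's `quantize` tool. Filenames are placeholders.
import subprocess

subprocess.run(
    [
        "./quantize",
        "--imatrix", "dolphin-imatrix.dat",        # importance matrix (see the earlier sketch)
        "dolphin-2.8-mistral-7b-v02-f16.gguf",     # full-precision input (placeholder)
        "dolphin-2.8-mistral-7b-v02-q4_k_m.gguf",  # quantized output (placeholder)
        "Q4_K_M",
    ],
    check=True,
)
```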

## TODO

- Upload perplexity benchmarks of each quantization vs. f16.