---
license: apache-2.0
---

# Dolphin-2.8-Mistral-7B-v2 iMatrix Quantizations

This repository contains iMatrix quantizations of the [dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02) model. The original model was trained with 16k-context data on top of a newer Mistral-7B base, so it works well with contexts of up to 32k tokens.
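
As a quick usage sketch (not an official example), the GGUF files here can be loaded with `llama-cpp-python` and a 32k context window; the file name, sampling settings, and ChatML prompt below are illustrative assumptions.

```python
# Hypothetical usage sketch: load one of the quantized GGUF files with
# llama-cpp-python at a 32k context window. The model_path is a placeholder;
# substitute whichever quant you downloaded from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.8-mistral-7b-v02-IQ2_S.gguf",  # placeholder file name
    n_ctx=32768,       # the model was trained to handle contexts up to 32k
    n_gpu_layers=-1,   # offload all layers if a GPU build is available
)

# Dolphin models use the ChatML prompt format.
prompt = (
    "<|im_start|>user\n"
    "Explain what an importance matrix (iMatrix) is in one paragraph.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```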

The iMatrix file was generated using the `wiki.train.raw` dataset, which took a few hours to process. We have also included the `wiki.test.raw` file for perplexity testing.
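
For reference, this is roughly that workflow, sketched with llama.cpp's `imatrix` and `perplexity` tools; binary names and flags vary between llama.cpp builds (newer ones use a `llama-` prefix), and the file paths are placeholders.

```python
# Rough sketch of the imatrix generation and perplexity-testing workflow,
# assuming a local llama.cpp build. Binary names/flags differ across versions;
# all file paths below are placeholders.
import subprocess

F16 = "dolphin-2.8-mistral-7b-v02-f16.gguf"      # placeholder f16 GGUF
QUANT = "dolphin-2.8-mistral-7b-v02-IQ2_S.gguf"  # placeholder quantized GGUF

# 1. Build the importance matrix from the wiki.train.raw calibration text.
subprocess.run(
    ["./imatrix", "-m", F16, "-f", "wiki.train.raw", "-o", "imatrix.dat"],
    check=True,
)

# 2. Measure perplexity of a quantized model on wiki.test.raw.
subprocess.run(
    ["./perplexity", "-m", QUANT, "-f", "wiki.test.raw"],
    check=True,
)
```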

## Quantization Benefits

You'll notice that these quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2s 2-bit mixed models are very usable thanks to this custom quantization and lose little perplexity compared to the full f16 model.

## TODO: Benchmarks

The 1-bit quant gives garbage, but everything else, including the 2xxs variants, is surprisingly coherent.

## Notes

- The 8-bit weight is **not** iMatrix quantized (although it wouldn't make a significant difference). It can be used as a reference perplexity measurement along with `dolphinf16`.
- All other models, including the 4k variants, have been quantized with iMatrix and should exhibit better perplexity than regular k quantizations.
- iMatrix quantization can be applied to all k quantizations, not just the i ones; see the sketch after this list.
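
As referenced in the last note, here is a minimal sketch of applying the iMatrix file to a regular k quantization with llama.cpp's `quantize` tool; the binary name, flags, and file names are assumptions that may differ by version.

```python
# Minimal sketch: apply an existing imatrix.dat to a regular k-quant (Q4_K_M
# here) using llama.cpp's quantize tool. Paths and the binary name are
# placeholders; adjust for your llama.cpp version.
import subprocess

subprocess.run(
    [
        "./quantize",
        "--imatrix", "imatrix.dat",
        "dolphin-2.8-mistral-7b-v02-f16.gguf",     # placeholder input
        "dolphin-2.8-mistral-7b-v02-Q4_K_M.gguf",  # placeholder output
        "Q4_K_M",
    ],
    check=True,
)
```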

## TODO

- Upload perplexity benchmarks of each quantization vs f16.