nisten committed on
Commit
13c5efc
1 Parent(s): 2f24f9e

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -1,3 +1,29 @@
---
license: apache-2.0
---

# Dolphin-2.8-Mistral-7B-v2 iMatrix Quantizations

This repository contains iMatrix quantizations of the [dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02) model. The original model was trained with 16k-long-context data on top of a newer Mistral-7B base, enabling it to work well with up to 32k context.
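
For anyone who just wants to load one of these files, here is a minimal sketch using the `llama-cpp-python` bindings (an assumed tool, not something this repo ships); the GGUF filename below is a placeholder, so substitute whichever quant you actually download.

```python
# Minimal sketch: load an iMatrix GGUF quant at extended context with
# llama-cpp-python. The filename is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.8-mistral-7b-v02-iq2_s.gguf",  # placeholder filename
    n_ctx=32768,      # the model works with up to 32k context
    n_gpu_layers=-1,  # offload all layers if your build has GPU support
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is an importance matrix?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```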

The iMatrix file was generated using the `wiki.train.raw` dataset, which took a few hours to process. We have also included the `wiki.test.raw` file for perplexity testing.
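
For reference, a rough sketch of how an importance matrix like this can be produced with llama.cpp's `imatrix` tool is shown below; the binary name, flags, and file paths are assumptions based on llama.cpp builds from around this time and may differ in newer versions.

```python
# Rough sketch: regenerate an importance matrix with llama.cpp's `imatrix`
# tool. Paths and the output name are placeholders; flags may vary by version.
import subprocess

subprocess.run(
    [
        "./imatrix",
        "-m", "dolphin-2.8-mistral-7b-v02-f16.gguf",  # full-precision model (placeholder path)
        "-f", "wiki.train.raw",                       # calibration text used for this repo
        "-o", "dolphin-imatrix.dat",                  # resulting importance matrix (placeholder name)
    ],
    check=True,
)
```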

## Quantization Benefits

You'll notice that these quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2-bit mixed `2s` quants are very usable thanks to this custom quantization and lose little perplexity compared to the full f16 model.
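
To check the perplexity claim yourself, a sketch along the following lines with llama.cpp's `perplexity` tool and the bundled `wiki.test.raw` should work; the model filenames are placeholders and flags may vary by llama.cpp version.

```python
# Sketch: compare perplexity of a quant against the f16 reference using
# llama.cpp's `perplexity` tool. The final PPL value is printed to stdout.
import subprocess

for model in (
    "dolphin-2.8-mistral-7b-v02-f16.gguf",      # reference model (placeholder)
    "dolphin-2.8-mistral-7b-v02-iq2_xxs.gguf",  # one of the quants (placeholder)
):
    subprocess.run(["./perplexity", "-m", model, "-f", "wiki.test.raw"], check=True)
```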

## TODO: Benchmarks

The 1-bit quant gives garbage, but everything else, including the 2xxs quants, is surprisingly coherent.

## Notes

- The 8-bit quant is **not** iMatrix quantized (although it wouldn't make a significant difference). It can be used as a reference perplexity measurement along with `dolphinf16`.
- All other models, including the 4k variants, have been quantized with the iMatrix and should exhibit better perplexity than regular K-quants.
- iMatrix quantization can be applied to all K-quants, not just the IQ ones (see the sketch after this list).
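
As a concrete illustration of that last point, here is a sketch of producing a regular K-quant (Q4_K_M) while supplying an importance matrix to llama.cpp's `quantize` tool; all filenames are placeholders and the flag layout reflects llama.cpp builds from around the time of this upload.

```python
# Sketch: apply an importance matrix while producing a regular K-quant
# (Q4_K_M) with llama.cpp's `quantize` tool. Filenames are placeholders.
import subprocess

subprocess.run(
    [
        "./quantize",
        "--imatrix", "dolphin-imatrix.dat",        # importance matrix (see the earlier sketch)
        "dolphin-2.8-mistral-7b-v02-f16.gguf",     # full-precision input (placeholder)
        "dolphin-2.8-mistral-7b-v02-q4_k_m.gguf",  # quantized output (placeholder)
        "Q4_K_M",
    ],
    check=True,
)
```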

## TODO

- Upload perplexity benchmarks of each quantization vs. f16.