---
license: apache-2.0
---

# Dolphin-2.8-Mistral-7B-v2 iMatrix Quantizations

This repository contains iMatrix quantizations of the [dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02) model. The original model was trained on 16k-context data on top of a newer mistral-7b base, enabling it to work well with up to 32k context.

The iMatrix file was generated from the `wiki.train.raw` dataset, which took a few hours to process. We have also included the `wiki.test.raw` file for perplexity testing.

## Quantization Benefits

These quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2s 2-bit mixed models are very usable thanks to this custom quantization and lose little perplexity compared to the full f16 model.

## Notes

- The 8-bit quant is **not** iMatrix quantized (although it wouldn't make a significant difference). It can be used as a reference perplexity measurement along with `dolphinf16`.
- All other models, including the 4k variants, have been quantized with iMatrix and should show better perplexity than regular k-quantizations.
- iMatrix quantization can be applied to all k-quantizations, not just the i-quants; see the sketch after this list.
- The 1-bit quant produces garbage, but everything else, including 2xxs, is surprisingly coherent.
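For reference, iMatrix generation and iMatrix-aware quantization in llama.cpp look roughly like the following. This is a minimal sketch, not the exact commands used for this repository: the file names (`dolphin-f16.gguf`, `imatrix.dat`, output names) and the `-ngl` value are placeholder assumptions.

```bash
# Collect importance-matrix statistics by running the f16 model over
# the calibration text (this is the slow, hours-long step).
./imatrix -m dolphin-f16.gguf -f wiki.train.raw -o imatrix.dat -ngl 34

# Quantize using the collected statistics; the same --imatrix flag
# works for i-quants (e.g. IQ2_XXS) and regular k-quants (e.g. Q4_K_M).
./quantize --imatrix imatrix.dat dolphin-f16.gguf dolphin2xxs.gguf IQ2_XXS
./quantize --imatrix imatrix.dat dolphin-f16.gguf dolphin4km.gguf Q4_K_M
```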
## Perplexity Values

Measured with (substituting each model file):

```bash
./perplexity -m dolphin2m.gguf -f wiki.test.raw -ngl 34
```

```bash
dolphinf16.gguf
perplexity - [1]4.3052,[2]4.8421,[3]5.7401,[4]6.6554,[5]6.6552,[6]6.6580,[7]6.9198,[8]7.0918,[9]7.2503,[10]7.5712,[11]7.8367,[12]7.8476,
Final estimate: PPL = 7.8476 +/- 0.35984
THIS IS BASELINE

dolphin1bit.gguf
perplexity - [1]59477.7292,[2]50746.4580,[3]53932.3131,[4]55797.8433,[5]45995.5032,[6]46595.4234,[7]45130.6779,[8]40769.8593,[9]41322.7842,[10]50644.7393,[11]50676.5808,[12]51939.5094,
Final estimate: PPL = 51939.5094 +/- 1339.29301
1BIT GIVES GARBAGE OUTPUT

dolphin2xxs.gguf
perplexity - [1]5.4651,[2]6.7941,[3]7.8700,[4]8.7155,[5]8.3566,[6]8.3316,[7]8.6121,[8]8.7565,[9]8.9041,[10]9.3572,[11]9.6426,[12]9.5626,
Final estimate: PPL = 9.5626 +/- 0.43895
9.5 vs 7.8 at f16, means lossy but coherent

dolphin2s.gguf
perplexity - [1]5.0014,[2]5.9477,[3]6.8424,[4]7.6348,[5]7.4755,[6]7.4667,[7]7.7625,[8]7.8807,[9]8.0374,[10]8.4086,[11]8.6475,[12]8.6427,
Final estimate: PPL = 8.6427 +/- 0.39501

dolphin2m.gguf
perplexity - [1]4.5874,[2]5.3203,[3]6.2334,[4]7.1444,[5]7.1188,[6]7.1422,[7]7.4717,[8]7.6180,[9]7.7948,[10]8.1319,[11]8.3747,[12]8.4095,
Final estimate: PPL = 8.4095 +/- 0.38329

dolphin2k.gguf
perplexity - [1]4.6331,[2]5.2648,[3]6.0493,[4]7.0165,[5]6.9300,[6]6.9177,[7]7.2362,[8]7.4417,[9]7.6292,[10]7.9640,[11]8.2121,[12]8.1930,
Final estimate: PPL = 8.1930 +/- 0.37241

dolphin2ks.gguf
perplexity - [1]4.7995,[2]5.6653,[3]6.4331,[4]7.3841,[5]7.2724,[6]7.3161,[7]7.6567,[8]7.8423,[9]8.0129,[10]8.4033,[11]8.6636,[12]8.6391,
Final estimate: PPL = 8.6391 +/- 0.39315

dolphin3s.gguf
perplexity - [1]4.3574,[2]4.9936,[3]5.8814,[4]6.8093,[5]6.8086,[6]6.7949,[7]7.0638,[8]7.2204,[9]7.3844,[10]7.6895,[11]7.9489,[12]7.9527,
Final estimate: PPL = 7.9527 +/- 0.36202

dolphin3xs.gguf
perplexity - [1]4.3161,[2]4.9579,[3]5.8647,[4]6.8064,[5]6.7614,[6]6.7501,[7]7.0133,[8]7.2103,[9]7.3862,[10]7.7265,[11]7.9813,[12]7.9780,
Final estimate: PPL = 7.9780 +/- 0.36655

dolphin3xxs.gguf
perplexity - [1]4.5418,[2]5.0902,[3]6.0117,[4]6.9852,[5]6.9329,[6]6.9165,[7]7.1853,[8]7.3359,[9]7.4923,[10]7.8122,[11]8.0696,[12]8.0592,
Final estimate: PPL = 8.0592 +/- 0.36502

dolphin3m.gguf
perplexity - [1]4.3203,[2]4.9566,[3]5.8151,[4]6.7619,[5]6.7801,[6]6.7762,[7]7.0351,[8]7.2054,[9]7.3766,[10]7.6896,[11]7.9580,[12]7.9660,
Final estimate: PPL = 7.9660 +/- 0.36234

dolphin4km.gguf
perplexity - [1]4.3331,[2]4.9129,[3]5.7915,[4]6.7030,[5]6.6921,[6]6.6978,[7]6.9570,[8]7.1284,[9]7.2854,[10]7.6098,[11]7.8696,[12]7.8767,
Final estimate: PPL = 7.8767 +/- 0.35875

dolphin4nl.gguf
perplexity - [1]4.2682,[2]4.8494,[3]5.7530,[4]6.6890,[5]6.6672,[6]6.6637,[7]6.9332,[8]7.1126,[9]7.2821,[10]7.5998,[11]7.8733,[12]7.8875,
Final estimate: PPL = 7.8875 +/- 0.36227

dolphin4xs.gguf
perplexity - [1]4.2986,[2]4.8610,[3]5.7658,[4]6.6906,[5]6.6621,[6]6.6608,[7]6.9321,[8]7.1140,[9]7.2892,[10]7.6085,[11]7.8806,[12]7.8921,
Final estimate: PPL = 7.8921 +/- 0.36258

dolphin5ks.gguf
perplexity - [1]4.2557,[2]4.8249,[3]5.7413,[4]6.6671,[5]6.6611,[6]6.6686,[7]6.9389,[8]7.1079,[9]7.2707,[10]7.5962,[11]7.8529,[12]7.8627,
Final estimate: PPL = 7.8627 +/- 0.36124

dolphin5km.gguf
perplexity - [1]4.3191,[2]4.8597,[3]5.7844,[4]6.7120,[5]6.6994,[6]6.6964,[7]6.9569,[8]7.1215,[9]7.2792,[10]7.6109,[11]7.8682,[12]7.8794,
Final estimate: PPL = 7.8794 +/- 0.36185

dolphin6k.gguf
perplexity - [1]4.3264,[2]4.8531,[3]5.7574,[4]6.6741,[5]6.6707,[6]6.6795,[7]6.9362,[8]7.1076,[9]7.2678,[10]7.5864,[11]7.8496,[12]7.8628,
Final estimate: PPL = 7.8628 +/- 0.36075

dolphin8bit.gguf
perplexity - [1]4.3063,[2]4.8463,[3]5.7347,[4]6.6499,[5]6.6471,[6]6.6531,[7]6.9160,[8]7.0899,[9]7.2509,[10]7.5705,[11]7.8357,[12]7.8466,
Final estimate: PPL = 7.8466 +/- 0.35948
```

As we can see, the 2-bit xxs quant produced with this method is surprisingly coherent.
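To try one of the small quants locally with a llama.cpp build, something like the sketch below should work. The prompt and the `-c` context size are illustrative assumptions, not recommended settings.

```bash
# Run the 2-bit xxs quant fully offloaded with a long context window;
# the underlying model supports up to 32k tokens.
./main -m dolphin2xxs.gguf -ngl 34 -c 16384 -p "Explain what an importance matrix is."
```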