---
license: apache-2.0
---


# Dolphin-2.8-Mistral-7B-v02 iMatrix Quantizations

This repository contains iMatrix quantizations of the [dolphin-2.8-mistral-7b-v02](https://huggingface.co/cognitivecomputations/dolphin-2.8-mistral-7b-v02) model. The base model was trained with 16k long-context data on top of a newer Mistral-7B, which lets it work well at up to 32k context.
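If you want a quick smoke test at long context, the quants can be run directly with llama.cpp. This is a minimal sketch; the model file, prompt, and `-ngl` value are illustrative (we used `-ngl 34` for the perplexity runs below):

```bash
# Sketch: run one of these quants at 32k context with llama.cpp.
# Model file, prompt, and -ngl value are illustrative; adjust for your hardware.
./main -m dolphin4km.gguf -c 32768 -ngl 34 -p "Hello, Dolphin."
```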

The iMatrix file was generated using the `wiki.train.raw` dataset, which took a few hours to process. We have also included the `wiki.test.raw` file for perplexity testing.
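For reference, this is roughly how the importance matrix is produced with llama.cpp's `imatrix` tool; the f16 GGUF and output file names here are illustrative:

```bash
# Sketch: generate the importance matrix against wiki.train.raw.
# The f16 input GGUF and the output name are illustrative.
./imatrix -m dolphin-2.8-mistral-7b-v02-f16.gguf -f wiki.train.raw -o imatrix.dat
```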

## Quantization Benefits

These quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2-bit mixed quants such as `dolphin2s.gguf` are very usable thanks to this custom quantization and lose relatively little perplexity compared to the full f16 model.

## Notes

- The 8-bit quant is **not** iMatrix quantized (at that precision it wouldn't make a significant difference). Along with `dolphinf16`, it can be used as a reference perplexity measurement.
- All other models, including the 4-bit variants, have been quantized with the iMatrix and should show better perplexity than regular k-quantizations.
- iMatrix data can be applied to all k-quantizations, not just the IQ types; see the sketch after this list.
- The 1-bit quant produces garbage, but everything else, including the 2-bit XXS quant, is surprisingly coherent.
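A minimal sketch of applying the iMatrix during quantization with llama.cpp's `quantize` tool, assuming the file names from the earlier sketch; the target type `Q4_K_M` is just one example:

```bash
# Apply the importance matrix while producing a regular k-quant.
# File names are illustrative; Q4_K_M is one example target type.
./quantize --imatrix imatrix.dat dolphin-2.8-mistral-7b-v02-f16.gguf dolphin4km.gguf Q4_K_M
```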

## Perplexity values

```bash
./perplexity -m dolphin2m.gguf -f wiki.test.raw -ngl 34
```

```text
dolphinf16.gguf perplexity - [1]4.3052,[2]4.8421,[3]5.7401,[4]6.6554,[5]6.6552,[6]6.6580,[7]6.9198,[8]7.0918,[9]7.2503,[10]7.5712,[11]7.8367,[12]7.8476,
Final estimate: PPL = 7.8476 +/- 0.35984    THIS IS BASELINE 

dolphin1bit.gguf perplexity - [1]59477.7292,[2]50746.4580,[3]53932.3131,[4]55797.8433,[5]45995.5032,[6]46595.4234,[7]45130.6779,[8]40769.8593,[9]41322.7842,[10]50644.7393,[11]50676.5808,[12]51939.5094,
Final estimate: PPL = 51939.5094 +/- 1339.29301     1BIT GIVES GARBAGE OUTPUT

dolphin2xxs.gguf perplexity - [1]5.4651,[2]6.7941,[3]7.8700,[4]8.7155,[5]8.3566,[6]8.3316,[7]8.6121,[8]8.7565,[9]8.9041,[10]9.3572,[11]9.6426,[12]9.5626,
Final estimate: PPL = 9.5626 +/- 0.43895    9.5 vs 7.8 at f16, means lossy but coherent

dolphin2s.gguf perplexity - [1]5.0014,[2]5.9477,[3]6.8424,[4]7.6348,[5]7.4755,[6]7.4667,[7]7.7625,[8]7.8807,[9]8.0374,[10]8.4086,[11]8.6475,[12]8.6427,
Final estimate: PPL = 8.6427 +/- 0.39501

dolphin2m.gguf perplexity - [1]4.5874,[2]5.3203,[3]6.2334,[4]7.1444,[5]7.1188,[6]7.1422,[7]7.4717,[8]7.6180,[9]7.7948,[10]8.1319,[11]8.3747,[12]8.4095,
Final estimate: PPL = 8.4095 +/- 0.38329

dolphin2k.gguf perplexity - [1]4.6331,[2]5.2648,[3]6.0493,[4]7.0165,[5]6.9300,[6]6.9177,[7]7.2362,[8]7.4417,[9]7.6292,[10]7.9640,[11]8.2121,[12]8.1930,
Final estimate: PPL = 8.1930 +/- 0.37241

dolphin2ks.gguf perplexity - [1]4.7995,[2]5.6653,[3]6.4331,[4]7.3841,[5]7.2724,[6]7.3161,[7]7.6567,[8]7.8423,[9]8.0129,[10]8.4033,[11]8.6636,[12]8.6391,
Final estimate: PPL = 8.6391 +/- 0.39315

dolphin3s.gguf perplexity - [1]4.3574,[2]4.9936,[3]5.8814,[4]6.8093,[5]6.8086,[6]6.7949,[7]7.0638,[8]7.2204,[9]7.3844,[10]7.6895,[11]7.9489,[12]7.9527,
Final estimate: PPL = 7.9527 +/- 0.36202

dolphin3xs.gguf perplexity - [1]4.3161,[2]4.9579,[3]5.8647,[4]6.8064,[5]6.7614,[6]6.7501,[7]7.0133,[8]7.2103,[9]7.3862,[10]7.7265,[11]7.9813,[12]7.9780,
Final estimate: PPL = 7.9780 +/- 0.36655

dolphin3xxs.gguf perplexity - [1]4.5418,[2]5.0902,[3]6.0117,[4]6.9852,[5]6.9329,[6]6.9165,[7]7.1853,[8]7.3359,[9]7.4923,[10]7.8122,[11]8.0696,[12]8.0592,
Final estimate: PPL = 8.0592 +/- 0.36502

dolphin3m.gguf perplexity  - [1]4.3203,[2]4.9566,[3]5.8151,[4]6.7619,[5]6.7801,[6]6.7762,[7]7.0351,[8]7.2054,[9]7.3766,[10]7.6896,[11]7.9580,[12]7.9660,
Final estimate: PPL = 7.9660 +/- 0.36234

dolphin4km.gguf perplexity - [1]4.3331,[2]4.9129,[3]5.7915,[4]6.7030,[5]6.6921,[6]6.6978,[7]6.9570,[8]7.1284,[9]7.2854,[10]7.6098,[11]7.8696,[12]7.8767,
Final estimate: PPL = 7.8767 +/- 0.35875

dolphin4nl.gguf perplexity - [1]4.2682,[2]4.8494,[3]5.7530,[4]6.6890,[5]6.6672,[6]6.6637,[7]6.9332,[8]7.1126,[9]7.2821,[10]7.5998,[11]7.8733,[12]7.8875,
Final estimate: PPL = 7.8875 +/- 0.36227

dolphin4xs.gguf perplexity - [1]4.2986,[2]4.8610,[3]5.7658,[4]6.6906,[5]6.6621,[6]6.6608,[7]6.9321,[8]7.1140,[9]7.2892,[10]7.6085,[11]7.8806,[12]7.8921,
Final estimate: PPL = 7.8921 +/- 0.36258

dolphin5ks.gguf perplexity - [1]4.2557,[2]4.8249,[3]5.7413,[4]6.6671,[5]6.6611,[6]6.6686,[7]6.9389,[8]7.1079,[9]7.2707,[10]7.5962,[11]7.8529,[12]7.8627,
Final estimate: PPL = 7.8627 +/- 0.36124

dolphin5km.gguf perplexity - [1]4.3191,[2]4.8597,[3]5.7844,[4]6.7120,[5]6.6994,[6]6.6964,[7]6.9569,[8]7.1215,[9]7.2792,[10]7.6109,[11]7.8682,[12]7.8794,
Final estimate: PPL = 7.8794 +/- 0.36185

dolphin6k.gguf perplexity - [1]4.3264,[2]4.8531,[3]5.7574,[4]6.6741,[5]6.6707,[6]6.6795,[7]6.9362,[8]7.1076,[9]7.2678,[10]7.5864,[11]7.8496,[12]7.8628,
Final estimate: PPL = 7.8628 +/- 0.36075

dolphin8bit.gguf perplexity - [1]4.3063,[2]4.8463,[3]5.7347,[4]6.6499,[5]6.6471,[6]6.6531,[7]6.9160,[8]7.0899,[9]7.2509,[10]7.5705,[11]7.8357,[12]7.8466,
Final estimate: PPL = 7.8466 +/- 0.35948
```


As the results show, the 2-bit XXS quant produced with this method is surprisingly coherent.
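For a rough sense of the loss, the relative perplexity increase over the f16 baseline can be computed directly from the final estimates above, e.g. for `dolphin2xxs.gguf`:

```bash
# Relative PPL increase of dolphin2xxs.gguf over the f16 baseline:
# (9.5626 - 7.8476) / 7.8476 * 100 ≈ 21.9%
awk 'BEGIN { printf "%.1f%% higher PPL than f16\n", (9.5626 - 7.8476) / 7.8476 * 100 }'
```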