# FlatDolphinMaid-8x7B 4bpw

Exllama quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)

You probably want the [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2) version. It just fits in 24GB of VRAM at half context (16384).
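
If you are unsure how to grab a quant, here is a minimal download sketch using the `huggingface_hub` Python library (the repo id comes from the link above; the local path is just an example):

```python
from huggingface_hub import snapshot_download

# Download the 3.5bpw quant into a local folder (path is an example)
model_dir = snapshot_download(
    repo_id="Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2",
    local_dir="models/FlatDolphinMaid-8x7B-3.5bpw-exl2",
)
print(model_dir)
```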

If you really want the larger context, the [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with the GGUF version at higher quants.

I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.

Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)

Make sure you **enable the 8-bit cache**.
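
For reference, a minimal loading sketch with the `exllamav2` Python library, assuming the 3.5bpw quant downloaded above; class names follow exllamav2's examples and may differ between versions, and the 16384 value mirrors the half-context note above. In front ends like text-generation-webui, the 8-bit cache is usually a `cache_8bit` option instead.

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,  # 8-bit cache: roughly halves KV-cache VRAM vs FP16
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "models/FlatDolphinMaid-8x7B-3.5bpw-exl2"  # example local path
config.prepare()
config.max_seq_len = 16384  # half context, per the note above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # this is where the 8-bit cache is enabled
model.load_autosplit(cache)  # splits weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Once upon a time", settings, 64))
```

Halving the cache is presumably what lets the 3.5bpw quant "just fit" at 16384 context on a 24GB card.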

### Prompt format: