Kooten committed
Commit bddb985
1 Parent(s): 18e1afd

Update README.md

Files changed (1):
  1. README.md +12 -1
README.md CHANGED
@@ -5,7 +5,18 @@ license: cc-by-nc-4.0
 # FlatDolphinMaid-8x7B 4bpw
 Exllama quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)
 
-3.5bwp just barely fits in 24gb vram if i lower context (tried with 12288)
+You probably want the [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2) version. It just fits in 24 GB of VRAM at half context (16384).
+
+If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with a GGUF version at a higher quant.
+
+I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.
+
+
+
+Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)
+
+Make sure you **enable 8-bit cache**.
+
 
 ### Prompt format:
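The "**enable 8-bit cache**" note refers to quantizing the attention KV cache so a long context fits alongside the weights: an 8x7B Mixtral-style model has roughly 47B parameters, so at 3.5 bits per weight the weights alone take about 47e9 × 3.5 / 8 ≈ 20 GB, leaving only a few GB of a 24 GB card for the cache and activations. In text-generation-webui this is a checkbox on the ExLlamav2 loader; when loading directly with the exllamav2 Python library, the equivalent is using `ExLlamaV2Cache_8bit` in place of the default FP16 cache. A minimal sketch, assuming the exllamav2 API of this commit's era and a placeholder local path:

```python
# Minimal sketch: load an exl2 quant with the 8-bit KV cache.
# Assumptions: exllamav2 is installed, and the quant was downloaded to
# ./FlatDolphinMaid-8x7B-3.5bpw-exl2 (placeholder path).
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,  # ~halves KV-cache VRAM vs. the default FP16 cache
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./FlatDolphinMaid-8x7B-3.5bpw-exl2"
config.prepare()
config.max_seq_len = 16384  # the "half context" the README recommends for 24 GB

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # 8-bit cache instead of ExLlamaV2Cache
model.load_autosplit(cache)                    # splits layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Once upon a time,", settings, 64))
```

`load_autosplit` is also why the 4bpw quant may work in a headless or multi-GPU setup: with no desktop compositor holding VRAM, or with layers spread across several cards, the larger quant has room to load.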