---
license: cc-by-nc-4.0
---
# FlatDolphinMaid-8x7B

4bpw Exllama quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)

You probably want the [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2) version. It just fits in 24 GB of VRAM at half context (16384). If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with a GGUF version at higher quants. I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2); it might work in a headless or multi-GPU setup.

Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)

Make sure you **enable 8-bit cache**.

### Prompt format:
```
### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}
```

### Contact
Kooten on discord.
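
### Example usage (exllamav2)

A minimal sketch of loading the quant with the 8-bit cache enabled and prompting it in the format above, using the `exllamav2` Python library directly. The model path, system prompt, and sampler settings are placeholders; adjust them for your setup.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder: point this at your local download of one of the exl2 quants
model_dir = "/path/to/FlatDolphinMaid-8x7B-3.5bpw-exl2"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
# 8-bit cache roughly halves the VRAM the KV cache needs
cache = ExLlamaV2Cache_8bit(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8  # placeholder sampler setting

# Build a prompt in the Alpaca-style format this card describes
prompt = (
    "### Instruction:\n"
    "You are a helpful assistant.\n\n"
    "### Input:\n"
    "Write a haiku about dolphins.\n\n"
    "### Response:\n"
)

output = generator.generate_simple(prompt, settings, num_tokens=200)
print(output)
```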