Kooten
/

FlatDolphinMaid-8x7B-4bpw-exl2



	---
	license: cc-by-nc-4.0
	---

	# FlatDolphinMaid-8x7B 4bpw
	ExLlamaV2 (exl2) quant of [Undi95/FlatDolphinMaid-8x7B](https://huggingface.co/Undi95/FlatDolphinMaid-8x7B)

	You probably want the [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2) version. It just fits in 24 GB of VRAM at half context (16384).

	If you really want the larger context, [3bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2) should do it, but you are probably better off with a GGUF version at a higher quant.

	I did make a [4bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2) version; it might work in a headless or multi-GPU setup.
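
To see why 4bpw is tight on a single 24 GB card, here is a rough back-of-the-envelope sketch in Python. It assumes about 46.7 billion total parameters (the figure published for the base Mixtral 8x7B, not something measured on this merge) and treats the bpw value as a flat average over all weights, so the results are estimates only.

```python
# Rough weight-size estimate for an exl2 quant:
#   size_in_bytes ~= parameter_count * bits_per_weight / 8
# ASSUMPTION: ~46.7e9 total parameters, the figure published for Mixtral 8x7B.
# ASSUMPTION: bpw is a flat average over every weight in the model.
PARAMS = 46.7e9
GIB = 1024 ** 3


def weight_gib(bpw: float, params: float = PARAMS) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params * bpw / 8 / GIB


for bpw in (3.0, 3.5, 4.0):
    print(f"{bpw} bpw: ~{weight_gib(bpw):.1f} GiB of weights")
```

Under these assumptions the 4bpw weights alone come close to 22 GiB, which leaves very little room for the cache and a desktop session.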



	Other BPWs: [3.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3bpw-exl2), [3.5bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-3.5bpw-exl2), [4.0bpw](https://huggingface.co/Kooten/FlatDolphinMaid-8x7B-4bpw-exl2)

	Make sure you enable the 8-bit cache.
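
The saving from the 8-bit cache can be estimated with a little arithmetic. The sketch below assumes the attention shape of the base Mixtral 8x7B (32 layers, 8 key/value heads, head dimension 128) and that an 8-bit cache stores one byte per value instead of two; neither figure comes from this model card.

```python
# Estimate key/value cache size for a given context length.
# ASSUMPTION: Mixtral 8x7B attention shape (32 layers, 8 KV heads, head dim 128).
# ASSUMPTION: FP16 cache = 2 bytes per value, 8-bit cache = 1 byte per value.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
GIB = 1024 ** 3


def kv_cache_gib(context_len: int, bytes_per_value: int) -> float:
    """Approximate cache size in GiB for `context_len` tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return context_len * per_token / GIB


for ctx in (16384, 32768):
    print(f"context {ctx}: FP16 ~{kv_cache_gib(ctx, 2):.1f} GiB, "
          f"8-bit ~{kv_cache_gib(ctx, 1):.1f} GiB")
```

With these assumed numbers, the 8-bit cache halves the cache footprint, which is about 1 GiB saved at 16384 context.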


	### Prompt format:

	```
	### Instruction:
	{system prompt}

	### Input:
	{input}

	### Response:
	{reply}
	```
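
A minimal Python sketch of filling in the template above. The function name `build_prompt` and its parameters are hypothetical, and the prompt is left open after `### Response:` so the model writes the reply.

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Fill the Instruction/Input/Response template shown above.

    Hypothetical helper: the trailing "### Response:" header is left
    open so that the model generates the reply itself.
    """
    return (
        "### Instruction:\n"
        f"{system_prompt}\n\n"
        "### Input:\n"
        f"{user_input}\n\n"
        "### Response:\n"
    )


prompt = build_prompt("You are a helpful assistant.", "Say hello.")
print(prompt)
```

When using a frontend, check whether it already applies a template like this before adding it by hand, otherwise the headers end up duplicated.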

	### Contact
	Kooten on Discord.