Description

Exllama 2 quant of NeverSleep/Echidna-13b-v0.3

4 BPW, Head bit set to 8

VRAM

My VRAM usage with 13B models are:

Bits per weight Context VRAM
8bpw 8k 22gb
8bpw 4k 19gb
6bpw 8k 19gb
6bpw 4k 16gb
4bpw 8k 16gb
4bpw 4k 13gb
3bpw 8k 15gb
3bpw 4k 12gb
I have rounded up, these arent exact numbers, this is also on a windows machine, they should be slightly lower on linux.

Prompt template: Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
Downloads last month
13
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Collection including Kooten/Echidna-13b-v0.3-4bpw-h8-exl2