Panchovix/airoboros-l2-70b-gpt4-1.4.1_5.0bpw-h6-exl2


Panchovix committed on Sep 21, 2023

Commit a55e484 • 1 Parent(s): eff1057

Update README.md

Files changed (1)

README.md +3 -3

@@ -3,8 +3,8 @@ license: other
 ---
 5 bit quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), using exllama2.
-On 2x4090, 3072 ctx seems to work fine with gpu_split 21.5,22.5 and max_attention_size = 1024 ** 2 instead of 2048 ** 2.
-4096 may be feasible on a single 48GB VRAM GPU (like the A6000).
-Tests are welcome.

+Update 21/09/23
+Re-quanted with latest exllamav2 version, which fixed some measurement issues.
+Also, 5bpw now works on 2x24GB VRAM cards with gpu_split 21,21 and flash-attn (Linux only for now) at 4096 context, leaving about 1 GB to spare for trying a longer context.
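
A rough back-of-the-envelope check of why 5bpw is a tight fit on 2x24GB can be sketched as below. This is only an illustration, not part of the commit: the nominal 70e9 parameter count is an assumption, the helper name `weight_bytes` is hypothetical, and the estimate ignores the 6-bit head layer, the KV cache, and any runtime overhead.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


N_PARAMS = 70e9        # assumption: nominal "70b" parameter count
BITS_PER_WEIGHT = 5.0  # the 5bpw quantization described in the README
GIB = 2 ** 30

total = weight_bytes(N_PARAMS, BITS_PER_WEIGHT)
gpu_split = [21, 21]   # the split quoted in the README

print(f"weights: {total / 1e9:.2f} GB ({total / GIB:.2f} GiB)")
print(f"gpu_split total: {sum(gpu_split)}")
```

Under these assumptions the weights alone come to roughly 43.75 GB out of the 48 GB available on two 24GB cards, which is consistent with the README needing a reduced attention buffer or flash-attn to reach 4096 context.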