Awesome model !!!

#8
by Pb-207 - opened

This model is the only one to pass the boiling-water-obtuse-angle test among the open-source models I tested. It's even better than falcon-40b.
Text generation web UI - Google Chrome 2023_6_3 2(1)__01.png
It can also run on a single 4090 at a satisfying speed (about 8 tokens/s on Windows 11).

Excellent! Thanks for reporting. That's a good test!

May I know the memory requirement to run this model on GPU?
I assume a minimum of 17 GB of video memory.
Currently I'm on 32 GB of RAM with a GTX 1080 Ti (11 GB of video memory).
When I try to load this model in text-generation-webui, it fills the RAM (with a bit of memory swapping) and then crashes.
May I know how much RAM is required to run this, and whether it is possible to run it in CPU-only mode? (Selecting cpu under the model settings in text-generation-webui didn't help.)

The following model works fine on GPU:
TheBloke_stable-vicuna-13B-GPTQ
I'm getting about 8 tokens/sec.
I tried the same question as in this thread:

image.png

Thanks

Got a different result with a slight change in wording :)
image.png

You need to choose the GGML version to run on CPU; GPTQ is only for GPU. This model requires at least 18 GB to load, and VRAM usage will increase to about 21 GB after several chats, so I suggest using a GPU with at least 24 GB of VRAM.
You need to cap the VRAM usage and leave some VRAM free for chat to avoid "CUDA out of memory" errors. In oob-webui that's --gpu-memory 8 (8 is just an example). This will decrease the generation speed, but it also decreases the demand for VRAM.
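
To put those numbers together, here is a rough sketch in Python of the check described above. The helper name and the three verdict strings are my own illustration and not part of text-generation-webui; the 18 GB and 21 GB figures are simply the ones reported in this thread for this model, so treat them as approximate.

```python
# Figures reported in this thread for this model (GPTQ, on GPU). Approximate.
LOAD_GB = 18   # VRAM needed just to load the model
PEAK_GB = 21   # VRAM usage observed after several chats


def vram_verdict(total_vram_gb: float,
                 load_gb: float = LOAD_GB,
                 peak_gb: float = PEAK_GB) -> str:
    """Hypothetical helper: classify a GPU against the thresholds above."""
    if total_vram_gb < load_gb:
        return "too small to hold the model fully in VRAM"
    if total_vram_gb < peak_gb:
        return "loads, but may run out of memory while chatting"
    return "ok"


# The two cards mentioned in this thread.
for name, vram in [("GTX 1080 Ti", 11), ("RTX 4090", 24)]:
    print(f"{name} ({vram} GB): {vram_verdict(vram)}")
```

By these figures the 11 GB 1080 Ti lands in the first case, which matches the crash reported above, while a 24 GB card has room to spare for longer chats.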
