Hardware requirements

by Dtree07 - opened

Does anyone know how much VRAM I need to run this model? Thanks.

Astronomer org

The model weights themselves need around 9 GB of VRAM. Depending on which serving framework you use and your context length (prompt + answer), reserve another 1-2 GB just to be safe. This means that, at a minimum, you should serve this on a 12 GB VRAM NVIDIA card (something like an RTX 3060, T4, etc.).
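If you want a quick sanity check before loading, a sketch like the one below (assuming PyTorch with CUDA is available; the 11 GiB threshold is just the ~9 GB of weights plus ~2 GB of headroom from above, not an exact figure) will tell you whether your card has enough free memory:

```python
import torch

# Rough check: ~9 GB for the quantized weights plus 1-2 GB of headroom
# for the KV cache and activations, so we look for roughly 11 GiB free.
# The threshold is an assumption based on the numbers above.
REQUIRED_GIB = 11

if not torch.cuda.is_available():
    raise RuntimeError("No CUDA device found")

free_bytes, total_bytes = torch.cuda.mem_get_info(0)
free_gib = free_bytes / 1024**3
print(f"GPU 0: {free_gib:.1f} GiB free of {total_bytes / 1024**3:.1f} GiB")
if free_gib < REQUIRED_GIB:
    print("Warning: probably not enough VRAM at typical context lengths.")
```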

If you have a GPU with less VRAM, consider our 4-bit GPTQ quant instead: https://huggingface.co/astronomer-io/Llama-3-8B-Instruct-GPTQ-4-Bit. It should fit in under 8 GB of VRAM.
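For reference, here is a minimal sketch of loading the 4-bit quant with transformers (assuming the `optimum` and `auto-gptq` packages are installed alongside `transformers`; the prompt is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id taken from the link above; GPTQ weights are loaded
# via transformers' GPTQ integration (requires optimum + auto-gptq).
model_id = "astronomer-io/Llama-3-8B-Instruct-GPTQ-4-Bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Example prompt only; adjust max_new_tokens for your use case.
prompt = "How much VRAM do I need to run an 8B parameter model?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```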

Both quants have been tested with transformers, the Hugging Face pipeline, and vLLM. We are running additional tests on Hugging Face's text-generation-inference and oobabooga's text-generation-webui. The performance metrics and sample code used will be posted shortly.
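Until the official sample code is posted, a rough vLLM sketch looks like this (using the 4-bit quant linked above as the example model; the `max_model_len` and `gpu_memory_utilization` values are illustrative assumptions, not tuned settings):

```python
from vllm import LLM, SamplingParams

# vLLM is one of the frameworks mentioned above. Capping max_model_len
# bounds the KV-cache size, which is what eats the 1-2 GB of headroom.
llm = LLM(
    model="astronomer-io/Llama-3-8B-Instruct-GPTQ-4-Bit",
    quantization="gptq",           # tell vLLM these are GPTQ weights
    max_model_len=4096,            # illustrative context cap
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM may claim
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What GPU do I need for an 8B model?"], params)
print(outputs[0].outputs[0].text)
```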

Thank you so much. ♥
