Is 120GB of VRAM enough to run this model?

#237

by AIGUYCONTENT - opened Aug 4, 2024

Discussion

AIGUYCONTENT

Aug 4, 2024

I'm looking at the files and they add up to way more than 120GB.

However, I saw another person in one of the discussions saying you can run it if you have 100GB of VRAM.

pandora-s

Mistral AI_ org Aug 5, 2024

Hi there, yes, you should be able to fit full precision in 100GB of VRAM. Note that there are two versions of the weights: the original weights (consolidated) and the transformers implementation weights (model-000...).
100GB of VRAM should fit Mixtral 8x7B at this precision (16-bit), and with lower precision and quantization you might be able to run it with less VRAM too. I hope this helped!
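As a rough check of the 100GB figure, weight memory is approximately parameter count × bytes per parameter. The sketch below assumes Mixtral 8x7B has about 46.7B total parameters (Mistral's published figure) and counts only the weights; KV cache, activations and runtime overhead come on top, and real quantized formats also store scale data, so treat these numbers as lower bounds.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
# Assumption: ~46.7e9 total parameters for Mixtral 8x7B. KV cache,
# activations and quantization scale overhead are NOT included.
PARAMS_MIXTRAL_8X7B = 46.7e9

BITS_PER_PARAM = {
    "fp16/bf16": 16,
    "8-bit": 8,
    "4-bit": 4,
}


def weight_gb(num_params: float, bits: int) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name:>10}: ~{weight_gb(PARAMS_MIXTRAL_8X7B, bits):.1f} GB")
```

At 16-bit this lands a little under 100GB, in line with the reply above, and the same arithmetic shows why quantized variants need far less.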

OsmanAlcar

Sep 16, 2024

Hello, I just wonder if I can run this model on an RTX 4070, or does the card have too little VRAM to run it? (It's OK with me if it is slow.)
Thanks for reading and answering my question.

deleted

Sep 16, 2024

GGUF can run it in ZERO VRAM.

OsmanAlcar

Sep 16, 2024

If you are talking about GPT-Generated Unified Format (GGUF), a file format that streamlines the use and deployment of large language models (LLMs), thanks. I thought local models were supposed to use all the performance available in the computer to run.

deleted

Sep 16, 2024

Just stating that if you go that route, it can go all the way down to zero. The question was 'too little VRAM to run'. With this method there is no 'too little for it to run': the more VRAM you have, the better off you are, but there is no lower limit for it to function.
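To make the 'no lower limit' point concrete: GGUF runners such as llama.cpp let you choose how many layers go to the GPU (its `--n-gpu-layers` option), and the rest stay on the CPU. The sketch below is a hypothetical planner, not llama.cpp's own logic; the per-layer size, the 32-layer count and the reserved headroom are assumed values for illustration.

```python
import math


def gpu_layer_count(vram_gb: float, layer_gb: float, total_layers: int,
                    reserve_gb: float = 1.0) -> int:
    """Hypothetical estimate of how many layers fit on the GPU.

    reserve_gb is headroom kept free for KV cache and runtime buffers
    (an assumed value). Layers that do not fit stay on the CPU, so the
    result can be 0 and the model still runs, just more slowly.
    """
    usable = max(0.0, vram_gb - reserve_gb)
    return min(total_layers, math.floor(usable / layer_gb))


# Assumed example: 32 layers of ~0.8 GB each (roughly a 4-bit Mixtral-sized model).
for vram in (0, 12, 24):
    on_gpu = gpu_layer_count(vram, layer_gb=0.8, total_layers=32)
    print(f"{vram:>2} GB VRAM -> {on_gpu} layers on GPU, {32 - on_gpu} on CPU")
```

With 0GB of VRAM every layer simply stays on the CPU, which is why the method has no hard minimum; more VRAM only moves more layers to the faster device.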

pandora-s

Mistral AI_ org Sep 17, 2024

Yes, as Nurb states, GGUF allows you to run models efficiently on CPU and Mac devices. It is usually slower than with a GPU, but it's a fair choice if you do not have a GPU available!

deleted

Sep 17, 2024

edited Sep 17, 2024

It is possible to use both, so even if your GPU has little VRAM, it can still be included. (It's what I do: I have two older, smaller GPUs with 12GB each, and I split larger models across both of them AND the CPU. It does help.)
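The two-GPUs-plus-CPU split described above can be sketched as a greedy assignment: fill each GPU up to its budget, then leave the remaining layers on the CPU. This is a hypothetical illustration with assumed layer sizes and headroom, not how any particular runner actually partitions a model; real runners have their own splitting options.

```python
from typing import Dict, List


def split_layers(total_layers: int, layer_gb: float,
                 gpu_vram_gb: List[float], reserve_gb: float = 1.0) -> Dict[str, int]:
    """Greedily assign layers to GPUs in order; leftovers go to the CPU.

    Hypothetical helper: layer_gb and reserve_gb (per-GPU headroom for
    KV cache and buffers) are assumptions, not measured values.
    """
    plan: Dict[str, int] = {}
    remaining = total_layers
    for i, vram in enumerate(gpu_vram_gb):
        usable = max(0.0, vram - reserve_gb)
        fit = min(remaining, int(usable // layer_gb))
        plan[f"gpu{i}"] = fit
        remaining -= fit
    plan["cpu"] = remaining  # whatever did not fit on any GPU
    return plan


# Two 12 GB cards, 32 layers of ~0.8 GB each (assumed sizes).
print(split_layers(32, 0.8, [12, 12]))
```

Passing an empty GPU list puts everything on the CPU, while each extra card just takes layers off the CPU's share.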

EDIT: I'm sure you know this; that was mostly for Osman, so he knows he can use everything he has and does not have to worry about low VRAM (at least for it to function; speed is another matter).

pandora-s

Mistral AI_ org Sep 17, 2024

GGUF lets you run mainly on the CPU and, if you have a GPU and want to, offload part of the model to it!
