Is 120GB of VRAM enough to run this model?
I'm looking at the files and they add up to way more than 120GB.
However, I saw another person in one of the discussions saying you can run it if you have 100GB of VRAM.
Hi there, yes, you should be able to fit the unquantized 16-bit weights in 100GB of VRAM. Note that the repository contains 2 versions of the weights: the original weights (consolidated...) and the transformers implementation's weights (model-000...). You only need to load one of them, which is why the files add up to far more than you actually need.
100GB of VRAM should fit Mixtral 8x7b at this precision (16-bit), and with lower precision or quantization you may be able to run it with less VRAM too. I hope this helps!
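To make the arithmetic behind that estimate concrete, here is a rough sketch in Python. It assumes a total of about 46.7 billion parameters for Mixtral 8x7b (the figure is an assumption on my part, not something stated in this thread) and counts the weights only. Activations, the KV cache and framework overhead come on top, so treat the results as a lower bound.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumed total parameter count for Mixtral 8x7b (not taken from this thread).
MIXTRAL_8X7B_PARAMS = 46.7e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(MIXTRAL_8X7B_PARAMS, bits)
    print(f"{bits}-bit weights: roughly {gb:.0f} GB")
```

Under that assumption the 16-bit weights land a little under 100GB, which matches the answer above, and halving the bits per parameter halves the requirement.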
Hello, I'm just wondering whether I can run this model on an RTX 4070, or does the card have too little VRAM to run it? (It is OK for me if it is slow.)
Thanks for reading and answering my question.
GGUF can run it in ZERO VRAM.
If you are talking about GPT-Generated Unified Format (GGUF), the file format that streamlines the use and deployment of large language models (LLMs), then thanks. I thought local models were supposed to use all the performance available in the computer to run.
Just stating that if you go that route, it can go all the way to zero. The question was 'too little vram to run'. If you choose this method there is no 'too little for it to run'. The more vram you have the better off you are, but there is no lower limit for it to function.
Yes, as Nurb states, GGUF lets you run models efficiently on CPU and on Mac devices. It is usually slower than running on a GPU, but it's a fair choice if you do not have a GPU available!
It is possible to use both. So even if your GPU has little VRAM, it can still be included. (It's what I do. I have 2 older, smaller cards with 12GB each, and I split larger models across both of those AND the CPU. It does help.)
EDIT: I'm sure you know this; that was mostly for Osman, so he knows he can use everything he has and does not have to worry about low VRAM. (At least for getting it to function. Speed is another thing.)
GGUF lets you run mainly on the CPU and, if you want, offload part of the model to a GPU if you have one!
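If it helps, here is a rough Python sketch for guessing how many layers to offload. llama.cpp exposes this as the `--n-gpu-layers` option (`n_gpu_layers` in llama-cpp-python). The helper `layers_that_fit` is hypothetical, and so are the numbers in the example: a 26GB quantized GGUF file, 32 layers, and a 1.5GB reserve for the KV cache and other buffers. It also assumes every layer is the same size, which is only approximately true.

```python
def layers_that_fit(vram_gb: float, model_file_gb: float, n_layers: int,
                    reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit in VRAM, assuming equally sized layers.

    reserve_gb is VRAM held back for the KV cache and scratch buffers
    (a guess, tune it for your context length).
    """
    if n_layers <= 0 or model_file_gb <= 0:
        raise ValueError("n_layers and model_file_gb must be positive")
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical example: one 12GB card and a 26GB GGUF file with 32 layers.
print(layers_that_fit(vram_gb=12, model_file_gb=26, n_layers=32))
# With no GPU at all the answer is 0, i.e. everything stays on the CPU.
print(layers_that_fit(vram_gb=0, model_file_gb=26, n_layers=32))
```

Start with the estimate and lower it if you hit out-of-memory errors. Whatever does not fit on the GPU simply runs on the CPU, just more slowly.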