Any plan to release a quantization that works with llama.cpp?
#3 · by ziyadalkhonein · opened
Any plan to release a quantization that works with llama.cpp? You know, not a lot of people have a V100 or A100.
You can run it on oobabooga (text-generation-webui) in 4-bit, which takes less VRAM.
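For anyone who does want a llama.cpp build, a rough sketch of the usual self-quantization workflow (assuming the model's weights are in a local Hugging Face checkpoint directory and that your llama.cpp checkout provides the `convert_hf_to_gguf.py` script and the `llama-quantize` binary; exact script and binary names have changed across llama.cpp versions, so check your checkout):

```shell
# Convert the Hugging Face checkpoint to a full-precision GGUF file.
# "./model-dir" is a placeholder for wherever you cloned the weights.
python convert_hf_to_gguf.py ./model-dir --outfile model-f16.gguf

# Quantize the GGUF to 4-bit (Q4_K_M is a common quality/size tradeoff).
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

# Run inference with the quantized model.
./llama-cli -m model-q4_k_m.gguf -p "Hello"
```

This runs on CPU (or partially offloaded to GPU), so no V100/A100 is needed, at some cost in speed and quality versus full precision.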