How to use this?
#1 by steampunk333 - opened
I'm familiar with GGUF by itself, but how do I use this imatrix with the assembled file? Do I even have to? What does it do?
You just load it with a newer l.cpp
Don't think Q3 will fit much context though, sadly. At least not without flash attention like exl2.
I was more referring to Q_8 (running in DRAM)
Also, what's l.cpp?
llama.cpp
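For anyone landing here later, a rough sketch of what "just load it" could look like. As far as I know the imatrix is only consumed when the quant is made, so the finished GGUF is loaded on its own with no extra imatrix argument. This assumes a recent llama.cpp build whose CLI binary is called `llama-cli` (older builds named it `main`); the file name `model-Q8_0.gguf` and the helper `build_llama_cli_command` are made up for illustration. `-ngl 0` keeps all layers off the GPU, i.e. in system RAM.

```python
import shlex
from typing import List


def build_llama_cli_command(
    model_path: str,
    prompt: str,
    context_size: int = 4096,
    gpu_layers: int = 0,
    binary: str = "llama-cli",  # assumption: binary name in recent llama.cpp builds
) -> List[str]:
    """Build the argument list for a llama.cpp CLI run (hypothetical helper).

    No imatrix argument is passed: the quantized GGUF is loaded by itself.
    """
    if context_size <= 0:
        raise ValueError("context_size must be positive")
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return [
        binary,
        "-m", model_path,          # path to the quantized GGUF file
        "-c", str(context_size),   # context window in tokens
        "-ngl", str(gpu_layers),   # layers offloaded to GPU; 0 = CPU/RAM only
        "-p", prompt,              # prompt text
    ]


if __name__ == "__main__":
    # Hypothetical file name; substitute the GGUF you actually downloaded.
    cmd = build_llama_cli_command("model-Q8_0.gguf", "Hello")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Lowering `context_size` is the knob to try first if memory runs out, since the context cache grows with it.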