Converted GGUF-format model is very slow at inference (is that expected?)

#1
by bangbang - opened

I converted KoLLaVA-Synatra-7b to GGUF format, but the resulting GGUF model is so slow that I thought I couldn't use it. (It's so slow it's practically unusable.)

Can anyone tell me whether this model really is this slow?

How did you quantize it? e.g., Q8_0 or Q4_K_M?
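The quantization type matters a lot for speed and size. For reference, here's a rough sketch of a typical llama.cpp convert-and-quantize workflow; script and binary names vary between llama.cpp versions, and the model path and output filenames below are illustrative:

```shell
# Convert the Hugging Face model directory to a GGUF file (F16 weights)
python convert_hf_to_gguf.py ./KoLLaVA-Synatra-7b --outfile model-f16.gguf

# Quantize: Q4_K_M is smaller and usually faster;
# Q8_0 is larger but stays closer to F16 quality
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

If you ran inference on the unquantized F16 GGUF, or on hardware without enough RAM/VRAM so it swaps, that alone can explain extreme slowness.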
