incredibly slow

#2
by rebroad - opened

It's possible I'm doing something wrong, but I'm testing this one today, and it's running incredibly slowly compared to the other models I'm testing. I have the Q8_0 version, and it takes around 10 seconds to generate each token, which is 10 to 20 times slower than most of the others. Anyone have any ideas why?

Have you thought about running the Q3_K_M or Q4_K_M version instead, using 33 layers?
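
For a rough sense of why a smaller quant might help, here is a back-of-the-envelope sketch of how much memory the weights alone would need at each quantization level. The Q8_0 figure of 8.5 bits per weight follows from its block layout (32 int8 weights plus one fp16 scale); the Q4_K_M and Q3_K_M figures are approximate averages and vary by model. The parameter count is a hypothetical placeholder, since the thread doesn't say how large the model is, and `approx_weights_gb` is just an illustrative helper, not part of any library.

```python
# Approximate bits per weight for each quantization type.
# Q8_0 is 8.5 by construction; the K-quant values are rough averages
# (assumptions) and differ somewhat from model to model.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
    "Q3_K_M": 3.9,
}


def approx_weights_gb(n_params: float, quant: str) -> float:
    """Estimate the size of the weights in GB (10^9 bytes).

    Ignores the KV cache, context buffers, and file metadata.
    """
    bits = n_params * APPROX_BITS_PER_WEIGHT[quant]
    return bits / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # hypothetical: substitute the real parameter count
    for quant in APPROX_BITS_PER_WEIGHT:
        print(f"{quant}: ~{approx_weights_gb(n_params, quant):.1f} GB")
```

If the Q8_0 estimate is larger than the memory you have available and one of the smaller quants isn't, that could be one explanation for the slowdown, though it's only a guess without knowing the hardware.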
