Strange n_ctx size

#3
by 010O11 - opened

When I normally use 32K context, it gives me >>> n_ctx 32848 = 6247.16 MiB
but with this model I get >>> llama_new_context_with_model: total VRAM used: 38378.45 MiB (model: 10055.54 MiB, context: 28322.91 MiB) [TheBloke's Q6_K quant]

Owner

When you normally use 32K context, is that with a 7B Mistral-based model?

I believe more parameters --> more memory for the same amount of context. I may be wrong.
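
For reference, context memory is dominated by the KV cache, which scales with the layer count and KV-head width rather than the total parameter count. Here's a minimal sketch of that estimate, assuming Mistral-7B-style attention hyperparameters (32 layers, 8 KV heads, 128-dim heads) and an f16 cache; `kv_cache_mib` is an illustrative helper, not a llama.cpp API:

```python
def kv_cache_mib(n_ctx: int, n_layer: int, n_head_kv: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """Estimate KV-cache size in MiB (f16 cache = 2 bytes per element)."""
    # One K and one V tensor per layer, each n_ctx * n_head_kv * head_dim elements.
    elems = 2 * n_layer * n_ctx * n_head_kv * head_dim
    return elems * bytes_per_elem / 1024**2

# Assumed Mistral-7B-style values: 32 layers, 8 KV heads, head_dim 128.
print(kv_cache_mib(32848, n_layer=32, n_head_kv=8, head_dim=128))  # ~4106 MiB
```

The ~6247 MiB you report is plausibly this raw cache plus compute scratch buffers.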

Yeah, the 'normally' numbers are from 7B models. Is a difference that huge possible? Sorry then, I wasn't aware; I just thought it seemed strangely big...

7B.Q8_0.GGUF     n_ctx 32848 = 6247.16 MiB
4x7B.Q4_K_M.GGUF n_ctx 32848 = 6275.18 MiB
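
If the 4x7B here is a Mixtral-style MoE, those near-identical figures are expected: the experts replace only the feed-forward blocks, so the attention layers, and hence the KV cache (2 × 32 × 32848 × 8 × 128 × 2 bytes ≈ 4106 MiB under the same assumed hyperparameters as above), match the 7B layout. That would make the 28322.91 MiB context figure in the first log the outlier; it may be worth checking whether that run used an f32 KV cache or unusually large compute buffers.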
