Don't need 200k?

#1
by bdambrosio

I would love to run this model in 8-bit, and I don't need more than 16k 'real' context, but exllamav2 says there isn't enough memory (2x4090 = 48GB).
Is there a change I can make to config.json, or to the exllamav2 config, that would allow me to load it?
6.0bpw, or even 6.5bpw, seems to leave plenty of VRAM free, so I'm not sure why 8-bit won't load.
Any ideas? Thanks!

Just set your context in ooba to reduce the max tokens. The loader allocates the KV cache for the full advertised context, so the 200k default is what's eating your VRAM, not the higher weight precision. Or you can edit config.json and change this line to the context you actually need (e.g. 16384):
"max_position_embeddings": 200000,

I don't think there is much difference in quality going down to 6.0bpw. But if you don't need the full context, 8.0bpw with max tokens dropped should be possible.
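
You can also override the context at load time instead of editing config.json. A minimal sketch using the exllamav2 Python API, assuming a recent version and a hypothetical model path; setting max_seq_len after prepare() overrides the value read from config.json:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "/models/my-model-8.0bpw"  # hypothetical path
config.prepare()  # reads config.json, including max_position_embeddings

# Cap the context at 16k; the KV cache is sized from this, which is
# what frees enough VRAM for the 8.0bpw weights on 2x4090.
config.max_seq_len = 16384

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocated while loading
model.load_autosplit(cache)               # split layers across both GPUs
tokenizer = ExLlamaV2Tokenizer(config)
```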
