ARM quants
#1
by
EloyOn
- opened
Will you consider adding i8mm (q4_0_4_8) quants, as Bartowski is doing for all the new models he quants?
With those, you can run a 12B model on a 16GB RAM smartphone at around 5 t/s.
Thank you for your quants.
I will try to upload these as well if I can. Requests like this generally give me a better sense of what will be most useful for people.
Update:
Uploading the ARM friendly quants!
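For anyone who wants to make these locally in the meantime, they can be produced with llama.cpp's `llama-quantize` tool, provided your build is recent enough to include the `Q4_0_4_8` type. A sketch (file names are placeholders):

```shell
# Requantize an existing f16 GGUF into the ARM i8mm-friendly layout.
# Paths below are placeholders; point them at your own model files.
./llama-quantize model-f16.gguf model-Q4_0_4_8.gguf Q4_0_4_8
```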
Thank you, you are a hero. I'm sure that users who run AIs on their smartphones will appreciate it.