ARM quants

#1
by EloyOn - opened

Will you consider adding i8mm (q4_0_4_8) quants, like Bartowski does for all the new models he quantizes?

With those, you can run a 12B model on a 16 GB RAM smartphone at around 5 t/s.
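For context, quants like q4_0_4_8 are produced with llama.cpp's `llama-quantize` tool; the weights are repacked so ARM CPUs with the i8mm extension can use int8 matrix-multiply instructions. A minimal sketch, with hypothetical filenames, assuming a llama.cpp build whose quantize tool still lists the Q4_0_4_8 type:

```shell
# Hypothetical filenames; assumes llama.cpp is built and the source
# model has already been converted to an F16 GGUF.
# Q4_0_4_8 packs Q4_0 weights into 4x8 blocks for ARM i8mm int8 matmul.
./llama-quantize model-f16.gguf model-Q4_0_4_8.gguf Q4_0_4_8
```

Note that newer llama.cpp builds may handle this repacking at load time instead, in which case a plain Q4_0 quant is enough.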

Thank you for your quants.

I will try to upload these as well if I can. Requests generally give me a better sense of what will be most useful for people.

Update:
Uploading the ARM-friendly quants!

Thank you, you are a hero. I'm sure that users who run AIs on their smartphones will appreciate it.
