GGML models that run at 41.68 ms per token (f16) and 23.76 ms per token (q8), giving good results
56d7c99 · Kabumbus committed on Sep 11, 2023