Gemma 2 2B, quantized for wllama (under 2 GB).
With llama.cpp, q4_0_4_8 is WAY faster than q4_k; with wllama, the two run at about the same speed.
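As a rough sanity check on the "under 2 GB" figure, here is a back-of-the-envelope size estimate. It is a sketch under stated assumptions, not a measurement of the files in this repo. It assumes about 2.61e9 parameters for Gemma 2 2B (an approximate figure that this card does not give). It also assumes the standard GGML block layouts: Q4_0 stores 32 weights as 4-bit values plus one fp16 scale, and Q4_K stores 256-weight super-blocks as 4-bit values plus 12 bytes of packed scales/mins and two fp16 factors. q4_0_4_8 is treated as an interleaved repacking of Q4_0 (for ARM CPUs) with the same bits per weight. The helper names are hypothetical.

```python
# Back-of-the-envelope GGUF size estimate (hypothetical helpers, approximate inputs).

# Assumption: approximate parameter count for Gemma 2 2B; not stated in this card.
PARAMS = 2.61e9


def q4_0_bits_per_weight() -> float:
    """Q4_0: blocks of 32 weights, 4 bits each, plus one fp16 scale per block."""
    block = 32
    bits = block * 4 + 16
    return bits / block


def q4_k_bits_per_weight() -> float:
    """Q4_K: super-blocks of 256 weights.

    Per super-block: 256 x 4-bit quants (128 bytes), 12 bytes of packed
    6-bit scales/mins, and two fp16 values (d, dmin) = 144 bytes.
    """
    superblock = 256
    nbytes = superblock * 4 // 8 + 12 + 2 * 2
    return nbytes * 8 / superblock


def estimate_gb(params: float, bits_per_weight: float) -> float:
    """Size in decimal GB if every weight used the given bits per weight."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in [("q4_0 / q4_0_4_8", q4_0_bits_per_weight()),
                      ("q4_k", q4_k_bits_per_weight())]:
        print(f"{name}: {bpw} bits/weight, ~{estimate_gb(PARAMS, bpw):.2f} GB")
```

Both formats come out at 4.5 bits per weight, or roughly 1.47 GB. Real files usually run somewhat larger because some tensors (for example the embedding table) are often kept at higher precision. Matching bits per weight also means the speed difference described above comes from how the runtime's kernels handle each layout, not from file size.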