---
license: gemma
language:
- en
base_model: google/gemma-2-2b-it
---
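Gemma 2 2B (instruction-tuned) quantized to GGUF for use with wllama, with each file kept under 2 GB to stay within wllama's per-file size limit.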
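As a rough sanity check on why a 4-bit quant fits in 2 GB, the sketch below estimates file size from bits per weight. It assumes about 2.6 billion parameters for Gemma 2 2B and that every tensor is stored at the quant's bits per weight; real GGUF files come out somewhat larger because of metadata and tensors kept at higher precision. The block layouts follow llama.cpp's `Q4_0` and `Q4_K` definitions, and `q4_0_4_8` is treated here as a repacked `Q4_0` with the same storage cost (an assumption worth checking against the llama.cpp source).

```python
# Rough GGUF size estimate for 4-bit quants of Gemma 2 2B.
# Assumption: ~2.6e9 parameters, all stored at the quant's bits per weight.
# Real files are larger (metadata, some tensors kept at higher precision).

QK4_0 = 32   # weights per Q4_0 block
QK_K = 256   # weights per Q4_K super-block

# Bytes per block, following llama.cpp's block layouts:
Q4_0_BLOCK_BYTES = 2 + QK4_0 // 2          # fp16 scale + 16 bytes of packed 4-bit values = 18
Q4_K_BLOCK_BYTES = 2 + 2 + 12 + QK_K // 2  # fp16 d + fp16 dmin + 12 scale bytes + 128 bytes = 144


def bits_per_weight(block_bytes: int, block_weights: int) -> float:
    """Storage cost per weight, in bits, for a block-quantized format."""
    return block_bytes * 8 / block_weights


def estimated_size_gb(n_params: float, bpw: float) -> float:
    """Approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bpw / 8 / 1e9


N_PARAMS = 2.6e9  # assumed approximate parameter count

if __name__ == "__main__":
    for name, bpw in [
        ("q4_0 / q4_0_4_8", bits_per_weight(Q4_0_BLOCK_BYTES, QK4_0)),
        ("q4_k", bits_per_weight(Q4_K_BLOCK_BYTES, QK_K)),
    ]:
        print(f"{name}: {bpw} bpw, ~{estimated_size_gb(N_PARAMS, bpw):.2f} GB")
```

Both formats cost 4.5 bits per weight, so the estimate (about 1.46 GB before overhead) leaves headroom under the 2 GB budget.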

`q4_0_4_8` is much faster than `q4_k` when run with llama.cpp; with wllama, the two perform about the same.