Gemma-2-2B-it-4Bit-GPTQ

Quantization

This model was quantized with the Auto-GPTQ library, using a calibration dataset of English and Russian Wikipedia articles. On Russian data it has lower perplexity than the other GPTQ model compared below.

Model	Bits	Perplexity (Russian wiki)
gemma-2-9b-it	16bit	6.2152
Granther/Gemma-2-9B-Instruct-4Bit-GPTQ	4bit	6.4966
this model	4bit	6.3593
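
The card does not say how these perplexity values were measured (context length, stride, and tokenization are not given), so the following is only a minimal sketch of the standard definition: perplexity is the exponential of the mean negative log-likelihood per token. The function names and the example log-probabilities are hypothetical; only the three perplexity values are taken from the table above.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_increase(quantized_ppl, baseline_ppl):
    """Fractional perplexity increase of a quantized model over its baseline."""
    return (quantized_ppl - baseline_ppl) / baseline_ppl


# Hypothetical log-probabilities a model assigned to three tokens.
example_log_probs = [math.log(0.25), math.log(0.5), math.log(0.125)]
example_ppl = perplexity(example_log_probs)

# Perplexity values copied from the table above.
BASELINE_16BIT = 6.2152
OTHER_GPTQ_4BIT = 6.4966
THIS_MODEL_4BIT = 6.3593

this_model_gap = relative_increase(THIS_MODEL_4BIT, BASELINE_16BIT)
other_gptq_gap = relative_increase(OTHER_GPTQ_4BIT, BASELINE_16BIT)

print(f"example perplexity: {example_ppl:.4f}")
print(f"this model vs 16bit: {this_model_gap:.2%}")
print(f"other GPTQ vs 16bit: {other_gptq_gap:.2%}")
```

Read this way, the table puts this model roughly 2.3% above the 16-bit baseline on Russian wiki text, against roughly 4.5% for the other 4-bit GPTQ model.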

Safetensors

Model size: 2.03B params
Tensor types: I32, FP16
