Official AQLM quantization of google/gemma-2b.

For this quantization, we used 2 codebooks of 8 bits each (the 2x8 scheme in the results below).
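
As a rough illustration of what "2 codebooks of 8 bits" implies for storage, the sketch below counts codebook entries and index bits. The group size (how many weights share one set of codes) is not stated in this README; the value 8 used here is an assumption for illustration only, and `aqlm_code_bits` is a hypothetical helper, not part of any library.

```python
def aqlm_code_bits(num_codebooks: int, bits_per_codebook: int, group_size: int) -> dict:
    """Count index bits for an additive-codebook scheme (indices only)."""
    # An 8-bit index can address 2**8 = 256 entries in a codebook.
    entries_per_codebook = 2 ** bits_per_codebook
    # Each group of weights stores one index per codebook.
    index_bits_per_group = num_codebooks * bits_per_codebook
    return {
        "entries_per_codebook": entries_per_codebook,
        "index_bits_per_group": index_bits_per_group,
        "index_bits_per_weight": index_bits_per_group / group_size,
    }


# group_size=8 is an ASSUMPTION; the README does not specify it.
scheme = aqlm_code_bits(num_codebooks=2, bits_per_codebook=8, group_size=8)
print(scheme)
```

This count covers code indices only. The codebooks themselves, any scales, and any layers left unquantized also take space, so the on-disk size is larger than the index bits alone would suggest.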

Results (0-shot acc):

| Model | Quantization | WinoGrande | PiQA | HellaSwag | ArcE | ArcC | Model size, GB |
|---|---|---|---|---|---|---|---|
| gemma-2b | None | 0.6472 | 0.7715 | 0.5279 | 0.7403 | 0.4053 | 5.0 |
| | 2x8 | 0.5801 | 0.6828 | 0.3891 | 0.5791 | 0.2534 | 1.6 |
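
To make the trade-off in these results easier to read, the following sketch computes the per-task accuracy drop, the average accuracy of each variant, and the size reduction. All numbers are copied from the results above; the variable names are illustrative only.

```python
# 0-shot accuracies copied from the results above.
baseline = {"WinoGrande": 0.6472, "PiQA": 0.7715, "HellaSwag": 0.5279,
            "ArcE": 0.7403, "ArcC": 0.4053}
quantized = {"WinoGrande": 0.5801, "PiQA": 0.6828, "HellaSwag": 0.3891,
             "ArcE": 0.5791, "ArcC": 0.2534}
# Model sizes as reported in the last column.
size = {"baseline": 5.0, "quantized": 1.6}

# Absolute accuracy drop per task (baseline minus 2x8).
drops = {task: round(baseline[task] - quantized[task], 4) for task in baseline}

# Unweighted mean accuracy across the five tasks.
avg_baseline = sum(baseline.values()) / len(baseline)
avg_quantized = sum(quantized.values()) / len(quantized)

# How many times smaller the quantized checkpoint is.
compression = size["baseline"] / size["quantized"]

for task, drop in drops.items():
    print(f"{task}: -{drop:.4f}")
print(f"average: {avg_baseline:.4f} -> {avg_quantized:.4f}")
print(f"size reduction: {compression:.2f}x")
```

On these figures, the 2x8 model is roughly 3.1x smaller, with the largest absolute drops on ArcE and ArcC.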

To learn more about inference, and for instructions on quantizing models yourself, please refer to the official GitHub repo.