SpiridonSunRotator committed
Commit c71a514 • 1 Parent(s): 962a874
Update README.md
README.md CHANGED
@@ -6,6 +6,6 @@ For this quantization, we used 1 codebook of 16 bits.
 Results:
 | Model | AQLM scheme | WinoGrande | PiQA | HellaSwag | ArcE | ArcC | Model size, Gb |
 |------|------|------|-------|-------|-------|------|------|
-| gemma-2b |
+| gemma-2b | 1x16 | 0.6275 | 0.7318 | 0.4582 | 0.6923 | 0.3259 | 1.7 |
 
 To learn more about the inference, as well as the information on how to quantize models yourself, please refer to the [official GitHub repo](https://github.com/Vahe1994/AQLM).
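
The README's pointer to inference can be made concrete with a minimal sketch: assuming the `aqlm` integration for `transformers` is installed (e.g. `pip install aqlm[gpu] transformers accelerate`), an AQLM checkpoint loads through the standard `from_pretrained` path. The model id below is a hypothetical placeholder, not necessarily this repo's actual Hub id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id for the 1x16 AQLM gemma-2b checkpoint described above.
model_id = "SpiridonSunRotator/gemma-2b-AQLM-1x16"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places layers on available GPUs
)

# Quick generation check with the quantized weights.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```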