
Official AQLM (Additive Quantization of Language Models) quantization of `meta-llama/Llama-2-13b-hf`.

For this quantization, we used 1 codebook of 16 bits.

Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText-2 PPL | Model size, GB | Hub link |
|---|---|---|---|---|
| Llama-2-7b† | 1x16 | 5.92 | 2.4 | Link |
| Llama-2-7b† | 2x8 | 6.69 | 2.2 | Link |
| Llama-2-7b† | 8x8 | 6.61 | 2.2 | Link |
| Llama-2-13b (THIS) | 1x16 | 5.41 | 4.1 | Link |
| Llama-2-70b | 1x16 | 3.96 | 18.8 | Link |
| Llama-2-70b | 2x8 | 4.83 | 18.2 | Link |
| Mixtral-8x7b | 1x16 | 4.37 | 12.6 | Link |
| Mixtral-8x7b-Instruct | 1x16 | - | 12.6 | Link |

To learn more about inference, as well as how to quantize models yourself, please refer to the official GitHub repo.
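As a minimal sketch of what inference typically looks like with recent `transformers` (which ships native AQLM support) and the `aqlm` package installed: note that the Hub repo id below is a placeholder assumption, not the confirmed id of this model, and the exact generated text will vary.

```python
# Assumes: pip install aqlm[gpu] transformers
# The repo id is a hypothetical placeholder -- substitute the actual Hub link
# for the quantization scheme you want from the table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "ISTA-DASLab/Llama-2-13b-AQLM-1x16-hf"  # placeholder, not verified


def generate_sample(prompt: str, max_new_tokens: int = 16) -> str:
    """Load the AQLM-quantized model and generate a short completion."""
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        torch_dtype="auto",  # AQLM codebooks are dequantized on the fly
        device_map="auto",   # AQLM inference kernels expect a CUDA device
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_sample("The capital of France is"))
```

The model weights themselves are only a few gigabytes (see the table above), so a single consumer GPU is usually sufficient for the 13b model.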