MarsupialAI committed
Commit a383725
1 Parent(s): c3cd2ef

Create README.md

Files changed (1)
  1. README.md +21 -0
README.md ADDED
@@ -0,0 +1,21 @@
+ Some folks are claiming there's something funky going on with GGUF quanting for Llama 3 models. I don't disagree.
+
+ Some of those people are speculating that it has something to do with converting the raw weights from bf16 to fp16 instead
+ of converting to fp32 as an intermediate step. I think that's bollocks. There is no logical or mathematical justification for
+ how that could possibly matter.
+
+ So to test this crazy theory, I downloaded Undi95/Meta-Llama-3-8B-Instruct-hf and converted it to GGUF three ways (commands sketched after the list):
+ - fp16 specifically with `--outtype f16`
+ - fp32 specifically with `--outtype f32`
+ - "Auto" with no outtype specified
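+
+ Roughly, the three conversions look like this with llama.cpp's convert script (script name, paths, and output filenames here are illustrative and may differ depending on your llama.cpp checkout):
+
+ ```sh
+ # fp16 intermediate
+ python convert.py ./Meta-Llama-3-8B-Instruct-hf --outtype f16 --outfile llama3-8b-f16.gguf
+
+ # fp32 intermediate
+ python convert.py ./Meta-Llama-3-8B-Instruct-hf --outtype f32 --outfile llama3-8b-f32.gguf
+
+ # "auto": no --outtype, let the converter choose
+ python convert.py ./Meta-Llama-3-8B-Instruct-hf --outfile llama3-8b-auto.gguf
+ ```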
+
+ I then quantized each of these conversions to Q4_K_M and ran perplexity tests on everything using my abbreviated wiki.short.raw.
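+
+ For each of the three conversions, that step looks roughly like this (the binaries below are llama.cpp's `quantize` and `perplexity` tools; adjust names and paths for your build):
+
+ ```sh
+ # quantize the f16 conversion down to Q4_K_M
+ ./quantize llama3-8b-f16.gguf llama3-8b-f16.Q4_K_M.gguf Q4_K_M
+
+ # perplexity on both the unquantized conversion and the quant
+ ./perplexity -m llama3-8b-f16.gguf -f wiki.short.raw
+ ./perplexity -m llama3-8b-f16.Q4_K_M.gguf -f wiki.short.raw
+ ```
+
+ Same again for the f32 and auto conversions, giving six PPL numbers in total.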
+
+ The results:
+
+ As you can see, converting to fp32 has no meaningful effect on PPL. There will no doubt be some people who will claim
+ "PpL iSn'T gOoD eNoUgH!!1!". For those people, I have uploaded all GGUFs used in this test. Feel free to do more extensive
+ testing on your own time. I consider the matter resolved until somebody can conclusively demonstrate otherwise.