InferenceIllusionist
committed
Commit cd78bf4
1 Parent(s): 1217063
Create README.md
README.md ADDED
---
base_model: Undi95/Meta-Llama-3.1-8B-Claude
library_name: transformers
quantized_by: InferenceIllusionist
tags:
- iMat
- gguf
- llama3
license: apache-2.0
---
<img src="https://i.imgur.com/P68dXux.png" width="400"/>

# Meta-Llama-3.1-8B-Claude-iMat-GGUF

Quantized from Meta-Llama-3.1-8B-Claude fp16
* Weighted quantizations were created using the fp16 GGUF and groups_merged.txt in 88 chunks at n_ctx=512 (see the sketch below)
* The static fp16 will also be included in the repo
* For a brief rundown of iMatrix quant performance, please see this [PR](https://github.com/ggerganov/llama.cpp/pull/5747)
* <i>All quants are verified working prior to uploading to the repo for your safety and convenience</i>
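For reference, below is a minimal sketch of the standard llama.cpp imatrix workflow consistent with the settings above (groups_merged.txt, n_ctx=512). The tool names, flags, file names, and the IQ4_XS example type are assumptions based on current llama.cpp usage, not the exact commands used to produce the files in this repo.

```python
import subprocess

# Minimal sketch of the llama.cpp imatrix workflow, assuming the llama-imatrix
# and llama-quantize binaries are built and on PATH. All file names below are
# illustrative placeholders, not the actual paths used for this repo.

FP16_GGUF = "Meta-Llama-3.1-8B-Claude-fp16.gguf"   # static fp16 GGUF
CALIB_TXT = "groups_merged.txt"                    # calibration text
IMATRIX   = "imatrix.dat"                          # importance-matrix output

# 1) Compute the importance matrix from the fp16 GGUF at n_ctx=512; at this
#    context size the calibration text is processed in chunks (88 in this run).
subprocess.run(
    ["llama-imatrix", "-m", FP16_GGUF, "-f", CALIB_TXT, "-o", IMATRIX, "-c", "512"],
    check=True,
)

# 2) Produce a weighted (iMat) quant guided by the importance matrix,
#    using IQ4_XS as one example quant type.
subprocess.run(
    ["llama-quantize", "--imatrix", IMATRIX, FP16_GGUF,
     "Meta-Llama-3.1-8B-Claude-iMat-IQ4_XS.gguf", "IQ4_XS"],
    check=True,
)
```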

<b>KL-Divergence Reference Chart</b>
(Click on image to view in full size)
[<img src="https://i.imgur.com/mV0nYdA.png" width="920"/>](https://i.imgur.com/mV0nYdA.png)


Original model card can be found [here](https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude)