Update README.md
README.md CHANGED
@@ -12,6 +12,9 @@ This is significant enough to encourage you folks to test them, and provide feed
 The iMatrix I use is based on Group Merged V3 and enriched with a bit of French,
 a bit of Serbian, and a bit of Croatian languages.
 
+As usual, the name of the quants are a bit pompous,
+because they are numbered on the type of tensor quant mainly used as a base for the FFN.
+
 
 ARC and PPL-512 DATA (Get the last data on the main post of the PR thread) :
 
@@ -20,9 +23,13 @@ IQ1_XS - Unusable on <30B models
 PR
 1.94 GB (1.93 BPW)
 1.81 GiB (1.93 BPW)
-
 PPL over 564 chunks for n_ctx=512 = 40.0024 +/- 0.27710
 
+PR2
+1.98 GB (1.97 BPW)
+1.84 GiB (1.97 BPW)
+PPL over 564 chunks for n_ctx=512 = 33.5198 +/- 0.24187
+
 
 IQ1_S - Unusable on <30B models
 Master
@@ -35,6 +42,11 @@ PR
 1.91 GiB (2.04 BPW)
 PPL over 564 chunks for n_ctx=512 = 25.2524 +/- 0.17651
 
+PR2
+2.06 GB (2.05 BPW)
+1.91 GiB (2.05 BPW)
+PPL over 564 chunks for n_ctx=512 = 24.2661 +/- 0.16923
+
 
 IQ1_M
 Master
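The size and perplexity figures quoted in this change can be sanity-checked with a few lines of Python. This is a rough sketch, not part of the README: the GB/GiB conversion and bits-per-weight (BPW) arithmetic are standard, the helper names are mine, and the ~8B-class parameter count it recovers is only implied by the numbers, not stated anywhere in the diff.

```python
# Sanity checks on the figures quoted in the diff above.
# GB is decimal (10^9 bytes); GiB is binary (2^30 bytes).
# BPW = bits per weight averaged over the whole model file.

def gb_to_gib(size_gb: float) -> float:
    """Convert decimal gigabytes to binary gibibytes."""
    return size_gb * 1e9 / 2**30

def implied_weights(size_gb: float, bpw: float) -> float:
    """Approximate weight count implied by file size and BPW."""
    return size_gb * 1e9 * 8 / bpw

def ppl_drop(old_ppl: float, new_ppl: float) -> float:
    """Relative perplexity reduction, as a fraction of the old value."""
    return (old_ppl - new_ppl) / old_ppl

# IQ1_XS PR: 1.94 GB (1.93 BPW) -> should match the quoted 1.81 GiB
print(f"{gb_to_gib(1.94):.2f} GiB")                # 1.81 GiB

# ~8e9 weights implied, i.e. roughly an 8B-class model (my inference)
print(f"{implied_weights(1.94, 1.93):.2e} weights")

# PPL improvements from PR to PR2 (PPL over 564 chunks, n_ctx=512)
print(f"IQ1_XS: {ppl_drop(40.0024, 33.5198):.1%}")  # ~16.2% lower
print(f"IQ1_S:  {ppl_drop(25.2524, 24.2661):.1%}")  # ~3.9% lower
```

Both quant types shrink the gap between PR and PR2 at a cost of only ~0.04 BPW, with the IQ1_XS case showing by far the larger perplexity gain.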