Standard IQ4_XS imatrix quant from the bf16 GGUF, so it has better perplexity 8d80a92 verified nisten committed on Jun 16
Best speed/perplexity quant for mobile devices with 8-bit acceleration d2b704a verified nisten committed on Jun 16
Good speed reference quant for older CPUs, though not much improvement over the f16 embedding dac48df verified nisten committed on Jun 16
Best q8 conversion down from bf16, with slightly better perplexity than f16-based quants bc0fa51 verified nisten committed on Jun 16
Standard IQ4_XS quant, quantized down from full bf16 (not from f16) bb46d3a verified nisten committed on Jun 16
Great quant if your chip has 8-bit acceleration; slightly better than the Q4_K embedding variant 0bc4249 verified nisten committed on Jun 16
Rename qwen7bq4xsembedding8output8.gguf to qwen7bv2inst_iq4xs_embedding8_output8.gguf 9b91d66 verified nisten committed on Jun 16
Probably the best speed-to-perplexity ratio of any 7B GGUF model so far 0e76852 verified nisten committed on Jun 16
Calculated the imatrix in 8-bit; it was just as good as the f16 imatrix b7097b6 verified nisten committed on Jun 16
Rename qwen7bq4xsembedding5bitkoutput8bit.gguf to qwen7bq4xsembedding8output8.gguf ee4c789 verified nisten committed on Jun 16
Rename qwen7bq4kembeddingbf16outputbf16.gguf to qwen7bq4kembeddingf16outputf16.gguf d9150dc verified nisten committed on Jun 16
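For readers retracing these commits, here is a minimal sketch of the workflow they describe, using llama.cpp's llama-imatrix and llama-quantize tools; the calibration file and model file names are illustrative assumptions, not necessarily the exact ones used here:

```sh
# File names are assumed for illustration; the flags are standard llama-quantize/llama-imatrix options.
# 1. Compute an importance matrix from the bf16 GGUF using a calibration text.
#    (Per the commit above, an imatrix calculated in 8-bit was just as good as an f16 one.)
./llama-imatrix -m qwen7b-bf16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize straight from bf16 to IQ4_XS guided by the imatrix, keeping the
#    token-embedding and output tensors at Q8_0 (the "embedding8_output8" variants above).
./llama-quantize --imatrix imatrix.dat \
  --token-embedding-type q8_0 --output-tensor-type q8_0 \
  qwen7b-bf16.gguf qwen7bv2inst_iq4xs_embedding8_output8.gguf IQ4_XS
```

Dropping the two overrides (or setting them to f16) reproduces the plain IQ4_XS and f16-embedding variants listed above.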