Commit History

Standard IQ4_XS imatrix quant from the bf16 GGUF, so it has better perplexity
8d80a92
verified

nisten committed on

Best speed/perplexity quant for mobile devices with 8-bit acceleration
d2b704a
verified

nisten committed on

Good speed reference quant for older CPUs, though not much improvement over the f16 embedding
dac48df
verified

nisten committed on

Best q8 conversion down from bf16, with slightly better perplexity than f16-based quants
bc0fa51
verified

nisten committed on

Standard IQ4_XS quantization down from full bf16 (not from f16)
bb46d3a
verified

nisten committed on

Great quant if your chip has 8-bit acceleration; slightly better than the q4_K embedding
0bc4249
verified

nisten committed on

Rename qwen7bq4xsembedding8output8.gguf to qwen7bv2inst_iq4xs_embedding8_output8.gguf
9b91d66
verified

nisten committed on

Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf
9cbd6f2
verified

nisten committed on

Very good quant for speed/perplexity; embedding is at q4_K
6c5e613
verified

nisten committed on

Probably the best speed-to-perplexity ratio of any 7B GGUF model so far
0e76852
verified

nisten committed on

Standard q5_K_M conversion with 8-bit output, for reference
6da7eb9
verified

nisten committed on

Good conversion down from bf16 instead of from f16
957d5fb
verified

nisten committed on

Calculated the imatrix in 8-bit; it was just as good as the f16 imatrix
b7097b6
verified

nisten committed on

Rename qwen7bq4xs.gguf to qwen7bq4xsoutput6k.gguf
6e41799
verified

nisten committed on

Rename qwen7bq4xsembedding5bitkoutput8bit.gguf to qwen7bq4xsembedding8output8.gguf
ee4c789
verified

nisten committed on

Rename qwen7bq4kembeddingbf16outputbf16.gguf to qwen7bq4kembeddingf16outputf16.gguf
d9150dc
verified

nisten committed on

Upload 9 files
49deabb
verified

nisten committed on

Update README.md
081ce53
verified

nisten committed on

Update README.md
357d311
verified

nisten committed on

Initial commit
2909ebc
verified

nisten committed on