nisten/qwenv2-7b-inst-imatrix-gguf

1 contributor

History: 19 commits

Latest commit: best speed/perplexity quant for mobile devices with 8bit acceleration (d2b704a, verified, 24 days ago)

.gitattributes

3.07 kB

best speed/perplexity quant for mobile devices with 8bit acceleration 24 days ago
8bitimatrix.dat

4.54 MB
LFS

calculated imatrix in 8bit, was just as good as f16 imatrix 24 days ago
README.md

1.55 kB

Update README.md 24 days ago
qwen7bv2inst_Iq4xs_output6k.gguf

4.22 GB
LFS

Standard IQ4XS quantizing down from full bf16 (not from f16) 24 days ago
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf
4.64 GB
LFS

great quant if your chip has 8bit acceleration, slightly better than 4k embedding 24 days ago
qwen7bv2inst_iq4xs_output8bit.gguf
4.35 GB
LFS

best speed/perplexity quant for mobile devices with 8bit acceleration 24 days ago
qwen7bv2inst_q4km_embeddingf16_outputf16.gguf
6.11 GB
LFS

Good speed reference quant for older CPUs; however, not much improvement from f16 embedding 24 days ago
qwen7bv2inst_q4km_output8bit.gguf
4.82 GB
LFS

very good quant for speed/perplexity, embedding is at q4k 24 days ago
qwen7bv2instruct_bf16.gguf
15.2 GB
LFS

Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf 24 days ago
qwen7bv2instruct_q5km.gguf
5.58 GB
LFS

standard q5km conversions with 8bit output for reference. 24 days ago
qwen7bv2instruct_q8.gguf
8.1 GB
LFS

Best q8 conversion down from bf16 with slightly better perplexity than f16 based quants 24 days ago
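
To compare the quants by footprint, the sizes listed above can be put next to the bf16 file. The following is a minimal sketch, not part of the repository: the sizes are copied by hand from this listing, and the helper names `relative_sizes` and `smallest` are hypothetical. It says nothing about speed or perplexity, which the commit messages only describe qualitatively.

```python
# File sizes in GB, copied by hand from the listing above.
SIZES_GB = {
    "qwen7bv2inst_Iq4xs_output6k.gguf": 4.22,
    "qwen7bv2inst_iq4xs_embedding8_outputq8.gguf": 4.64,
    "qwen7bv2inst_iq4xs_output8bit.gguf": 4.35,
    "qwen7bv2inst_q4km_embeddingf16_outputf16.gguf": 6.11,
    "qwen7bv2inst_q4km_output8bit.gguf": 4.82,
    "qwen7bv2instruct_bf16.gguf": 15.2,
    "qwen7bv2instruct_q5km.gguf": 5.58,
    "qwen7bv2instruct_q8.gguf": 8.1,
}

# The unquantized bf16 file serves as the baseline for comparison.
BASELINE = "qwen7bv2instruct_bf16.gguf"


def relative_sizes(sizes, baseline):
    """Return each file's size as a fraction of the baseline file's size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


def smallest(sizes):
    """Return the name of the smallest file in the mapping."""
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    ratios = relative_sizes(SIZES_GB, BASELINE)
    # Print the files from smallest to largest with their share of bf16.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:48s} {SIZES_GB[name]:5.2f} GB  {ratio:6.1%} of bf16")
```

Sorting by ratio makes it easy to see how much each IQ4XS or Q4KM variant saves relative to the bf16 original before choosing one to download.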