nisten/qwenv2-7b-inst-imatrix-gguf


1 contributor

History: 15 commits

great quant if your chip has 8bit acceleration, slightly better than 4k embedding

0bc4249 verified 5 months ago

.gitattributes

2.79 kB

great quant if your chip has 8bit acceleration, slightly better than 4k embedding 5 months ago
8bitimatrix.dat

4.54 MB
LFS

calculated imatrix in 8bit, was just as good as f16 imatrix 5 months ago
README.md

1.55 kB

Update README.md 5 months ago
qwen7bq4kembeddingf16outputf16.gguf

6.11 GB
LFS

Rename qwen7bq4kembeddingbf16outputbf16.gguf to qwen7bq4kembeddingf16outputf16.gguf 5 months ago
qwen7bq4xsoutput6k.gguf

4.22 GB
LFS

Rename qwen7bq4xs.gguf to qwen7bq4xsoutput6k.gguf 5 months ago
qwen7bv2_iq4xs_output8bit.gguf

4.35 GB
LFS

Probably best speed to perplexity ratio of any 7b gguf model so far 5 months ago
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf

4.64 GB
LFS

great quant if your chip has 8bit acceleration, slightly better than 4k embedding 5 months ago
qwen7bv2inst_q4km_output8bit.gguf

4.82 GB
LFS

very good quant for speed/perplexity, embedding is at q4k 5 months ago
qwen7bv2instruct_bf16.gguf

15.2 GB
LFS

Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf 5 months ago
qwen7bv2instruct_q5km.gguf

5.58 GB
LFS

standard q5km conversions with 8bit output for reference. 5 months ago
qwenv2instruct7b_q8.gguf

8.1 GB
LFS

Good conversion from bf16 down instead of from f16 5 months ago