nisten/qwenv2-7b-inst-imatrix-gguf
GGUF · Inference Endpoints · License: apache-2.0
Files and versions
1 contributor · History: 17 commits
Latest commit bc0fa51 (verified, 4 months ago) by nisten: Best q8 conversion down from bf16 with slightly better perplexity than f16-based quants
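To try one of the quants listed below locally, here is a minimal sketch, assuming the `huggingface_hub` and `llama-cpp-python` packages are installed; both package choices, the context size, and the prompt are illustrative assumptions, not something this repo prescribes.

```python
# Minimal sketch: download one GGUF quant from this repo and run it locally.
# Assumes `pip install huggingface_hub llama-cpp-python`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Any filename from the listing below works; this is the IQ4_XS + 8-bit-output quant.
model_path = hf_hub_download(
    repo_id="nisten/qwenv2-7b-inst-imatrix-gguf",
    filename="qwen7bv2_iq4xs_output8bit.gguf",
)

llm = Llama(model_path=model_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what an importance matrix is in one paragraph."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```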
.gitattributes · 2.92 kB · Best q8 conversion down from bf16 with slightly better perplexity than f16-based quants · 4 months ago
8bitimatrix.dat · 4.54 MB · LFS · Calculated the imatrix in 8-bit; it was just as good as the f16 imatrix (see the sketch after this listing) · 4 months ago
README.md · 1.55 kB · Update README.md · 4 months ago
qwen7bq4kembeddingf16outputf16.gguf · 6.11 GB · LFS · Rename qwen7bq4kembeddingbf16outputbf16.gguf to qwen7bq4kembeddingf16outputf16.gguf · 4 months ago
qwen7bv2_iq4xs_output8bit.gguf · 4.35 GB · LFS · Probably the best speed-to-perplexity ratio of any 7B GGUF model so far · 4 months ago
qwen7bv2inst_Iq4xs_output6k.gguf · 4.22 GB · LFS · Standard IQ4_XS quantization down from full bf16 (not from f16) · 4 months ago
qwen7bv2inst_iq4xs_embedding8_outputq8.gguf · 4.64 GB · LFS · Great quant if your chip has 8-bit acceleration; slightly better than the q4_k-embedding variant · 4 months ago
qwen7bv2inst_q4km_output8bit.gguf · 4.82 GB · LFS · Very good quant for speed/perplexity; embedding is at q4_k · 4 months ago
qwen7bv2instruct_bf16.gguf · 15.2 GB · LFS · Rename qwen7bf16.gguf to qwen7bv2instruct_bf16.gguf · 4 months ago
qwen7bv2instruct_q5km.gguf · 5.58 GB · LFS · Standard q5km conversion with 8-bit output, for reference · 4 months ago
qwen7bv2instruct_q8.gguf · 8.1 GB · LFS · Best q8 conversion down from bf16 with slightly better perplexity than f16-based quants · 4 months ago
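The 8bitimatrix.dat file above is the importance matrix behind the IQ4_XS and Q4_K variants. Below is a minimal sketch of how such a file is typically passed to llama.cpp's quantize tool; the binary name and location, the output filename, and the commented output-tensor flag are assumptions about a local llama.cpp build, not something this repo documents.

```python
# Sketch: re-quantize the full-precision GGUF with the repo's importance matrix,
# roughly how the IQ4_XS files here appear to have been produced.
# Assumption: a local llama.cpp build in the current directory (newer builds ship
# the binary as `llama-quantize`, older ones as `quantize`).
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--imatrix", "8bitimatrix.dat",          # importance matrix from this repo
        # "--output-tensor-type", "q8_0",        # likely behind the *_output8bit names (assumption)
        "qwen7bv2instruct_bf16.gguf",            # full-precision source listed above
        "qwen7bv2_iq4xs_repro.gguf",             # hypothetical output filename
        "IQ4_XS",                                # target quantization type
    ],
    check=True,
)
```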