Request to create one Q8_0 version with --leave-output-tensor
#1 opened by mechanicmuthu
Hi, I have a request. Could you upload a Q8_0 version quantized with the LOT option (`--leave-output-tensor`, the highest-quality quantized version)? Then we could run
`./quantize --allow-requantize model-q8_0-LOT.gguf Q4_0` (or any other type) ourselves without further loss of quality (I guess).
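For concreteness, here is a sketch of the workflow I have in mind, using llama.cpp's `quantize` tool (the filenames here are just examples):

```sh
# One-time step (on your side): quantize F16 to Q8_0, leaving output.weight
# unquantized so later requantization loses less quality
./quantize --leave-output-tensor model-f16.gguf model-q8_0-LOT.gguf Q8_0

# Then users requantize locally to any smaller type without downloading F16
./quantize --allow-requantize model-q8_0-LOT.gguf model-q4_0.gguf Q4_0
./quantize --allow-requantize model-q8_0-LOT.gguf model-q5_k_m.gguf Q5_K_M
```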
For me, on the one hand, downloading the F16/F32 .pth version and converting it to GGUF is too big a download; on the other hand, I want to try out multiple quantized versions and compare their speed and quality WITHOUT downloading multiple large files.
You could also provide the quantize commands in the README. Just a suggestion. Thanks.