Request to create one Q8_0 version with --leave-output-tensor
#1 opened by mechanicmuthu
Hi, I have a request. Could you upload a Q8_0 version quantized with the LOT option (`--leave-output-tensor`, the highest-quality quantized version)? Then we could run
`./quantize --allow-requantize model-q8_0-LOT.gguf Q4_0` (or any other type) ourselves without further loss of quality (I guess).
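For concreteness, here is a sketch of the workflow I have in mind, using llama.cpp's `quantize` tool (the filenames here are just examples):

```sh
# One-time step (on your side): quantize F16 to Q8_0, leaving output.weight
# unquantized so later requantization loses less quality
./quantize --leave-output-tensor model-f16.gguf model-q8_0-LOT.gguf Q8_0

# Then users requantize locally to any smaller type without downloading F16
./quantize --allow-requantize model-q8_0-LOT.gguf model-q4_0.gguf Q4_0
./quantize --allow-requantize model-q8_0-LOT.gguf model-q5_k_m.gguf Q5_K_M
```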
For me, on the one hand, downloading the F16/F32 .pth version and converting it to GGUF is too big a download; on the other hand, I want to try out multiple quantized versions and compare their speed and quality WITHOUT downloading multiple large files.
You could also provide the quantize commands in the README. Just a suggestion. Thanks.