llama.cpp tokenization bug
This GGUF, and others derived from Llama 3 models, is probably affected by
https://github.com/ggerganov/llama.cpp/pull/6920
Can you recreate your quantization with the fixed commit?
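One way to tell an old quant from a re-made one: the updated converter from that PR writes a `tokenizer.ggml.pre` metadata key into the GGUF, which pre-fix quants lack. Below is a minimal sketch of a header-only scanner for that key, assuming the GGUF v3 layout from the ggml spec; the file path in the usage example is a placeholder.

```python
import struct

GGUF_MAGIC = b"GGUF"

# GGUF metadata value types and their fixed sizes in bytes (per the GGUF spec)
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING = 8
_ARRAY = 9

def _read_u32(f):
    return struct.unpack("<I", f.read(4))[0]

def _read_u64(f):
    return struct.unpack("<Q", f.read(8))[0]

def _read_string(f):
    # GGUF strings are a uint64 length followed by the raw bytes
    return f.read(_read_u64(f))

def _skip_value(f, vtype):
    """Advance past one metadata value without decoding it."""
    if vtype in _SCALAR_SIZES:
        f.read(_SCALAR_SIZES[vtype])
    elif vtype == _STRING:
        _read_string(f)
    elif vtype == _ARRAY:
        elem_type = _read_u32(f)
        count = _read_u64(f)
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def gguf_metadata_keys(path):
    """Return the metadata keys of a GGUF file (reads the header only)."""
    keys = []
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        _read_u32(f)            # version
        _read_u64(f)            # tensor count
        kv_count = _read_u64(f)
        for _ in range(kv_count):
            keys.append(_read_string(f).decode("utf-8"))
            _skip_value(f, _read_u32(f))
    return keys

def has_pretokenizer_fix(path):
    """True if the GGUF was converted after the PR #6920 pre-tokenizer fix."""
    return "tokenizer.ggml.pre" in gguf_metadata_keys(path)
```

Usage would be e.g. `has_pretokenizer_fix("model-Q4_K_M.gguf")` (placeholder filename); `False` means the quant predates the fix and should be re-made from the original HF weights with the updated converter.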
If you use llama.cpp, you can try this
About this:
I'll redo them on demand, starting with the most popular and best-performing models, and will add a notice to the ones that still need updating. KoboldCpp has to pull in the upstream fixes before its users can actually benefit from them, and there's still a potential issue to be solved:
Can you recreate your quantization with the fixed commit?
@FlareRebellion - Will do these quants again and reupload. You'll still have to wait for KCPP 1.64 release to get the benefits but quants will at least already be ready.
Issues seem to be getting fixed already using the latest llama.cpp:
https://github.com/ggerganov/llama.cpp/issues/6914#issuecomment-2084315900
Still facing issues with Aurora's tokenizer... I'll wait a bit longer before looking into it; it might be a separate issue.
@FlareRebellion For now I'd recommend you check out https://huggingface.co/Lewdiculous/Chaos_RP_l3_8B-GGUF-IQ-Imatrix, which should be as good as Aurora or better, and which I was able to re-quant properly. I'll talk to the author about Aurora.