I've received a report of an issue with loading these GGUFs

#1 opened by grimjim

This may have something to do with a recent llama.cpp update that made smaug-bpe the pretokenizer for Llama 3 8B models. Reverting to a prior version would be a workaround. I'll be testing the latest version soon in case that also resolves this.

I get lots of reports - all of them by users who use outdated llama.cpp frontends without support for smaug-bpe. I'd say the solution is to get the software updated, as always. In any case, if you need a model to be requanted because of upstream changes, that's no problem, just drop me a note.

I'm okay with explaining pretokenizer support to these end users. I'm hoping ooba gets an update with a newer llama.cpp wheel soon. It's been a few weeks since the last release.

Fascinating, koboldcpp is also behind, and normally I would have expected an update by now. But let's not curse the maintainers of those packages with unrealistic expectations :)

Curiouser and curiouser. For some reason the default conversion script in llama.cpp is picking smaug-bpe instead of llama-bpe. Applying the following to a "broken" GGUF gets it to load even on the current ooba with its older llama.cpp wheel. This can be done before quantization is invoked.

python llama.cpp/gguf-py/scripts/gguf-new-metadata.py --pre-tokenizer llama-bpe input_gguf output_gguf
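If you want to confirm what a GGUF actually carries before and after rewriting it, here is a minimal sketch using the gguf-py reader that ships with llama.cpp (assuming the gguf package is installed, e.g. pip install gguf); it just reads the tokenizer.ggml.pre metadata field that the loader checks.

```python
# Sketch: print the pre-tokenizer recorded in a GGUF file's metadata.
# Assumes the `gguf` package from llama.cpp's gguf-py is installed.
import sys
from gguf import GGUFReader

def show_pre_tokenizer(path: str) -> None:
    reader = GGUFReader(path)
    field = reader.fields.get("tokenizer.ggml.pre")
    if field is None:
        print(f"{path}: no tokenizer.ggml.pre field (older GGUF?)")
        return
    # For a simple string field, the last part holds the string bytes.
    value = bytes(field.parts[-1]).decode("utf-8")
    print(f"{path}: tokenizer.ggml.pre = {value}")

if __name__ == "__main__":
    for gguf_path in sys.argv[1:]:
        show_pre_tokenizer(gguf_path)
```

Running it against the input and output GGUFs should show smaug-bpe before the rewrite and llama-bpe after.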

AFAICR, the llama3-8b model had some fixes, resulting in two different pre-tokenizers depending on when it was "forked", which is probably what we are seeing. I don't think anything is broken - the pre-tokenizer is detected from the model itself, so it's not as if there could be a logic error in the code.

I've opened an issue with the upstream llama.cpp project, as the pre-tokenizer detection appears to originate in a function in the convert-hf-to-gguf.py script. I see llama-bpe listed in the code with its own distinct hash, but for some reason smaug-bpe is being picked instead. Some kind of pre-tokenizer hash collision seems to be happening.
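For context, my understanding of that detection logic (a paraphrased sketch, not the actual code from convert-hf-to-gguf.py, with a placeholder probe string and placeholder digests) is that it tokenizes a fixed probe text with the original HF tokenizer and matches a hash of the resulting token IDs against a table of known pre-tokenizers:

```python
# Rough sketch of hash-based pre-tokenizer detection during conversion.
# The probe string and the digests below are placeholders, not the real
# values used by convert-hf-to-gguf.py.
from hashlib import sha256
from transformers import AutoTokenizer

CHK_TXT = "..."  # the real script uses a long, fixed probe string

KNOWN_PRE_TOKENIZERS = {
    "<llama-bpe digest>": "llama-bpe",   # placeholder hash
    "<smaug-bpe digest>": "smaug-bpe",   # placeholder hash
}

def detect_pre_tokenizer(model_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    digest = sha256(str(tokenizer.encode(CHK_TXT)).encode()).hexdigest()
    # If two tokenizer configs produce identical token IDs for the probe
    # text, they yield the same digest - so a "collision" would mean the
    # tokenizers really do behave the same way on that text.
    return KNOWN_PRE_TOKENIZERS.get(digest, "unknown")
```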

Very unlikely - it almost certainly chooses smaug-bpe because that is the actual pre-tokenizer of the model. If that really is the wrong pre-tokenizer, then I would expect the model config to be at fault. But yes, the llama.cpp devs will probably find out (unless they ignore the report because the model is unimportant to them, as is usually the case), so opening a bug report is probably the way forward.
