I've received a report of an issue with loading these GGUFs

#1 opened by grimjim

This may have something to do with a recent llama.cpp update that made smaug-bpe the pretokenizer for Llama 3 8B models. Reverting to a prior version would be a workaround. I'll be testing the latest version soon in case that also resolves this.

I get lots of reports - all of them by users who use outdated llama.cpp frontends without support for smaug-bpe. I'd say the solution is to get the software updated, as always. In any case, if you need a model to be requanted because of upstream changes, that's no problem, just drop me a note.

I'm okay with explaining pretokenizer support to these end users. I'm hoping ooba gets an update with a newer llama.cpp wheel soon. It's been a few weeks since the last release.

Fascinating, koboldcpp is also behind, and normally I would have expected an update by now. But let's not curse the maintainers of those packages with unrealistic expectations :)

Curiouser and curiouser. For some reason the default conversion script in llama.cpp is picking smaug-bpe instead of llama-bpe. Applying the following to a "broken" GGUF gets it to load even on the current ooba with its older llama.cpp wheel. This can be done before quantization is invoked.

python llama.cpp/gguf-py/scripts/gguf-new-metadata.py --pre-tokenizer llama-bpe input_gguf output_gguf
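If you want to confirm what a GGUF actually carries before and after rewriting it, here is a minimal sketch using the gguf-py reader that ships with llama.cpp (assuming the gguf package is installed, e.g. pip install gguf); it just reads the tokenizer.ggml.pre metadata field that the loader checks.

```python
# Sketch: print the pre-tokenizer recorded in a GGUF file's metadata.
# Assumes the `gguf` package from llama.cpp's gguf-py is installed.
import sys
from gguf import GGUFReader

def show_pre_tokenizer(path: str) -> None:
    reader = GGUFReader(path)
    field = reader.fields.get("tokenizer.ggml.pre")
    if field is None:
        print(f"{path}: no tokenizer.ggml.pre field (older GGUF?)")
        return
    # For a simple string field, the last part holds the string bytes.
    value = bytes(field.parts[-1]).decode("utf-8")
    print(f"{path}: tokenizer.ggml.pre = {value}")

if __name__ == "__main__":
    for gguf_path in sys.argv[1:]:
        show_pre_tokenizer(gguf_path)
```

Running it against the input and output GGUFs should show smaug-bpe before the rewrite and llama-bpe after.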

AFAICR, the llama3-8b model had some fixes, resulting in two different pre-tokenizers depending on when it was "forked", which is probably what we are seeing. I don't think anything is broken - the pre-tokenizer is detected from the model itself, so it's not as if there could be a logic error in the code.

I've opened an issue with the upstream llama.cpp project, as the pre-tokenizer detection appears to originate in a function in the convert-hf-to-gguf.py script. I see llama-bpe listed in the code with its own distinct hash, but for some reason smaug-bpe is being picked instead. Some kind of pre-tokenizer hash collision seems to be happening.
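For context, my understanding of that detection logic (a paraphrased sketch, not the actual code from convert-hf-to-gguf.py, with a placeholder probe string and placeholder digests) is that it tokenizes a fixed probe text with the original HF tokenizer and matches a hash of the resulting token IDs against a table of known pre-tokenizers:

```python
# Rough sketch of hash-based pre-tokenizer detection during conversion.
# The probe string and the digests below are placeholders, not the real
# values used by convert-hf-to-gguf.py.
from hashlib import sha256
from transformers import AutoTokenizer

CHK_TXT = "..."  # the real script uses a long, fixed probe string

KNOWN_PRE_TOKENIZERS = {
    "<llama-bpe digest>": "llama-bpe",   # placeholder hash
    "<smaug-bpe digest>": "smaug-bpe",   # placeholder hash
}

def detect_pre_tokenizer(model_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    digest = sha256(str(tokenizer.encode(CHK_TXT)).encode()).hexdigest()
    # If two tokenizer configs produce identical token IDs for the probe
    # text, they yield the same digest - so a "collision" would mean the
    # tokenizers really do behave the same way on that text.
    return KNOWN_PRE_TOKENIZERS.get(digest, "unknown")
```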

Very unlikely - it almost certainly chooses smaug-bpe because that is the actual pre-tokenizer of the model. If that really is the wrong pre-tokenizer, then I would expect the model config to be at fault. But yes, the llama.cpp devs will probably find out (unless they ignore the report because the model is unimportant to them, as is usually the case), so opening a bug report is probably the way forward.
