Reconverted and requantized with latest GGUF to fix llama3 tokenizer

@algorithm The issue will probably show up even on CPU, but to a much lesser degree, due to the bfloat16->float16 conversion. I've noticed that it mostly affects LoRA fine-tuning.

Orenguteng

Owner May 5

@algorithm https://www.reddit.com/r/LocalLLaMA/comments/1ckvx9l/part2_confirmed_possible_bug_llama3_gguf/

It seems GGUFs are broken. This is huge. It's not about CPU or GPU; it happens regardless. AWQ tested in 4-bit produces the correct outcome, so something is broken in GGUF and llama.cpp.

algorithm

May 5

@Orenguteng Very interesting. I agree this is a big deal, and yes, it's regardless of CPU or GPU. I'm keeping an eye on the GitHub as we speak. I hope they'll narrow down the problem. Thanks for letting me know!

dadadies

May 10

edited May 10

Has the issue been fixed? Is it safe to download the model now? (Noob question) What's the difference between this and the original, besides being GGUF and supposedly uncensored?

rdtfddgrffdgfdghfghdfujgdhgsf

May 10

I too am wondering about this, as there is so much high-level discussion that's way above my head! I just want me some sweet, uncensored gguf of L3 but all this talk of quanting prompts and stop strings and wotnot just gives me a headache - and I'm not alone!

We're relying on you @Orenguteng! *puppy dog eyes*

Orenguteng

Owner May 10

@dadadies The issue has been closed. It seems https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main updated its tokenizer 18 hours ago, and there are still some issues. You can safely download this and use it as you wish until a better version is released with a fixed tokenizer etc. This one was an early release and works well enough, but it will get better.

Orenguteng changed pull request status to closed May 10
