long repetitions

#2
by subbur - opened

The model won't stop after an answer; it repeats again and again and can produce very long output. I have to kill the process to stop it.

Yeah, same problem: it keeps generating content continuously, or sometimes stops after "The". The first message always seems to be ok.

Crusoe AI org

Are you experiencing this with all quants? I will be regenerating them after this PR is merged: https://github.com/ggerganov/llama.cpp/pull/6920

Crusoe AI org

Hi @nullt3r @subbur , can you please test with the 1048k context model? this was generated with the above PR merged so it should benefit from tokenization fixes as well as additional training: https://huggingface.co/crusoeai/Llama-3-8B-Instruct-1048k-GGUF

It works much better - it's a functional model now. Thanks.

Crusoe AI org

Glad to hear it!

3thn changed discussion status to closed

For 1048k, the repeat issue is still there. I actually went back to the 262k model to check whether that version has the same problem.
I used Q8_0 with the prompt "Introduce Kobe". It generated 2700+ tokens before I stopped it manually. With a repeat_penalty the issue improves, but it still sometimes outputs endlessly, though the content is not strictly identical.

With long-context prompts it does not seem to have this issue, so I take it to be a defect of the model rather than the quantization.
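For reference, a repeat penalty can be passed to llama.cpp on the command line. A minimal sketch (the model path and prompt are placeholders, and the exact flag set depends on your llama.cpp version):

```shell
# Penalize recently generated tokens to discourage loops.
# --repeat-penalty > 1.0 lowers the probability of reusing recent tokens;
# --repeat-last-n controls how far back the penalty window looks.
./main -m ./Llama-3-8B-Instruct-262k.Q8_0.gguf \
  -p "Introduce Kobe" \
  -n 512 \
  --repeat-penalty 1.1 \
  --repeat-last-n 256
```

Note that a repeat penalty only masks the symptom; if the model never emits its end-of-sequence token, generation still runs until the `-n` token limit.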

Crusoe AI org
edited May 4

@Starlento can you try setting the eos token to 128009 using the gguf-set-metadata.py script?

edit: I also just updated the models with the BPE tokenization fixes from llama.cpp
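For anyone else hitting this, the invocation would look roughly like the following (the script path and model filename are assumptions based on the llama.cpp repo layout; `tokenizer.ggml.eos_token_id` is the standard GGUF metadata key for the eos token):

```shell
# gguf-set-metadata.py ships with llama.cpp under gguf-py/scripts.
# 128009 is <|eot_id|>, the end-of-turn token used by Llama 3 instruct models;
# without it as eos, generation may never terminate after an answer.
python gguf-py/scripts/gguf-set-metadata.py \
  ./Llama-3-8B-Instruct-262k.Q8_0.gguf \
  tokenizer.ggml.eos_token_id 128009
```

This edits the metadata in place, so back up the .gguf file first.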
