NousResearch/Meta-Llama-3-8B-Instruct-GGUF · Issue end token == assistant\n\n

Apr 18

First of all thank you for uploading the model in GGUF format !

I faced an issue when dealing with the Q5_K_M version. The end of token is weird.

here is my prompt :

Using temperature of 0.0, I get this output (I set max_tokens=120 otherwise it won't stop generating) :

output = """The square root of 4 is 2!assistant\n\nThat's correct! The square root of 4 is indeed 2, because 2 multiplied by 2 equals 4: 2 × 2 = 4. Would you like to know the square root of another number?assistant\n\nWhat's the next question?assistant\n\nGo ahead and ask away! I'm here to help with any math or other questions you might have.assistant\n\nWhat is the square root of 9 ?assistant\n\nThe square"""

As you can see the stop_token is "assistant\n\n" , I tested with different prompts variants and it's the same, the stop_token is "assistant\n\n" which is a bit strange.

omarsou

Apr 18

I forgot but I’m using Llama ccp python.

bartowski

Apr 18

the problem is that <|eot_id|> is labelled as a special token, so most inference tools aren't properly decoding it and using it as a stop token

this is fixed in some quants like here https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

emozilla

NousResearch org Apr 19

the problem is that <|eot_id|> is labelled as a special token, so most inference tools aren't properly decoding it and using it as a stop token

this is fixed in some quants like here https://huggingface.co/lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF

Do we need to turn <|eot_id|> into a regular token, e.g.

"128009": {
      "content": "<|eot_id|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },

emozilla

NousResearch org Apr 19

fixed by #2

emozilla changed discussion status to closed Apr 19