Metharme Tokens

#6
by Mar2ck - opened

What are the IDs for <|user|>, <|model|>, <|system|> supposed to be?
I'm trying to use the Metharme format, but it isn't getting tokenized properly and the output is much worse than when using Mistral tokens.

[Image: Cydonia Tokenizer.png]

It's probably working as intended (?)

But I agree that the model is much dumber with Metharme than with Mistral.

I didn't want to touch the Mistral vocab, so these Metharme tags are composed of multiple tokens.

Token 4177 (><) is not part of them; > and < should be separate tokens (in case you need the token IDs for something).

I thought you might have repurposed some of the reserved tokens. The >< thing doesn't happen in real use, so it's fine.
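For anyone wondering why the >< merge shows up only in some cases: it appears when two multi-token tags sit directly next to each other, so the closing > of one tag and the opening < of the next can be matched as a single piece. Below is a minimal sketch of that effect, assuming a toy vocabulary with hypothetical IDs (only 4177 for >< mirrors the thread) and greedy longest-match tokenization as a simplified stand-in for the BPE that the real Mistral tokenizer uses. The helper names (greedy_tokenize, encode_segments) are hypothetical. It also shows one possible workaround: tokenizing each tag and text segment separately and concatenating the IDs, so no piece can span a tag boundary.

```python
# Toy vocabulary standing in for a real tokenizer vocab.
# All IDs are hypothetical except 4177 for "><", which mirrors the thread.
VOCAB = {
    "<": 1, ">": 2, "|": 3,
    "user": 4, "model": 5, "system": 6,
    "Hi": 7,
    "><": 4177,
}
MAX_PIECE_LEN = max(len(piece) for piece in VOCAB)


def greedy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization (a simplification, not real BPE)."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(MAX_PIECE_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"no token for {text[i]!r} at position {i}")
    return ids


def encode_segments(segments: list[str]) -> list[int]:
    """Tokenize each segment on its own so no piece crosses a segment boundary."""
    ids = []
    for segment in segments:
        ids.extend(greedy_tokenize(segment))
    return ids


if __name__ == "__main__":
    # A single tag splits into several tokens: [1, 3, 4, 3, 2]
    print(greedy_tokenize("<|user|>"))
    # Adjacent tags: the boundary "><" is matched as token 4177
    print(greedy_tokenize("<|user|><|model|>"))
    # With text between the tags, no "><" merge happens
    print(greedy_tokenize("<|user|>Hi<|model|>"))
    # Encoding segments separately also avoids the merge
    print(encode_segments(["<|user|>", "<|model|>"]))
```

With a real tokenizer, encoding segments separately can interact with things like leading-space handling, so treat this as an illustration of the boundary effect rather than a drop-in fix.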

Mar2ck changed discussion status to closed
