What does the tokenization for fill-in-the-middle requests look like?

#5
by XeIaso - opened

I'm looking at messing around with the fill-in-the-middle support for codestral but I can't figure out what I'd do to use it. I see that there's a FIMRequest class, but I want to know what tokens I should use for it with llama.cpp.

Thanks for making these models! They're a lot of fun to use personally and professionally.

Based on mistralai/mistral-common/tokens/tokenizers/sentencepiece.py#L335 and mistralai/mistral-common/tokens/tokenizers/base.py#L10, the prompt should look like

<s>[SUFFIX]suffix_code[PREFIX]prefix_code

with the EOS token </s> as the stopping condition. However, I also see a [MIDDLE] token which isn't used; maybe I'm forgetting something?

My expectation is that [MIDDLE] is used as the last token before the generated response.
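For what it's worth, here's a minimal sketch of how I'd assemble that prompt as a string. This assumes the [SUFFIX]/[PREFIX] control tokens are parsed as special tokens by the tokenizer (in llama.cpp that means special-token parsing has to be enabled; the literal strings here stand in for the actual token IDs), and that the BOS <s> is prepended automatically, as llama.cpp usually does:

```python
# Sketch of a Codestral-style FIM prompt, per the suffix-first ordering
# observed in mistral-common's SentencePiece tokenizer code. The token
# strings below stand in for special tokens; real use needs the
# tokenizer to map them to their control-token IDs.
SUFFIX = "[SUFFIX]"
PREFIX = "[PREFIX]"
EOS = "</s>"  # stopping condition for generation

def build_fim_prompt(prefix_code: str, suffix_code: str) -> str:
    # Suffix comes first, then prefix; the model continues
    # generating the "middle" right after prefix_code.
    return f"{SUFFIX}{suffix_code}{PREFIX}{prefix_code}"

prompt = build_fim_prompt(
    prefix_code="def add(a, b):\n    return ",
    suffix_code="\n\nprint(add(1, 2))",
)
```

Generation would then run on `prompt` and stop once the EOS token is emitted.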
