Reason behind not using special tokens in the prompt format?
#2 by Doctor-Shotgun - opened
Hello, hobbyist model finetuner here. Thanks for sharing your training hyperparameters!
I was just curious whether there was a specific reason for not using dedicated special tokens for the role headers in the prompt format (such as the ones already defined in the Llama 3 tokenizer, e.g. <|start_header_id|> etc.)?
It appears that the <|system|>, <|user|>, and <|assistant|> headers used in the prompt format are not defined special tokens, so they could in theory be variably tokenized into different combinations of substrings during training/inference.
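To illustrate what I mean, here is a minimal toy sketch (not the actual Llama 3 tokenizer). The vocabulary, the leftmost-longest matching rule, and the helper names `greedy_tokenize` and `tokenize_with_specials` are all made up for the example, and it ignores BPE merge order and pre-tokenization, which may limit this effect in a real tokenizer. It only shows how a header that is plain text can be split differently depending on the preceding character, while a registered special token always stays as one unit.

```python
import re

# Hypothetical toy vocabulary, NOT the Llama 3 vocabulary.
VOCAB = ["\n<", "<|", "|>", "user", "Hi", "\n", "<", "|", ">"]


def greedy_tokenize(text, vocab=VOCAB):
    """Leftmost-longest match; characters not in the vocab become single tokens."""
    by_length = sorted(vocab, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        match = next((t for t in by_length if text.startswith(t, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def tokenize_with_specials(text, specials, vocab=VOCAB):
    """Split out registered special tokens first, then tokenize the remainder."""
    pattern = "(" + "|".join(re.escape(s) for s in specials) + ")"
    tokens = []
    for piece in re.split(pattern, text):
        if piece in specials:
            tokens.append(piece)  # kept atomic, never merged with neighbours
        elif piece:
            tokens.extend(greedy_tokenize(piece, vocab))
    return tokens


# Header at the start of the text: split as "<|", "user", "|>".
at_start = greedy_tokenize("<|user|>\nHi")

# Same header after a newline: the "<" is absorbed into the "\n<" token.
after_newline = greedy_tokenize("Hi\n<|user|>\nHi")

# Header registered as a special token: always a single token.
with_special = tokenize_with_specials("Hi\n<|user|>\nHi", ["<|user|>"])

print(at_start)
print(after_newline)
print(with_special)
```

In this toy setup the same header string produces two different token sequences unless it is registered as special, which is the kind of variability I was asking about.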
From the paper it seems like some empirical testing was done. Was this also attempted with the tokens above defined as special tokens?
I just found out about this and I'm curious as well.