Reason behind not using special tokens in the prompt format?

#2
by Doctor-Shotgun - opened

Hello, hobbyist model finetuner here. Thanks for sharing your training hyperparameters!

I was just curious whether there was a specific reason for not using dedicated special tokens for the role headers in the prompt format (such as the ones already defined in the Llama 3 tokenizer, i.e. <|start_header_id|>, etc.)?

It appears that the <|system|>, <|user|>, and <|assistant|> headers used in the prompt format are not defined as special tokens, so in theory they could be tokenized variably into different combinations of substrings during training/inference.
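The difference is easy to see in `transformers`. Here's a minimal sketch of the behavior being described, using the GPT-2 tokenizer as a stand-in (the Llama 3 tokenizer is gated; the principle is the same) and <|assistant|> as the example header:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# Without special-token status, the header is broken into ordinary BPE pieces,
# and surrounding text can change how it splits.
plain = tok.tokenize("<|assistant|>")
print(plain)  # multiple subword pieces

# After registering it as an additional special token, it always maps to a
# single, atomic token id.
tok.add_special_tokens({"additional_special_tokens": ["<|assistant|>"]})
special = tok.tokenize("<|assistant|>")
print(special)  # ['<|assistant|>']
```

Note that adding special tokens grows the vocabulary, so a finetune taking this route would also need to resize the model's embedding matrix (e.g. `model.resize_token_embeddings(len(tok))`).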

From the paper it seems some empirical testing was done - was this also attempted with the tokens above defined as special?

I just found out about this and I'm curious as well.
