Discrepancy between Base and Instruct model eos_token.

#119
by richardlian - opened

Hi, I'm currently pretraining a model (and then fine-tuning it) with <|end_of_text|> as the eos_token, and I would like to know if I have made a mistake. I have two questions:

  1. For the Instruct model, I see that the eos_token was set to <|end_of_text|> at release but was later switched to <|eot_id|>. Is <|eot_id|> specific to the Instruct models? The base model still has its eos_token set to <|end_of_text|>. Or was <|end_of_text|> used as the eos_token during pretraining and then switched to <|eot_id|> for instruction fine-tuning to delineate conversation turns? (The snippet after this list shows how I'm comparing the two tokenizer configs.)
  2. Is the eos_token necessary during pretraining, given that the bos_token can already be used to delineate documents? (See the packing sketch below for what I mean.)
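
For reference on question 1, this is roughly how I'm comparing the two tokenizer configs. It's a minimal sketch using `transformers`, assuming access to the gated `meta-llama/Meta-Llama-3-8B` and `meta-llama/Meta-Llama-3-8B-Instruct` checkpoints (swap in whichever repo IDs you are actually using):

```python
from transformers import AutoTokenizer

# Compare the special tokens declared by the base and Instruct tokenizers.
for name in ["meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-8B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name)
    print("  bos_token:", tok.bos_token)
    print("  eos_token:", tok.eos_token)
```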

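And for question 2, this is the kind of document packing I have in mind, where each document is wrapped with bos/eos before concatenation. It's only an illustrative sketch of my own setup, not anything from the Llama 3 training code, and the `documents` list is a hypothetical stand-in for a real corpus:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
documents = ["First training document.", "Second training document."]  # placeholder corpus

# Concatenate documents into one token stream, marking boundaries with bos/eos.
packed = []
for doc in documents:
    ids = tok(doc, add_special_tokens=False)["input_ids"]
    packed.extend([tok.bos_token_id] + ids + [tok.eos_token_id])

print(packed[:32])
```

My question is whether the trailing eos_token in each document is actually needed here, or whether the bos_token at the start of the next document is enough of a boundary signal.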