How to handle the truncated part when concatenating multiple sequences in the pretraining phase?

#61
by feiyulv - opened

Hi, when pretraining, we concatenate multiple sequences into a batch of length 8192. How should the last sequence be handled when appending it to the previous sequences would exceed 8192?

  1. Discard the last sequence and pad the previous sequences to 8192
  2. Truncate the last sequence at the 8192 boundary

Which strategy do we use?
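To make the two options concrete, here is a minimal sketch of what I mean. The function names, `PAD_ID`, and the choice to build a single pack at a time are all hypothetical and only for illustration; this is not taken from the actual training code.

```python
from typing import List

MAX_LEN = 8192  # packed length from the question
PAD_ID = 0      # hypothetical padding token id, an assumption for this sketch


def pack_discard_and_pad(seqs: List[List[int]],
                         max_len: int = MAX_LEN,
                         pad_id: int = PAD_ID) -> List[int]:
    """Option 1: stop before the sequence that would overflow, then pad."""
    pack: List[int] = []
    for seq in seqs:
        if len(pack) + len(seq) > max_len:
            break  # the overflowing sequence is left out of this pack
        pack.extend(seq)
    # Fill the remaining positions with padding tokens.
    pack.extend([pad_id] * (max_len - len(pack)))
    return pack


def pack_truncate(seqs: List[List[int]],
                  max_len: int = MAX_LEN) -> List[int]:
    """Option 2: keep appending and cut the last sequence at max_len."""
    pack: List[int] = []
    for seq in seqs:
        room = max_len - len(pack)
        if room <= 0:
            break  # the pack is already full
        pack.extend(seq[:room])  # truncates only when seq does not fit
    return pack


if __name__ == "__main__":
    toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(pack_discard_and_pad(toy, max_len=8))
    print(pack_truncate(toy, max_len=8))
```

With a toy length of 8, option 1 keeps the first two sequences and pads the last two positions, while option 2 fills all 8 positions but cuts the third sequence short. Note that `pack_truncate` returns fewer than `max_len` tokens if the input runs out first.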
