Training dataset shuffling/mixing
#180 · opened by wish
To make the question a bit simpler, assume BLOOM was trained on only Wikipedia and GitHub. How were the two datasets combined for training? Was it a simple concatenation, so that during training the model is first trained on Wikipedia documents and later on GitHub documents, or were the datasets mixed in some way? I could not find this in the paper; sorry if it was already answered or if I overlooked the relevant information.
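For concreteness, here is a toy Python sketch of the alternatives described above: plain concatenation, global shuffling, and weighted source sampling. The corpora, weights, and function names are made up purely for illustration and are not BLOOM's actual data pipeline.

```python
import random

# Hypothetical toy corpora standing in for the two sources in the question.
wikipedia_docs = [f"wiki_doc_{i}" for i in range(8)]
github_docs = [f"github_doc_{i}" for i in range(4)]

# Option A: plain concatenation -- the model would see all Wikipedia
# documents first, then all GitHub documents.
concatenated = wikipedia_docs + github_docs

# Option B: shuffle the concatenated pool so documents from both sources
# are interleaved throughout training.
shuffled = concatenated.copy()
random.shuffle(shuffled)

# Option C: weighted sampling -- pick each document's source according to
# fixed mixing weights (the weights here are invented for illustration).
weights = {"wikipedia": 0.7, "github": 0.3}
pools = {"wikipedia": iter(wikipedia_docs), "github": iter(github_docs)}

def sample_mixed(num_docs: int) -> list[str]:
    """Draw documents by first sampling a source, then taking its next document."""
    out = []
    sources = list(weights)
    probs = [weights[s] for s in sources]
    for _ in range(num_docs):
        src = random.choices(sources, weights=probs, k=1)[0]
        doc = next(pools[src], None)
        if doc is not None:
            out.append(doc)
    return out

print(concatenated[:4])   # only Wikipedia documents at the start of training
print(shuffled[:4])       # sources interleaved throughout
print(sample_mixed(4))    # sources drawn according to the mixing weights
```

The practical difference is that option A exposes the model to one source at a time (risking forgetting of the earlier source), while options B and C interleave sources so every part of training sees a similar mixture, with C additionally letting you up- or down-weight a source relative to its raw size.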