Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models • Paper • 2311.00871 • Published Nov 1, 2023