RegMix: Data Mixture as Regression for Language Model Pre-training • Paper • arXiv:2407.01492 • Published 4 days ago • 24
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • arXiv:2312.00752 • Published Dec 1, 2023 • 132