MARS: Unleashing the Power of Variance Reduction for Training Large Models • Paper • 2411.10438 • Published 7 days ago
Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling • Paper • 2410.07145 • Published Oct 9
A failed experiment: Infini-Attention, and why we should keep trying? • Article • Aug 14
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models • Paper • 2406.15718 • Published Jun 22