Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More Paper • 2502.07490 • Published 12 days ago • 9
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More Paper • 2502.07490 • Published 12 days ago • 9
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published Jan 12 • 15