The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 2025
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 2025
Fast Inference from Transformers via Speculative Decoding Paper • 2211.17192 • Published Nov 30, 2022