The Ultra-Scale Playbook: The ultimate guide to training LLM on large GPU Clusters
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 114
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective Paper • 2410.23743 • Published Oct 31, 2024 • 62