The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published 15 days ago • 112
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 10 days ago • 32
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published 20 days ago • 54
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 256
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 76
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15
view article Article Efficient LLM Pretraining: Packed Sequences and Masked Attention By sirluk • Oct 7, 2024 • 16
view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24, 2024 • 61
view article Article Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚 By Isayoften • Aug 26, 2024 • 50
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
LLM Reasoning Papers Collection Papers to improve reasoning capabilities of LLMs • 20 items • Updated Jan 15 • 116
view article Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • Sep 27, 2024 • 40
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation Paper • 2407.10817 • Published Jul 15, 2024 • 14