Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 10 days ago • 33
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published 11 days ago • 67
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 12 days ago • 112
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 17 days ago • 78
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 57
Transformers Can Navigate Mazes With Multi-Step Prediction Paper • 2412.05117 • Published 24 days ago • 5
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving Paper • 2407.00079 • Published Jun 24 • 5
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published about 1 month ago • 55
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16 • 2
BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once Paper • 2405.12971 • Published May 21 • 2
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11 • 34
Tulu 3 Datasets Collection • All datasets released with Tulu 3, state-of-the-art open post-training recipes • 32 items • Updated Nov 27 • 63