SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published Jan 22, 2025 • 50
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21, 2025 • 60
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 85
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6, 2025 • 14
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 5, 2025 • 41
Personalized Multimodal Large Language Models: A Survey Paper • 2412.02142 • Published Dec 3, 2024 • 13
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published Dec 4, 2024 • 46
LM-Cocktail: Resilient Tuning of Language Models via Model Merging Paper • 2311.13534 • Published Nov 22, 2023 • 4
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 474
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Paper • 2411.00918 • Published Nov 1, 2024 • 8
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality Paper • 2410.05210 • Published Oct 7, 2024 • 10
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published Sep 28, 2024 • 19
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3, 2024 • 52
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24, 2024 • 13