B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 19 days ago • 45
M-STAR Collection Resources of M-STAR (Multimodal Self-Evolving Training for Reasoning) https://mstar-lmm.github.io/ • 2 items • Updated 17 days ago • 2