MLGym: A New Framework and Benchmark for Advancing AI Research Agents • Paper • 2502.14499 • Published 22 days ago
NatureLM: Deciphering the Language of Nature for Scientific Discovery • Paper • 2502.07527 • Published about 1 month ago
The Big Benchmarks Collection • Collection • Gathering benchmark spaces on the Hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024