Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 54
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 259
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations Paper • 2402.17700 • Published Feb 27, 2024 • 2