BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18, 2024 • 46
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published 25 days ago
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data Paper • 2404.03862 • Published Apr 5, 2024
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees Paper • 2404.08417 • Published Apr 12, 2024 • 1
Dated Data: Tracing Knowledge Cutoffs in Large Language Models Paper • 2403.12958 • Published Mar 19, 2024
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published 8 days ago • 23
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published 8 days ago • 7
Rethinking the Influence of Source Code on Test Case Generation Paper • 2409.09464 • Published Sep 14, 2024 • 1