Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published 3 days ago • 34
Large Language Models Think Too Fast To Explore Effectively Paper • 2501.18009 • Published 8 days ago • 22
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 9 days ago • 32
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 21 days ago • 67
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 23 days ago • 272
Running on CPU Upgrade 514 514 Open Ko-LLM Leaderboard 📉 Explore and filter language model benchmark results
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published Jan 3 • 18
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 25
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 72
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression Paper • 2412.17483 • Published Dec 23, 2024 • 31
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Paper • 2412.15797 • Published Dec 20, 2024 • 17
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 40
Outcome-Refining Process Supervision for Code Generation Paper • 2412.15118 • Published Dec 19, 2024 • 19
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published Dec 23, 2024 • 46
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 50
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published Dec 10, 2024 • 35
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 89