Breaking Focus: Contextual Distraction Curse in Large Language Models Paper • 2502.01609 • Published Feb 3 • 1
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published 20 days ago • 46
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge Paper • 2410.02736 • Published Oct 3, 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Paper • 2410.21259 • Published Oct 28, 2024 • 1
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 39