MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models Paper • 2502.00698 • Published 4 days ago • 20
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published 3 days ago • 33
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model Paper • 2501.18636 • Published 9 days ago • 25
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published 7 days ago • 22