Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024 • 17
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation Paper • 2409.20385 • Published Sep 30, 2024
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation Paper • 2410.12722 • Published Oct 16, 2024 • 5
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models Paper • 2404.05904 • Published Apr 8, 2024 • 8
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published Jun 17, 2024 • 8