Multimodal Language Model Benchmarks • Collection • Multimodal benchmarks that test various aspects of LLMs, VLMs, and LMMs • 14 items • Updated Sep 11 • 1
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics Paper • 2406.14051 • Published Jun 20 • 9
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models Paper • 2406.14035 • Published Jun 20 • 12
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents Paper • 2305.13455 • Published May 22, 2023 • 3