CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21, 2024 • 60
Running • 98 • Open VLM Video Leaderboard • VLMEvalKit Eval Results in video understanding benchmark
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16, 2024 • 44
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20, 2024 • 33
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20, 2024 • 35
MaRiOrOsSi/t5-base-finetuned-question-answering Text2Text Generation • Updated Apr 8, 2022 • 1.3k • 32