VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation Paper • 2411.13281 • Published Nov 20 • 17
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated 25 days ago • 182
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 35
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published Aug 21 • 6
Q-Ground: Image Quality Grounding with Large Multi-modality Models Paper • 2407.17035 • Published Jul 24 • 1
Magpie-Qwen2 Datasets Collection Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Sep 14 • 10
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 8 days ago • 205
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
InternVL2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated 2 days ago • 87
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13 • 4
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? Paper • 2406.03070 • Published Jun 5 • 2
MaPO Collection This collection includes the models and datasets as a part of the MaPO release. • 9 items • Updated Jun 12 • 5
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Nov 28 • 352