Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8 • 107
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated Sep 18 • 156
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 34
FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting Paper • 2408.11706 • Published Aug 21 • 6
Q-Ground: Image Quality Grounding with Large Multi-modality Models Paper • 2407.17035 • Published Jul 24 • 1
Magpie-Qwen2 Datasets Collection Dataset built with Qwen2 72B and Qwen2 7B. • 6 items • Updated Sep 14 • 10
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Paper • 2407.15754 • Published Jul 22 • 19
SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 198
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
InternVL 2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 16 items • Updated about 1 month ago • 76
CMC-Bench: Towards a New Paradigm of Visual Signal Compression Paper • 2406.09356 • Published Jun 13 • 4
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
A-Bench: Are LMMs Masters at Evaluating AI-generated Images? Paper • 2406.03070 • Published Jun 5 • 2
MaPO Collection This collection includes the models and datasets as a part of the MaPO release. • 9 items • Updated Jun 12 • 5
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Sep 18 • 347