-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 67 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 126 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
Collections
Discover the best community collections!
Collections including paper arxiv:2409.07703
-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 41 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 33 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 13 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 60
-
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Paper • 2410.16153 • Published • 42 -
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper • 2409.07703 • Published • 66 -
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published • 52
-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 125 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 87 -
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Paper • 2409.02634 • Published • 89 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 106
-
Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 38 -
On the limits of agency in agent-based models
Paper • 2409.10568 • Published • 12 -
On the Diagram of Thought
Paper • 2409.10038 • Published • 11 -
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper • 2409.07703 • Published • 66
-
More Agents Is All You Need
Paper • 2402.05120 • Published • 51 -
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
Paper • 2402.07456 • Published • 41 -
Generative Agents: Interactive Simulacra of Human Behavior
Paper • 2304.03442 • Published • 11 -
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 183 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 24 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 33