The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 1 day ago • 43
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published 1 day ago • 11
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 2 days ago • 48
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published 3 days ago • 20
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Paper • 2406.12624 • Published 8 days ago • 31
4M Models Collection • Multimodal models from https://4m.epfl.ch/ • 14 items • Updated 12 days ago • 28
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Paper • 2406.14562 • Published 6 days ago • 25
REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark Paper • 2406.11927 • Published 9 days ago • 7
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896 • Published 12 days ago • 16
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents Paper • 2406.13923 • Published 7 days ago • 20
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 6 days ago • 27
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published 6 days ago • 32
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Paper • 2406.12649 • Published 8 days ago • 14
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published 10 days ago • 28
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published 9 days ago • 19
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology Paper • 2406.11912 • Published 10 days ago • 25
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 9 days ago • 53
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published 8 days ago • 13
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 9 days ago • 28
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published 9 days ago • 37
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published 10 days ago • 11
VideoGUI: A Benchmark for GUI Automation from Instructional Videos Paper • 2406.10227 • Published 12 days ago • 8
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published 12 days ago • 25
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices Paper • 2406.08451 • Published 14 days ago • 22
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published 12 days ago • 21
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published 12 days ago • 47
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published 12 days ago • 52
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published 13 days ago • 85
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published 12 days ago • 70
Nemotron 4 340B Collection • Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 12 days ago • 143
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 15 days ago • 34
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus Paper • 2406.08598 • Published 14 days ago • 5
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published 13 days ago • 22
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published 13 days ago • 17
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 13 days ago • 47
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models Paper • 2406.08487 • Published 14 days ago • 10
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published 14 days ago • 44
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published 16 days ago • 34
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Paper • 2406.05132 • Published 19 days ago • 27
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 15 days ago • 29
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published 14 days ago • 38
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published 14 days ago • 23
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering Paper • 2406.06573 • Published 23 days ago • 7
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound Paper • 2406.06612 • Published 20 days ago • 12
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language Paper • 2406.05629 • Published 18 days ago • 6
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published 15 days ago • 16
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 15 days ago • 52
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published 21 days ago • 17
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published 16 days ago • 60
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published 16 days ago • 22