Submitted by akhaliq 59 TÜLU 3: Pushing Frontiers in Open Language Model Post-Training · 23 authors 2
Submitted by adamdad 55 OminiControl: Minimal and Universal Control for Diffusion Transformer · 5 authors 8
Submitted by xanderhuang 44 Material Anything: Generating Materials for Any 3D Object via Diffusion · 4 authors 3
Submitted by chaehun 35 Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator · 4 authors 5
Submitted by gabrielchua 21 A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection · 3 authors 2
Submitted by kcz358 16 Large Multi-modal Models Can Interpret Features in Large Multi-modal Models · 4 authors 2
Submitted by JackyZhuo 13 VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection · 10 authors 3
Submitted by younggyoseo 11 Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction · 5 authors 2
Submitted by j-min 9 VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement · 4 authors 3
Submitted by dnoever 7 The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz · 2 authors 3
Submitted by JusperLee 4 Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images · 8 authors 2
Submitted by colo286 3 One to rule them all: natural language to bind communication, perception and action · 3 authors 2