TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 21 days ago • 23
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns Paper • 2403.13315 • Published Mar 20, 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths Paper • 2410.10858 • Published Oct 7, 2024
Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models Paper • 2409.14277 • Published Sep 22, 2024
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29, 2024 • 56