Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation Paper • 2412.09428 • Published Dec 12, 2024 • 7
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 23
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Paper • 2411.14794 • Published Nov 22, 2024 • 13
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models Paper • 2410.00363 • Published Oct 1, 2024 • 1
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models Paper • 2312.06685 • Published Dec 9, 2023 • 1
Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype Paper • 2408.09984 • Published Aug 19, 2024 • 1
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published Oct 17, 2024 • 8
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow Paper • 2410.07536 • Published Oct 10, 2024 • 5
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23, 2024 • 24
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT Paper • 2406.18583 • Published Jun 5, 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers Paper • 2405.05945 • Published May 9, 2024 • 2
Video Background Music Generation: Dataset, Method and Evaluation Paper • 2211.11248 • Published Nov 21, 2022
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT Paper • 2306.17103 • Published Jun 29, 2023 • 1
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28, 2024 • 21
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers Paper • 2405.05945 • Published May 9, 2024 • 2
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT Paper • 2406.18583 • Published Jun 5, 2024
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10, 2024 • 14
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5, 2024 • 33
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5, 2024 • 33