VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos Paper • 2409.07450 • Published 17 days ago • 10
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published 18 days ago • 19
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published 18 days ago • 14
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 19 days ago • 14
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published 21 days ago • 22
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published 19 days ago • 14
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published 20 days ago • 26
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published 20 days ago • 29
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 24 days ago • 71
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 19 days ago • 45
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published 23 days ago • 16
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper • 2409.04410 • Published 22 days ago • 23
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published 24 days ago • 27
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published 23 days ago • 30
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents Paper • 2409.03512 • Published 24 days ago • 25
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published 25 days ago • 44
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published 24 days ago • 23
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published 24 days ago • 54
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published 24 days ago • 27
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published 24 days ago • 43
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published 26 days ago • 75
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published 27 days ago • 26
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published 25 days ago • 33
Compositional 3D-aware Video Generation with LLM Director Paper • 2409.00558 • Published 28 days ago • 14
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published 27 days ago • 12
LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models Paper • 2409.00509 • Published 28 days ago • 38
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Paper • 2408.17131 • Published 30 days ago • 11
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding Paper • 2408.15545 • Published Aug 28 • 32
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization Paper • 2408.15914 • Published Aug 28 • 21
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Paper • 2408.17267 • Published 30 days ago • 22
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published about 1 month ago • 45
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published about 1 month ago • 26
CSGO: Content-Style Composition in Text-to-Image Generation Paper • 2408.16766 • Published about 1 month ago • 17
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published about 1 month ago • 29
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published about 1 month ago • 56
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 20
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models Paper • 2408.15915 • Published Aug 28 • 19
Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation Paper • 2408.15991 • Published Aug 28 • 15
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 81
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27 • 51
Platypus: A Generalized Specialist Model for Reading Text in Various Forms Paper • 2408.14805 • Published Aug 27 • 12
GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars Paper • 2408.13674 • Published Aug 24 • 17
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 23
MobileQuant: Mobile-friendly Quantization for On-device Language Models Paper • 2408.13933 • Published Aug 25 • 13
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java Paper • 2408.14354 • Published Aug 26 • 40
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences Paper • 2408.14468 • Published Aug 26 • 33
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published Aug 24 • 13
Training-free Long Video Generation with Chain of Diffusion Model Experts Paper • 2408.13423 • Published Aug 24 • 19
Efficient Detection of Toxic Prompts in Large Language Models Paper • 2408.11727 • Published Aug 21 • 11
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23 • 10
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities Paper • 2408.13239 • Published Aug 23 • 10
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time Paper • 2408.13233 • Published Aug 23 • 20
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation Paper • 2408.09787 • Published Aug 19 • 6
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM Paper • 2408.12076 • Published Aug 22 • 11
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications Paper • 2408.11878 • Published Aug 20 • 49
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search Paper • 2408.10635 • Published Aug 20 • 13