The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published 7 days ago • 4
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 3 days ago • 16
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 2 days ago • 16
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 2 days ago • 25
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 3 days ago • 34
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 2 days ago • 24
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 9 days ago • 22
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 9 days ago • 119
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published 9 days ago • 33
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 10 days ago • 44
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 11 days ago • 17
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 10 days ago • 30
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 10 days ago • 65
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published 11 days ago • 20
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 11 days ago • 23
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 12 days ago • 16
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Paper • 2409.11055 • Published 12 days ago • 16
Single-Layer Learnable Activation for Implicit Neural Representation (SL²A-INR) Paper • 2409.10836 • Published 12 days ago • 4
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 11 days ago • 12
Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records Paper • 2409.07012 • Published 18 days ago • 3
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Paper • 2409.06277 • Published 19 days ago • 14
One missing piece in Vision and Language: A Survey on Comics Understanding Paper • 2409.09502 • Published 14 days ago • 23
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published 15 days ago • 7
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published 12 days ago • 31
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 15 days ago • 44
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published 16 days ago • 10
Apollo: Band-sequence Modeling for High-Quality Audio Restoration Paper • 2409.08514 • Published 16 days ago • 8
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published 17 days ago • 11
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 17 days ago • 61
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories Paper • 2409.07440 • Published 17 days ago • 6
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 18 days ago • 53
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 16 days ago • 41
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 23 days ago • 37
gsplat: An Open-Source Library for Gaussian Splatting Paper • 2409.06765 • Published 18 days ago • 11
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 17 days ago • 18
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos Paper • 2409.07450 • Published 17 days ago • 10
ProteinBench: A Holistic Evaluation of Protein Foundation Models Paper • 2409.06744 • Published 19 days ago • 6
MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis Paper • 2409.07129 • Published 18 days ago • 6
Can Large Language Models Unlock Novel Scientific Research Ideas? Paper • 2409.06185 • Published 19 days ago • 9
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published 18 days ago • 19
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published 18 days ago • 57
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications Paper • 2409.07314 • Published 17 days ago • 50
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding Paper • 2409.06210 • Published 19 days ago • 24
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published 20 days ago • 29
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Paper • 2409.05865 • Published 19 days ago • 14
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 22 days ago • 20
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 19 days ago • 14
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published 21 days ago • 22
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published 19 days ago • 26
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 19 days ago • 45
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 24 days ago • 71