LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 2 days ago • 24
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 10 days ago • 30
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation Paper • 2409.12941 • Published 9 days ago • 13
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 9 days ago • 64
LLMs Will Always Hallucinate, and We Need to Live With This Paper • 2409.05746 • Published 19 days ago • 2
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts Paper • 2407.21770 • Published Jul 31 • 22
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 9 days ago • 119
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement Paper • 2409.11378 • Published 11 days ago • 1
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 15 days ago • 30
UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity Paper • 2409.04081 • Published 23 days ago • 3
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 18 days ago • 53
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 23 days ago • 37
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 22 days ago • 20
Pron vs Prompt: Can Large Language Models already Challenge a World-Class Fiction Author at Creative Text Writing? Paper • 2407.01119 • Published Jul 1 • 1
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published 24 days ago • 54
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 24 days ago • 85
In Defense of RAG in the Era of Long-Context Language Models Paper • 2409.01666 • Published 26 days ago • 2
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 26 days ago • 95
When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options Paper • 2409.00113 • Published Aug 27 • 2
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios Paper • 2408.17267 • Published 29 days ago • 22
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 18
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 81
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 20
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 41
Text2SQL is Not Enough: Unifying AI and Databases with TAG Paper • 2408.14717 • Published Aug 27 • 23
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published Aug 20 • 11
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published Aug 21 • 16
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 53
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling Paper • 2408.04810 • Published Aug 9 • 22
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning Paper • 2408.07931 • Published Aug 15 • 18
Thermometer: Towards Universal Calibration for Large Language Models Paper • 2403.08819 • Published Feb 20 • 1
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference Paper • 2401.12200 • Published Jan 22 • 1
Provably Robust DPO: Aligning Language Models with Noisy Feedback Paper • 2403.00409 • Published Mar 1 • 1
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts Paper • 2407.00256 • Published Jun 28 • 1
Towards Modular LLMs by Building and Reusing a Library of LoRAs Paper • 2405.11157 • Published May 18 • 25
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos Paper • 2403.05535 • Published Mar 8 • 1
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal Paper • 2402.04249 • Published Feb 6 • 3
FairProof: Confidential and Certifiable Fairness for Neural Networks Paper • 2402.12572 • Published Feb 19 • 1
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 32
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
Recursive Introspection: Teaching Language Model Agents How to Self-Improve Paper • 2407.18219 • Published Jul 25 • 3
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle Paper • 2407.13833 • Published Jul 18 • 11
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 67
Reflexion: Language Agents with Verbal Reinforcement Learning Paper • 2303.11366 • Published Mar 20, 2023 • 4
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models Paper • 2305.04091 • Published May 6, 2023 • 2