Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Paper • 2409.12139 • Published 10 days ago • 11
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 10 days ago • 30
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 11 days ago • 17
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 10 days ago • 65
Can OOD Object Detectors Learn from Foundation Models? Paper • 2409.05162 • Published 20 days ago • 6
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published 17 days ago • 11
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published 16 days ago • 8
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published 16 days ago • 10
TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder Paper • 2409.08248 • Published 16 days ago • 12
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published 16 days ago • 15
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 16 days ago • 15
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published 23 days ago • 37
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published 16 days ago • 41
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published 17 days ago • 61
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation Paper • 2409.06703 • Published 18 days ago • 2
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding Paper • 2409.06210 • Published 19 days ago • 24
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 18 days ago • 53
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published 18 days ago • 37
Evaluating Multiview Object Consistency in Humans and Image Models Paper • 2409.05862 • Published 19 days ago • 8
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published 20 days ago • 5
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published 22 days ago • 8
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments Paper • 2409.05865 • Published 19 days ago • 14
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published 21 days ago • 22
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published 19 days ago • 14
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published 22 days ago • 20
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published 19 days ago • 26
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published 20 days ago • 29
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published 19 days ago • 45
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published 24 days ago • 71
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published 18 days ago • 14
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published 19 days ago • 14
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 24 days ago • 85
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries Paper • 2409.00844 • Published 27 days ago • 11
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation Paper • 2409.03525 • Published 23 days ago • 11
Building Math Agents with Multi-Turn Iterative Preference Learning Paper • 2409.02392 • Published 25 days ago • 14
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published 23 days ago • 17
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation Paper • 2409.03643 • Published 23 days ago • 18
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published 23 days ago • 25
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding Paper • 2409.03420 • Published 24 days ago • 23
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents Paper • 2409.03512 • Published 23 days ago • 25
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published 25 days ago • 44
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 26 days ago • 95
GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers Paper • 2409.04196 • Published 23 days ago • 11
Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models Paper • 2409.02076 • Published 25 days ago • 9
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published 23 days ago • 16
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper • 2409.04410 • Published 22 days ago • 23
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published 24 days ago • 27
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published 23 days ago • 30
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21 • 10
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models Paper • 2408.12114 • Published Aug 22 • 11
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published Aug 22 • 13
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM Paper • 2408.12076 • Published Aug 22 • 11
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search Paper • 2408.10635 • Published Aug 20 • 13
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22 • 13