BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference Paper • 2310.11142 • Published Oct 17, 2023
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads Paper • 2412.00127 • Published Nov 28, 2024 • 1
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 123
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric Paper • 2411.16619 • Published Nov 25, 2024
ClothesNet: An Information-Rich 3D Garment Model Repository with Simulated Clothes Environment Paper • 2308.09987 • Published Aug 19, 2023 • 1
RPMArt: Towards Robust Perception and Manipulation for Articulated Objects Paper • 2403.16023 • Published Mar 24, 2024
SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation Paper • 2409.18082 • Published Sep 26, 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Paper • 2409.20551 • Published Sep 30, 2024 • 14
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models Paper • 2408.14008 • Published Aug 26, 2024
Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model Paper • 2407.21408 • Published Jul 31, 2024
Dual-Branch Network for Portrait Image Quality Assessment Paper • 2405.08555 • Published May 14, 2024
Grounded Question-Answering in Long Egocentric Videos Paper • 2312.06505 • Published Dec 11, 2023 • 1
Video Background Music Generation with Controllable Music Transformer Paper • 2111.08380 • Published Nov 16, 2021 • 1
Q-Boost: On Visual Quality Assessment Ability of Low-level Multi-Modality Foundation Models Paper • 2312.15300 • Published Dec 23, 2023 • 2