Papers
arxiv:2403.15705

SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction

Published on Mar 23, 2024
Authors:
,
,
,
,

Abstract

Monocular 3D reconstruction for categorical objects heavily relies on accurately perceiving each object's pose. While gradient-based optimization in a NeRF framework updates the initial pose, this paper highlights that scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose. Consequently, existing methods often depend on a third-party 3D object to provide an initial object pose, leading to increased complexity and generalization issues. To address these challenges, we present SUP-<PRE_TAG>NeRF</POST_TAG>, a Streamlined Unification of object Pose estimation and NeRF-based object reconstruction. SUP-<PRE_TAG>NeRF</POST_TAG> decouples the object's dimension estimation and pose refinement to resolve the scale-depth ambiguity, and introduces a camera-invariant projected-box representation that generalizes cross different domains. While using a dedicated pose estimator that smoothly integrates into an object-centric <PRE_TAG>NeRF</POST_TAG>, SUP-<PRE_TAG>NeRF</POST_TAG> is free from external 3D detectors. SUP-<PRE_TAG>NeRF</POST_TAG> achieves state-of-the-art results in both reconstruction and pose estimation tasks on the nuScenes dataset. Furthermore, SUP-<PRE_TAG>NeRF</POST_TAG> exhibits exceptional cross-dataset generalization on the KITTI and Waymo datasets, surpassing prior methods with up to 50\% reduction in rotation and translation error.

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.15705 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.