license: cc-by-nc-sa-2.0

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera

Yuliang Guo1*† · Sparsh Garg2† · S. Mahdi H. Miangoleh3 · Xinyu Huang1 · Liu Ren1

1Bosch Research North America    2Carnegie Mellon University    3Simon Fraser University    

 *corresponding author †equal technical contribution

Paper PDF · Project Page · Code

teaser

Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to work effectively on any type of camera, across widely varying FoVs.

Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:

  1. Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming the prior metric depth SoTA models Metric3D-v2 and UniDepth.
  2. A geometry-based training framework that is adaptable to any network architecture and extendable to other 3D perception tasks.

Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the camera types used in new applications.

Visualization

ScanNet++ fisheye

The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. The visualizations of AbsRel error against ground truth highlight the superior performance of DAC.

animated

Matterport3D single-view reconstruction

Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.

animated
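To illustrate how a single 360-degree depth map can yield a full 3D point cloud, here is a minimal sketch of back-projecting an equirectangular depth map. It is not taken from the DAC code base. The function names, the axis convention (x right, y down, z forward), and the assumption that each depth value is the Euclidean distance along the viewing ray are all illustrative choices made for this example.

```python
import math


def equirect_pixel_to_point(u, v, depth, width, height):
    """Back-project one equirectangular pixel to a 3D point.

    Assumptions (hypothetical, for illustration only):
    - pixel centers sit at (u + 0.5, v + 0.5);
    - longitude spans [-pi, pi] left to right, latitude [pi/2, -pi/2] top to bottom;
    - `depth` is the distance along the ray, not the z-coordinate.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = depth * math.cos(lat) * math.sin(lon)   # right
    y = -depth * math.sin(lat)                  # down
    z = depth * math.cos(lat) * math.cos(lon)   # forward
    return (x, y, z)


def equirect_depth_to_points(depth_map):
    """Convert a 2D list of depths (rows of equal length) to a list of 3D points.

    Pixels with non-positive depth are treated as invalid and skipped.
    """
    height = len(depth_map)
    width = len(depth_map[0])
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:
                points.append(equirect_pixel_to_point(u, v, d, width, height))
    return points
```

Because every pixel of a 360-degree image corresponds to a distinct ray direction on the sphere, one predicted depth map covers the whole scene around the camera. In practice the same mapping would be vectorized over the full image with an array library rather than looped in pure Python.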

Additional visual results and comparisons with the prior SoTA can be found on the Project Page.

Performance

Depth Any Camera performs significantly better than the previous SoTA metric depth estimation models, Metric3D-v2 and UniDepth, in zero-shot generalization to large-FoV camera images, despite using a significantly smaller training dataset and model.

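| Method | Training Data Size | Matterport3D (360) AbsRel | Matterport3D (360) $\delta_1$ | Pano3D-GV2 (360) AbsRel | Pano3D-GV2 (360) $\delta_1$ | ScanNet++ (fisheye) AbsRel | ScanNet++ (fisheye) $\delta_1$ | KITTI360 (fisheye) AbsRel | KITTI360 (fisheye) $\delta_1$ |
|---|---|---|---|---|---|---|---|---|---|
| UniDepth-VitL | 3M | 0.7648 | 0.2576 | 0.7892 | 0.2469 | 0.4971 | 0.3638 | 0.2939 | 0.4810 |
| Metric3D-v2-VitL | 16M | 0.2924 | 0.4381 | 0.3070 | 0.4040 | 0.2229 | 0.5360 | 0.1997 | 0.7159 |
| Ours-Resnet101 | 670K-indoor / 130K-outdoor | **0.156** | **0.7727** | **0.1387** | **0.8115** | *0.1323* | *0.8517* | *0.1559* | *0.7858* |
| Ours-SwinL | 670K-indoor / 130K-outdoor | *0.1789* | *0.7231* | *0.1836* | *0.7287* | **0.1282** | **0.8544** | **0.1487** | **0.8222** |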

We highlight the best and second-best results in bold and italic, respectively (lower is better for AbsRel $\downarrow$; higher is better for $\delta_1$ $\uparrow$).
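For readers unfamiliar with the two metrics, the sketch below computes them using their commonly used definitions: AbsRel is the mean of |pred - gt| / gt, and $\delta_1$ is the fraction of pixels where max(pred / gt, gt / pred) < 1.25. This card does not spell out the evaluation protocol, so the validity rule (ground truth must be positive), the treatment of non-positive predictions as misses, and the function names are assumptions made for illustration.

```python
def _valid_pairs(pred, gt):
    """Keep only pixels whose ground-truth depth is positive (assumed validity rule)."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred / gt, gt / pred) below the threshold.

    Non-positive predictions are counted as misses (an assumption of this sketch).
    """
    pairs = _valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Flattened depth values in meters; the last pixel has no ground truth.
pred = [1.0, 2.2, 3.0, 5.0]
gt = [1.0, 2.0, 4.0, 0.0]
print(abs_rel(pred, gt), delta1(pred, gt))
```

Both metrics compare metric depth directly, without rescaling predictions to the ground truth, which is why they are suitable for evaluating metric (rather than relative) depth estimation.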
