license: cc-by-nc-sa-2.0
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo1*† · Sparsh Garg2† · S. Mahdi H. Miangoleh3 · Xinyu Huang1 · Liu Ren1
1Bosch Research North America 2Carnegie Mellon University 3Simon Fraser University
*corresponding author †equal technical contribution
Depth Any Camera (DAC) is a zero-shot metric depth estimation framework that extends a perspective-trained model to cameras of any type and field of view (FoV).
Notably, DAC can be trained exclusively on perspective images, yet it generalizes seamlessly to fisheye and 360 cameras without requiring specialized training data. Key features include:
- Zero-shot metric depth estimation on fisheye and 360 images, significantly outperforming the prior state-of-the-art (SoTA) metric depth models Metric3D-v2 and UniDepth.
- Geometry-based training framework that is adaptable to any network architecture and extendable to other 3D perception tasks.
Tired of collecting new data for specific cameras? DAC maximizes the utility of all existing 3D data for training, regardless of the camera types used in new applications.
Visualization
ScanNet++ fisheye
The zero-shot metric depth estimation results of Depth Any Camera (DAC) are visualized on ScanNet++ fisheye videos and compared to Metric3D-v2. Visualizations of the AbsRel error against ground truth highlight the superior performance of DAC.
Matterport3D single-view reconstruction
Additionally, we showcase DAC's application on 360-degree images, where a single forward pass of depth estimation enables full 3D scene reconstruction.
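To illustrate why a single 360 depth map is enough for a full scene reconstruction, the sketch below unprojects an equirectangular depth map into a 3D point cloud. It is not taken from the DAC code base: the function name `erp_depth_to_points`, the axis convention (x right, y down, z forward), and the assumption that each depth value is the metric distance along the viewing ray are illustrative choices.

```python
import math


def erp_depth_to_points(depth, max_depth=float("inf")):
    """Unproject an equirectangular (360) depth map into (x, y, z) points.

    Assumptions (hypothetical, not confirmed by the DAC source):
      - `depth` is a list of H rows, each a list of W metric distances
        measured along the viewing ray (not along the z axis);
      - the image spans 360 degrees horizontally and 180 degrees vertically;
      - axes are x right, y down, z forward.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v, row in enumerate(depth):
        # Latitude runs from +pi/2 (top row) to -pi/2 (bottom row),
        # sampled at pixel centers.
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        for u, d in enumerate(row):
            # Skip invalid (zero, negative, NaN) or too-distant depths.
            if not (0.0 < d < max_depth):
                continue
            # Longitude runs from -pi (left edge) to +pi (right edge).
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            x = d * math.cos(lat) * math.sin(lon)
            y = -d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            points.append((x, y, z))
    return points


# Usage: a tiny 2x4 map where every ray hits a surface 2 m away.
cloud = erp_depth_to_points([[2.0] * 4, [2.0] * 4])
```

Because every pixel of an equirectangular image corresponds to a unique ray direction on the sphere, one depth map covers the whole scene around the camera. A different depth convention (for example, z-depth) would require a different scaling of the ray.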
Additional visual results and comparison with the prior SoTA can be found at
Performance
Depth Any Camera significantly outperforms the previous SoTA metric depth estimation models, Metric3D-v2 and UniDepth, in zero-shot generalization to large-FoV camera images, despite using a significantly smaller training dataset and model.
Method | Training Data Size | Matterport3D (360) AbsRel | Matterport3D (360) $\delta_1$ | Pano3D-GV2 (360) AbsRel | Pano3D-GV2 (360) $\delta_1$ | ScanNet++ (fisheye) AbsRel | ScanNet++ (fisheye) $\delta_1$ | KITTI360 (fisheye) AbsRel | KITTI360 (fisheye) $\delta_1$ |
---|---|---|---|---|---|---|---|---|---|
UniDepth-VitL | 3M | 0.7648 | 0.2576 | 0.7892 | 0.2469 | 0.4971 | 0.3638 | 0.2939 | 0.4810 |
Metric3D-v2-VitL | 16M | 0.2924 | 0.4381 | 0.3070 | 0.4040 | 0.2229 | 0.5360 | 0.1997 | 0.7159 |
Ours-Resnet101 | 670K-indoor / 130K-outdoor | **0.156** | **0.7727** | **0.1387** | **0.8115** | *0.1323* | *0.8517* | *0.1559* | *0.7858* |
Ours-SwinL | 670K-indoor / 130K-outdoor | *0.1789* | *0.7231* | *0.1836* | *0.7287* | **0.1282** | **0.8544** | **0.1487** | **0.8222** |
We highlight the best and second-best results in bold and italic, respectively (better results: AbsRel $\downarrow$, $\delta_1 \uparrow$).
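For readers unfamiliar with the two metrics, here is a minimal, dependency-free sketch using their standard definitions: AbsRel is the mean of $|d_{pred} - d_{gt}| / d_{gt}$ over valid pixels, and $\delta_1$ is the fraction of valid pixels with $\max(d_{pred}/d_{gt},\, d_{gt}/d_{pred}) < 1.25$. The function name `depth_metrics`, the flat-list input, and the `gt > 0` validity mask are assumptions for illustration, not the evaluation code used to produce the table.

```python
def depth_metrics(pred, gt, threshold=1.25):
    """Return (abs_rel, delta1) for flat sequences of predicted and GT depths.

    Assumption: a pixel is valid when its ground-truth depth is > 0.
    """
    abs_rel_sum = 0.0
    inliers = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= 0:
            continue  # no ground truth for this pixel
        valid += 1
        abs_rel_sum += abs(p - g) / g
        # A non-positive prediction can never count as an inlier.
        ratio = max(p / g, g / p) if p > 0 else float("inf")
        if ratio < threshold:
            inliers += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return abs_rel_sum / valid, inliers / valid


# Usage: the last pixel has no ground truth and is ignored.
abs_rel, delta1 = depth_metrics([1.0, 2.2, 3.0, 5.0], [1.0, 2.0, 2.0, 0.0])
```

Lower AbsRel means smaller relative depth error, while higher $\delta_1$ means more pixels fall within 25% of the ground truth.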