Depth-Anything-V2-Large (Transformers version)

Introduction

Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model, with the following features:

  • more fine-grained details than Depth Anything V1
  • more robust than Depth Anything V1 & Stable Diffusion (SD)-based models (e.g., Marigold, GeoWizard)
  • more efficient (10x faster) & more lightweight than SD-based models
  • impressive fine-tuned performance with our pre-trained models

Installation

git clone https://github.com/MackinationsAi/Upgraded-Depth-Anything-V2.git
cd Upgraded-Depth-Anything-V2
one_click_install.bat

Usage

Please refer to the README.md in the Upgraded-Depth-Anything-V2 repository for full usage instructions.

Test Code

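Save the script listed after these commands as test.py in the Upgraded-Depth-Anything-V2 directory, then run it on a .jpg or .png image:

cd Upgraded-Depth-Anything-V2
venv\scripts\activate
python test.py /path/to/your/image.jpg

Contents of test.py: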

import cv2
import torch
import numpy as np
import os
import argparse

from safetensors.torch import load_file
from depth_anything_v2.dpt import DepthAnythingV2

# Argument parser for input image path
parser = argparse.ArgumentParser(description="Depth map inference using DepthAnythingV2 model.")
parser.add_argument("input_image_path", type=str, help="Path to the input image")
args = parser.parse_args()

# Determine the directory of this script
script_dir = os.path.dirname(os.path.abspath(__file__))

# Set output path relative to the script directory
output_image_path = os.path.join(script_dir, "base_udav2_hf-code-test.png")
checkpoint_path = os.path.join(script_dir, "checkpoints", "depth_anything_v2_vitl.safetensors")

# Device selection: CUDA, MPS, or CPU
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

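# Build the ViT-L variant; these settings must match the checkpoint
model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])

# Load the weights on the CPU first, then move the model to the selected device
state_dict = load_file(checkpoint_path, device='cpu')
model.load_state_dict(state_dict)
model.to(device)
model.eval()  # inference mode: disables dropout and similar training-only behavior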

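# Load the input image (cv2.imread returns None if the file is missing or unreadable)
raw_img = cv2.imread(args.input_image_path)
if raw_img is None:
    raise FileNotFoundError(f"Could not read image: {args.input_image_path}")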

# Infer the depth map
depth = model.infer_image(raw_img)  # HxW raw depth map

# Normalize the depth map to 0-255 for saving as an image
depth_normalized = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX)
depth_normalized = depth_normalized.astype(np.uint8)

cv2.imwrite(output_image_path, depth_normalized)
print(f"Depth map saved at {output_image_path}")
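The normalization step in test.py can also be written with NumPy alone, which makes the arithmetic explicit and allows a 16-bit output that keeps more depth levels than 8 bits. The following is a minimal sketch, not part of the repository: `normalize_depth` and `fake_depth` are hypothetical names, and the synthetic array stands in for the model output. Unlike a plain `astype(np.uint8)`, which truncates, this version rounds to the nearest integer.

```python
import numpy as np


def normalize_depth(depth: np.ndarray, bits: int = 8) -> np.ndarray:
    """Min-max scale a float depth map to the full range of uint8 or uint16.

    Hypothetical helper for illustration; not part of the repository.
    """
    if bits not in (8, 16):
        raise ValueError("bits must be 8 or 16")
    dtype = np.uint8 if bits == 8 else np.uint16
    max_value = float(np.iinfo(dtype).max)

    depth = depth.astype(np.float64)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-12:
        # A constant map has no range to stretch; return all zeros.
        return np.zeros(depth.shape, dtype=dtype)

    # Map the minimum to 0 and the maximum to max_value, then round.
    scaled = (depth - lo) / (hi - lo) * max_value
    return np.round(scaled).astype(dtype)


# Synthetic 2x3 array standing in for the HxW depth map from the model.
fake_depth = np.array([[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]], dtype=np.float32)
depth_u8 = normalize_depth(fake_depth, bits=8)
depth_u16 = normalize_depth(fake_depth, bits=16)
```

In test.py, the `depth` array returned by `infer_image` could be passed in place of `fake_depth`. A uint16 result can be written to a .png with `cv2.imwrite`, since PNG supports 16-bit single-channel images.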

Citation

If you find this project useful, please consider citing MackinationsAi & the following:

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

@inproceedings{depth_anything_v1,
  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  booktitle={CVPR},
  year={2024}
}