LiheYoung committed on
Commit 74e9abb · verified · 1 Parent(s): 8686747

Add Github repository content

Files changed (4)
  1. DA-2K.md +51 -0
  2. README.md +136 -13
  3. run.py +74 -0
  4. run_video.py +89 -0
DA-2K.md ADDED
@@ -0,0 +1,51 @@
+ # DA-2K Evaluation Benchmark
+
+ ## Introduction
+
+ ![DA-2K](assets/DA-2K.png)
+
+ DA-2K is proposed in [Depth Anything V2](https://depth-anything-v2.github.io) to evaluate relative depth estimation capability. It covers eight representative scenarios: `indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, and `object`. It consists of 1K diverse, high-quality images and 2K precise pair-wise relative depth annotations.
+
+ Please refer to our [paper](https://arxiv.org/abs/2406.09414) for details on how this benchmark was constructed.
+
+
+ ## Usage
+
+ Please first [download the benchmark](https://huggingface.co/datasets/depth-anything/DA-2K/tree/main).
+
+ All annotations are stored in `annotations.json`. The annotation file is a JSON object where each key is the path to an image file, and the value is a list of annotations associated with that image. Each annotation describes two points and identifies which point is closer to the camera. The structure is detailed below:
+
+ ```
+ {
+   "image_path": [
+     {
+       "point1": [h1, w1], # (vertical position, horizontal position)
+       "point2": [h2, w2], # (vertical position, horizontal position)
+       "closer_point": "point1" # we always set "point1" as the closer one
+     },
+     ...
+   ],
+   ...
+ }
+ ```
+
+ To visualize the annotations:
+ ```bash
+ python visualize.py [--scene-type <type>]
+ ```
+
+ **Options**
+ - `--scene-type <type>` (optional): Specify the scene type (`indoor`, `outdoor`, `non_real`, `transparent_reflective`, `adverse_style`, `aerial`, `underwater`, or `object`). Omit this argument or set `<type>` to `""` to include all scene types.
+
+ ## Citation
+
+ If you find this benchmark useful, please consider citing:
+
+ ```bibtex
+ @article{depth_anything_v2,
+   title={Depth Anything V2},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   journal={arXiv:2406.09414},
+   year={2024}
+ }
+ ```
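DA-2K.md documents the annotation schema but not a scoring routine. Below is a minimal sketch of pair-wise accuracy over `annotations.json`, assuming a caller-supplied `predict` callable that returns an HxW map for an image path. The names `da2k_pair_accuracy` and `load_annotations` are hypothetical and not part of the benchmark release. Because models differ in whether larger output values mean closer (inverse-depth or disparity-like outputs) or farther (metric depth), the sketch leaves that choice to the caller rather than assuming it.

```python
import json
from typing import Any, Callable, Dict, List, Sequence

# One annotation as stored in annotations.json: point1, point2, closer_point.
Pair = Dict[str, Any]


def da2k_pair_accuracy(
    annotations: Dict[str, List[Pair]],
    predict: Callable[[str], Sequence[Sequence[float]]],
    larger_is_closer: bool = True,
) -> float:
    """Fraction of annotated pairs whose closer point is ranked correctly.

    `predict` (hypothetical, user-supplied) maps an image path to an HxW map
    indexable as depth[row][col]. Use larger_is_closer=True for disparity-like
    outputs and False when smaller values mean closer.
    """
    correct, total = 0, 0
    for image_path, pairs in annotations.items():
        depth = predict(image_path)
        for pair in pairs:
            h1, w1 = pair["point1"]  # (vertical, horizontal) = (row, col)
            h2, w2 = pair["point2"]
            v1 = depth[int(h1)][int(w1)]
            v2 = depth[int(h2)][int(w2)]
            point1_closer = v1 > v2 if larger_is_closer else v1 < v2
            # ties fall through to "point2"
            predicted = "point1" if point1_closer else "point2"
            correct += int(predicted == pair["closer_point"])
            total += 1
    return correct / total if total else 0.0


def load_annotations(path: str = "annotations.json") -> Dict[str, List[Pair]]:
    """Read the annotation file described above into a dict."""
    with open(path, "r") as f:
        return json.load(f)
```

Usage would look like `da2k_pair_accuracy(load_annotations(), my_predict)`, where `my_predict` is your own wrapper that loads the image and runs a model on it.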
README.md CHANGED
@@ -1,13 +1,136 @@
- ---
- title: Depth Anything V2
- emoji: 🌖
- colorFrom: red
- colorTo: indigo
- sdk: gradio
- sdk_version: 4.36.0
- app_file: app.py
- pinned: false
- license: bsd-2-clause
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <div align="center">
+ <h1>Depth Anything V2</h1>
+
+ [**Lihe Yang**](https://liheyoung.github.io/)<sup>1</sup> · [**Bingyi Kang**](https://bingykang.github.io/)<sup>2&dagger;</sup> · [**Zilong Huang**](http://speedinghzl.github.io/)<sup>2</sup>
+ <br>
+ [**Zhen Zhao**](http://zhaozhen.me/) · [**Xiaogang Xu**](https://xiaogang00.github.io/) · [**Jiashi Feng**](https://sites.google.com/site/jshfeng/)<sup>2</sup> · [**Hengshuang Zhao**](https://hszhao.github.io/)<sup>1*</sup>
+
+ <sup>1</sup>HKU&emsp;&emsp;&emsp;<sup>2</sup>TikTok
+ <br>
+ &dagger;project lead&emsp;*corresponding author
+
+ <a href="https://arxiv.org/abs/2406.09414"><img src='https://img.shields.io/badge/arXiv-Depth Anything V2-red' alt='Paper PDF'></a>
+ <a href='https://depth-anything-v2.github.io'><img src='https://img.shields.io/badge/Project_Page-Depth Anything V2-green' alt='Project Page'></a>
+ <a href='https://huggingface.co/spaces/depth-anything/Depth-Anything-V2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
+ <a href='https://huggingface.co/datasets/depth-anything/DA-2K'><img src='https://img.shields.io/badge/Benchmark-DA--2K-yellow' alt='Benchmark'></a>
+ </div>
+
+ This work presents Depth Anything V2. It significantly outperforms [V1](https://github.com/LiheYoung/Depth-Anything) in fine-grained details and robustness. Compared with SD-based models, it enjoys faster inference speed, fewer parameters, and higher depth accuracy.
+
+ ![teaser](assets/teaser.png)
+
+ ## News
+
+ - **2024-06-14:** Paper, project page, code, models, demo, and benchmark are all released.
+
+
+ ## Pre-trained Models
+
+ We provide **four models** of varying scales for robust relative depth estimation:
+
+ | Model | Params | Checkpoint |
+ |:-|-:|:-:|
+ | Depth-Anything-V2-Small | 24.8M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true) |
+ | Depth-Anything-V2-Base | 97.5M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Base/resolve/main/depth_anything_v2_vitb.pth?download=true) |
+ | Depth-Anything-V2-Large | 335.3M | [Download](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) |
+ | Depth-Anything-V2-Giant | 1.3B | Coming soon |
+
+
+ ### Code snippet to use our models
+ ```python
+ import cv2
+ import torch
+
+ from depth_anything_v2.dpt import DepthAnythingV2
+
+ # take depth-anything-v2-large as an example
+ model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
+ model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
+ model.eval()
+
+ raw_img = cv2.imread('your/image/path')
+ depth = model.infer_image(raw_img) # HxW raw depth map
+ ```
+
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ git clone https://github.com/DepthAnything/Depth-Anything-V2
+ cd Depth-Anything-V2
+ pip install -r requirements.txt
+ ```
+
+ ### Running
+
+ ```bash
+ python run.py --encoder <vits | vitb | vitl | vitg> --img-path <path> --outdir <outdir> [--input-size <size>] [--pred-only] [--grayscale]
+ ```
+ Options:
+ - `--img-path`: You can either 1) point it to an image directory containing all images of interest, 2) point it to a single image, or 3) point it to a text file listing all image paths.
+ - `--input-size` (optional): By default, we use input size `518` for model inference. **You can increase the size for even more fine-grained results.**
+ - `--pred-only` (optional): Only save the predicted depth map, without the raw image.
+ - `--grayscale` (optional): Save the grayscale depth map, without applying a color palette.
+
+ For example:
+ ```bash
+ python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
+ ```
+
+ **If you want to use Depth Anything V2 on videos:**
+
+ ```bash
+ python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis
+ ```
+
+ *Please note that our larger models have better temporal consistency on videos.*
+
+
+ ### Gradio demo
+
+ To use our Gradio demo locally:
+
+ ```bash
+ python app.py
+ ```
+
+ You can also try our [online demo](https://huggingface.co/spaces/Depth-Anything/Depth-Anything-V2).
+
+ **Note:** Compared to V1, we have made a minor modification to the DINOv2-DPT architecture (originating from this [issue](https://github.com/LiheYoung/Depth-Anything/issues/81)). In V1, we *unintentionally* used features from the last four layers of DINOv2 for decoding. In V2, we use [intermediate features](https://github.com/DepthAnything/Depth-Anything-V2/blob/2cbc36a8ce2cec41d38ee51153f112e87c8e42d8/depth_anything_v2/dpt.py#L164-L169) instead. Although this modification did not improve details or accuracy, we decided to follow this common practice.
+
+
+
+ ## Fine-tuned to Metric Depth Estimation
+
+ Please refer to [metric depth estimation](./metric_depth).
+
+
+ ## DA-2K Evaluation Benchmark
+
+ Please refer to [DA-2K benchmark](./DA-2K.md).
+
+ ## LICENSE
+
+ Depth-Anything-V2-Small model is under the Apache-2.0 license. Depth-Anything-V2-Base/Large/Giant models are under the CC-BY-NC-4.0 license.
+
+
+ ## Citation
+
+ If you find this project useful, please consider citing:
+
+ ```bibtex
+ @article{depth_anything_v2,
+   title={Depth Anything V2},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   journal={arXiv:2406.09414},
+   year={2024}
+ }
+
+ @inproceedings{depth_anything_v1,
+   title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+   author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+   booktitle={CVPR},
+   year={2024}
+ }
+ ```
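The three `--img-path` modes listed under Options can be sketched as a standalone helper. This is a minimal illustration, not the code in `run.py`: the name `resolve_image_paths` is hypothetical, and unlike the script it keeps only regular files from the recursive glob (the glob also returns sub-directories, which `cv2.imread` cannot read) and sorts the result for a deterministic order.

```python
import glob
import os
from typing import List


def resolve_image_paths(img_path: str) -> List[str]:
    """Expand an --img-path value into a list of image files.

    1) a directory: every regular file below it, recursively;
    2) a single image file;
    3) a .txt file listing one image path per line.
    """
    if os.path.isfile(img_path):
        if img_path.endswith(".txt"):
            with open(img_path, "r") as f:
                # strip surrounding whitespace and skip empty lines
                return [line.strip() for line in f if line.strip()]
        return [img_path]
    # recursive glob yields sub-directories too; keep regular files only
    matches = glob.glob(os.path.join(img_path, "**", "*"), recursive=True)
    return sorted(p for p in matches if os.path.isfile(p))
```

Filtering at this stage keeps the per-image loop simple: every path it receives is a file, so a `None` from `cv2.imread` then points to an unreadable image rather than a directory.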
run.py ADDED
@@ -0,0 +1,74 @@
+ import argparse
+ import cv2
+ import glob
+ import matplotlib
+ import numpy as np
+ import os
+ import torch
+
+ from depth_anything_v2.dpt import DepthAnythingV2
+
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser(description='Depth Anything V2')
+
+     parser.add_argument('--img-path', type=str)
+     parser.add_argument('--input-size', type=int, default=518)
+     parser.add_argument('--outdir', type=str, default='./vis_depth')
+
+     parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
+
+     parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
+     parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
+
+     args = parser.parse_args()
+
+     DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+
+     # we are undergoing company review procedures to release Depth-Anything-Giant checkpoint
+     model_configs = {
+         'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
+         'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
+         'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
+         'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
+     }
+
+     depth_anything = DepthAnythingV2(**model_configs[args.encoder])
+     depth_anything.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
+     depth_anything = depth_anything.to(DEVICE).eval()
+
+     if os.path.isfile(args.img_path):
+         if args.img_path.endswith('txt'):
+             with open(args.img_path, 'r') as f:
+                 filenames = f.read().splitlines()
+         else:
+             filenames = [args.img_path]
+     else:
+         filenames = glob.glob(os.path.join(args.img_path, '**/*'), recursive=True)
+
+     os.makedirs(args.outdir, exist_ok=True)
+
+     cmap = matplotlib.colormaps.get_cmap('Spectral_r')
+
+     for k, filename in enumerate(filenames):
+         print(f'Progress {k+1}/{len(filenames)}: {filename}')
+
+         raw_image = cv2.imread(filename)
+
+         depth = depth_anything.infer_image(raw_image, args.input_size)
+
+         depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
+         depth = depth.astype(np.uint8)
+
+         if args.grayscale:
+             depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
+         else:
+             depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
+
+         if args.pred_only:
+             cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), depth)
+         else:
+             split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
+             combined_result = cv2.hconcat([raw_image, split_region, depth])
+
+             cv2.imwrite(os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.png'), combined_result)
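The visualization step in `run.py` min-max normalizes the prediction to `uint8` and then either replicates it into three gray channels or applies a colormap and reverses RGB to BGR for OpenCV. A minimal NumPy sketch of that step follows; `depth_to_bgr` is a hypothetical name, and it adds a guard the script lacks: a constant depth map would otherwise compute 0/0. The `cmap` argument is assumed to behave like a matplotlib `Colormap`, which treats integer input as lookup-table indices and returns RGBA floats in [0, 1].

```python
from typing import Callable, Optional

import numpy as np


def depth_to_bgr(depth: np.ndarray, cmap: Optional[Callable] = None) -> np.ndarray:
    """Turn an HxW depth map into an HxWx3 uint8 BGR image.

    cmap=None gives the --grayscale output; otherwise cmap is called on the
    uint8 map and must return an HxWx4 RGBA array with values in [0, 1].
    """
    depth = depth.astype(np.float64)
    d_min, d_max = depth.min(), depth.max()
    if d_max - d_min > 0:
        # min-max normalization to [0, 255]
        depth = (depth - d_min) / (d_max - d_min) * 255.0
    else:
        # constant map: the plain formula would compute 0/0 (NaN)
        depth = np.zeros_like(depth)
    depth = depth.astype(np.uint8)

    if cmap is None:
        # grayscale: replicate the single channel three times
        return np.repeat(depth[..., np.newaxis], 3, axis=-1)
    # drop alpha, scale to [0, 255], reverse RGB -> BGR for OpenCV
    return (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
```

With matplotlib installed, `depth_to_bgr(depth, matplotlib.colormaps.get_cmap('Spectral_r'))` should match the colored output of the script, and `depth_to_bgr(depth)` the `--grayscale` output.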
run_video.py ADDED
@@ -0,0 +1,89 @@
+ import argparse
+ import cv2
+ import glob
+ import matplotlib
+ import numpy as np
+ import os
+ import torch
+
+ from depth_anything_v2.dpt import DepthAnythingV2
+
+
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser(description='Depth Anything V2')
+
+     parser.add_argument('--video-path', type=str)
+     parser.add_argument('--input-size', type=int, default=518)
+     parser.add_argument('--outdir', type=str, default='./vis_video_depth')
+
+     parser.add_argument('--encoder', type=str, default='vitl', choices=['vits', 'vitb', 'vitl', 'vitg'])
+
+     parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
+     parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
+
+     args = parser.parse_args()
+
+     DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
+
+     # we are undergoing company review procedures to release Depth-Anything-Giant checkpoint
+     model_configs = {
+         'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
+         'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
+         'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
+         'vitg': {'encoder': 'vitg', 'features': 384, 'out_channels': [1536, 1536, 1536, 1536]}
+     }
+
+     depth_anything = DepthAnythingV2(**model_configs[args.encoder])
+     depth_anything.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{args.encoder}.pth', map_location='cpu'))
+     depth_anything = depth_anything.to(DEVICE).eval()
+
+     if os.path.isfile(args.video_path):
+         if args.video_path.endswith('txt'):
+             with open(args.video_path, 'r') as f:
+                 filenames = f.read().splitlines()
+         else:
+             filenames = [args.video_path]
+     else:
+         filenames = glob.glob(os.path.join(args.video_path, '**/*'), recursive=True)
+
+     os.makedirs(args.outdir, exist_ok=True)
+
+     margin_width = 50
+     cmap = matplotlib.colormaps.get_cmap('Spectral_r')
+
+     for k, filename in enumerate(filenames):
+         print(f'Progress {k+1}/{len(filenames)}: {filename}')
+
+         raw_video = cv2.VideoCapture(filename)
+         frame_width, frame_height = int(raw_video.get(cv2.CAP_PROP_FRAME_WIDTH)), int(raw_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+         frame_rate = int(raw_video.get(cv2.CAP_PROP_FPS))
+         output_width = frame_width if args.pred_only else frame_width * 2 + margin_width
+
+         output_path = os.path.join(args.outdir, os.path.splitext(os.path.basename(filename))[0] + '.mp4')
+         out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), frame_rate, (output_width, frame_height))
+
+         while raw_video.isOpened():
+             ret, raw_frame = raw_video.read()
+             if not ret:
+                 break
+
+             depth = depth_anything.infer_image(raw_frame, args.input_size)
+
+             depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
+             depth = depth.astype(np.uint8)
+
+             if args.grayscale:
+                 depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
+             else:
+                 depth = (cmap(depth)[:, :, :3] * 255)[:, :, ::-1].astype(np.uint8)
+
+             if args.pred_only:
+                 out.write(depth)
+             else:
+                 split_region = np.ones((frame_height, margin_width, 3), dtype=np.uint8) * 255
+                 combined_frame = cv2.hconcat([raw_frame, split_region, depth])
+
+                 out.write(combined_frame)
+
+         raw_video.release()
+         out.release()
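OpenCV's `cv2.VideoWriter` is opened with a fixed `(width, height)`, and every frame written to it is expected to match that size, so the writer size has to follow `--pred-only`: the depth frame alone is `frame_width` wide, while the side-by-side frame is `frame_width * 2 + margin_width` wide. A small sketch, using hypothetical helper names `output_frame_size` and `side_by_side` (not part of the repository), computes that size and builds the combined frame with NumPy as a stand-in for `cv2.hconcat`, so the two can be checked against each other before writing.

```python
from typing import Tuple

import numpy as np


def output_frame_size(frame_width: int, frame_height: int,
                      margin_width: int, pred_only: bool) -> Tuple[int, int]:
    """(width, height) to pass to cv2.VideoWriter for the frames that get written."""
    if pred_only:
        return frame_width, frame_height
    return frame_width * 2 + margin_width, frame_height


def side_by_side(raw_frame: np.ndarray, depth_bgr: np.ndarray,
                 margin_width: int = 50) -> np.ndarray:
    """Raw frame | white margin | depth visualization, all HxWx3 uint8."""
    height = raw_frame.shape[0]
    split_region = np.full((height, margin_width, 3), 255, dtype=np.uint8)
    # np.hstack concatenates along axis 1 (width) for HxWxC arrays
    return np.hstack([raw_frame, split_region, depth_bgr])
```

A cheap safeguard in a loop like the one above is to compare `frame.shape[1::-1]` (width, height) against the writer size once, on the first frame, and fail loudly on a mismatch.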