mirshad7 and nielsr (HF staff) committed
Commit b9845dd · verified · Parent: 8d9200a

Add metadata and hf_hub_download (#1)


- Add metadata and hf_hub_download (efedecd9dde07ac032d70e8a56fe5e848a588810)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1): README.md (+243 -238)
---
license: cc-by-nc-4.0
pipeline_tag: feature-extraction
---

<!-- # NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields -->

<div align="center">
<img src="demo/nerf-mae_teaser.png" width="85%">
<img src="demo/nerf-mae_teaser.jpeg" width="85%">
</div>
<!-- <p align="center">
<img src="demo/nerf-mae_teaser.jpeg" width="100%">
</p> -->

<br>
<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-2404.01300-gray?style=for-the-badge&logo=arxiv&logoColor=white&color=B31B1B)](https://arxiv.org/abs/2404.01300)
[![Project Page](https://img.shields.io/badge/Project-Page-orange?style=for-the-badge&logoColor=white&labelColor=gray&link=https%3A%2F%2Fnerf-mae.github.io%2F)](https://nerf-mae.github.io)
[![Pytorch](https://img.shields.io/badge/Pytorch-%3E1.12-gray?style=for-the-badge&logo=pytorch&logoColor=white&labelColor=gray&color=ee4c2c&link=https%3A%2F%2Fnerf-mae.github.io%2F)](https://pytorch.org/)
[![Cite](https://img.shields.io/badge/Cite-Bibtex-gray?style=for-the-badge&logoColor=white&color=F7A41D)](https://github.com/zubair-irshad/NeRF-MAE?tab=readme-ov-file#citation)
[![Video](https://img.shields.io/badge/youtube-video-CD201F?style=for-the-badge&logo=youtube&labelColor=grey)](https://youtu.be/D60hlhmeuJI?si=d4RfHAwBJgLJXdKj)

</div>

---

<a href="https://www.tri.global/" target="_blank">
<img align="right" src="demo/GeorgiaTech_RGB.png" width="18%"/>
</a>

<a href="https://www.tri.global/" target="_blank">
<img align="right" src="demo/tri-logo.png" width="17%"/>
</a>

### [Project Page](https://nerf-mae.github.io/) | [arXiv](https://arxiv.org/abs/2404.01300) | [PDF](https://arxiv.org/pdf/2404.01300.pdf)

**NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields**

<a href="https://zubairirshad.com"><strong>Muhammad Zubair Irshad</strong></a>
·
<a href="https://zakharos.github.io/"><strong>Sergey Zakharov</strong></a>
·
<a href="https://www.linkedin.com/in/vitorguizilini"><strong>Vitor Guizilini</strong></a>
·
<a href="https://adriengaidon.com/"><strong>Adrien Gaidon</strong></a>
·
<a href="https://faculty.cc.gatech.edu/~zk15/"><strong>Zsolt Kira</strong></a>
·
<a href="https://www.tri.global/about-us/dr-rares-ambrus"><strong>Rares Ambrus</strong></a>
<br> **European Conference on Computer Vision, ECCV 2024**<br>

<b> Toyota Research Institute &nbsp; | &nbsp; Georgia Institute of Technology</b>

## 💡 Highlights
- **NeRF-MAE**: The first large-scale pretraining utilizing Neural Radiance Fields (NeRF) as an input modality. We pretrain a single Transformer model on thousands of NeRFs for 3D representation learning.
- **NeRF-MAE Dataset**: A large-scale NeRF pretraining and downstream task finetuning dataset.

## 🏷️ TODO 🚀

- [x] Release large-scale pretraining code 🚀
- [x] Release NeRF-MAE dataset comprising radiance and density grids 🚀
- [x] Release 3D object detection finetuning and eval code 🚀
- [x] Pretrained NeRF-MAE checkpoints and out-of-the-box model usage 🚀

## NeRF-MAE Model Architecture
<p align="center">
<img src="demo/nerf-mae_architecture.jpg" width="90%">
</p>

## Citation

If you find this repository or our dataset useful, please star ⭐ this repository and consider citing 📝:

```bibtex
@inproceedings{irshad2024nerfmae,
  title={NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields},
  author={Muhammad Zubair Irshad and Sergey Zakharov and Vitor Guizilini and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```

### Contents
- [🌇 Environment](#-environment)
- [⛳ Model Usage and Checkpoints](#-model-usage-and-checkpoints)
- [🗂️ Dataset](#-dataset)

## 🌇 Environment

Create a Python 3.9 virtual environment and install the requirements:

```bash
cd NeRF-MAE  # root of the cloned repository
conda create -n nerf-mae python=3.9
conda activate nerf-mae
pip install --upgrade pip
pip install -r requirements.txt
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```

The code was built and tested with **CUDA 11.3**.
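
A quick optional sanity check that the expected PyTorch build and CUDA device are visible from Python:

```python
import torch

# Expect a 1.12.1+cu113 build and a visible GPU if CUDA 11.3 drivers are installed.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```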

To run downstream-task finetuning, compile the CUDA extension as described in [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN):

```bash
cd NeRF-MAE  # root of the cloned repository
cd nerf_rpn/model/rotated_iou/cuda_op
python setup.py install
cd ../../../..
```

## ⛳ Model Usage and Checkpoints

- [Hugging Face repo to download pretrained and finetuned checkpoints](https://huggingface.co/mirshad7/NeRF-MAE) (see the snippet just below for programmatic download)

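If you prefer to fetch checkpoints programmatically instead of through the web UI, the following is a minimal sketch using `huggingface_hub`; the checkpoint filename shown is the one used in the feature-extraction example below, and other filenames in the repo may differ.

```python
from huggingface_hub import hf_hub_download, list_repo_files

# List the checkpoint files available in the NeRF-MAE model repo.
print(list_repo_files("mirshad7/NeRF-MAE"))

# Download one checkpoint to the local Hugging Face cache and get its path.
ckpt_path = hf_hub_download(
    repo_id="mirshad7/NeRF-MAE",
    filename="nerf_mae_pretrained.pt",  # filename used in the example below
)
print(ckpt_path)
```
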
NeRF-MAE is structured to provide easy access to pretrained NeRF-MAE models (and reproductions), to facilitate their use for various downstream tasks. If you don't have the resources for large-scale pretraining, this is an easy way to extract good visual features from NeRFs: our pretraining provides an easy-to-access embedding of any NeRF scene that can be used for a variety of downstream tasks in a straightforward way.

We have released pretrained and finetuned checkpoints so you can use our codebase out of the box. There are two main usage modes: (1) the most common one, using the features directly in a downstream task such as an FPN head for 3D object detection, and (2) reconstructing the original grid to enforce losses such as a masked reconstruction loss. Below is a sample usage of our model with spelled-out comments.


1. Get the features to be used in a downstream task

```python
import torch
from huggingface_hub import hf_hub_download

# SwinTransformer_MAE3D_New is the 3D Swin-based MAE model from the NeRF-MAE
# codebase (https://github.com/zubair-irshad/NeRF-MAE); import it from there.

# Define Swin Transformer configurations
swin_config = {
    "swin_t": {"embed_dim": 96, "depths": [2, 2, 6, 2], "num_heads": [3, 6, 12, 24]},
    "swin_s": {"embed_dim": 96, "depths": [2, 2, 18, 2], "num_heads": [3, 6, 12, 24]},
    "swin_b": {"embed_dim": 128, "depths": [2, 2, 18, 2], "num_heads": [3, 6, 12, 24]},
    "swin_l": {"embed_dim": 192, "depths": [2, 2, 18, 2], "num_heads": [6, 12, 24, 48]},
}

# Set the desired backbone type
backbone_type = "swin_s"
config = swin_config[backbone_type]

# Resolution of the input radiance and density grid (160^3 in this example)
resolution = 160

# Initialize the Swin Transformer model
model = SwinTransformer_MAE3D_New(
    patch_size=[4, 4, 4],
    embed_dim=config["embed_dim"],
    depths=config["depths"],
    num_heads=config["num_heads"],
    window_size=[4, 4, 4],
    stochastic_depth_prob=0.1,
    expand_dim=True,
    resolution=resolution,
)

# Load the pretrained checkpoint from the Hugging Face Hub and remove the
# decoder layers that are only needed for masked reconstruction
checkpoint_path = hf_hub_download(repo_id="mirshad7/NeRF-MAE", filename="nerf_mae_pretrained.pt")
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
for attr in ["decoder4", "decoder3", "decoder2", "decoder1", "out", "mask_token"]:
    delattr(model, attr)

# Extract features using the Swin Transformer backbone.
# input_grid is the 4-channel (RGB + density) NeRF grid, e.g. shape (1, 4, 160, 160, 160)
input_grid = torch.randn((1, 4, resolution, resolution, resolution))
features = []
input_grid = model.patch_partition(input_grid) + model.pos_embed.type_as(input_grid).to(input_grid.device).clone().detach()
for stage in model.stages:
    input_grid = stage(input_grid)
    features.append(torch.permute(input_grid, [0, 4, 1, 2, 3]).contiguous())  # Format: [N, C, H, W, D]

# Multi-scale features have shapes:
# [torch.Size([1, 96, 40, 40, 40]), torch.Size([1, 192, 20, 20, 20]),
#  torch.Size([1, 384, 10, 10, 10]), torch.Size([1, 768, 5, 5, 5])]

# Process features through an FPN or other downstream head
```
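
To make the final comment concrete, below is a minimal, hypothetical 3D FPN-style fusion of the four multi-scale feature maps. It is not the detection head used in the paper (see ```SwinTransformer_FPN_Pretrained_Skip``` in our codebase for that); it is only a sketch of how the ```[N, C, H, W, D]``` features can be consumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFPN3D(nn.Module):
    """Hypothetical top-down FPN over the four NeRF-MAE feature levels."""

    def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv3d(c, out_channels, 1) for c in in_channels)
        self.output = nn.ModuleList(nn.Conv3d(out_channels, out_channels, 3, padding=1) for _ in in_channels)

    def forward(self, feats):
        # feats: list of [N, C_i, H_i, W_i, D_i] maps, finest first (as produced above).
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample each coarser map and add it to the next finer one.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[2:], mode="trilinear", align_corners=False
            )
        return [out(x) for out, x in zip(self.output, laterals)]


fpn = TinyFPN3D()
pyramid = fpn(features)  # `features` from the snippet above
print([p.shape for p in pyramid])
```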

2. Get the Original Grid Output

```python
import torch

# load_data and build_model come from the NeRF-MAE codebase
# (https://github.com/zubair-irshad/NeRF-MAE); `args` holds the script arguments.

# Load data from the specified folder and filename with the given resolution.
res, rgbsigma = load_data(folder_name, filename, resolution=args.resolution)

# rgbsigma has sample shape (1, 4, 160, 160, 160)

# Build the model using the provided arguments.
model = build_model(args)

# Load a checkpoint if provided.
if args.checkpoint:
    model.load_state_dict(torch.load(args.checkpoint, map_location="cpu")["state_dict"])
model.eval()  # Set the model to evaluation mode.

# Run inference to get the reconstructed grid for downstream usage.
with torch.no_grad():
    pred = model([rgbsigma], is_eval=True)[3]  # Keep only the predictions.
```
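
For intuition on usage mode (2), the sketch below shows a simplified reconstruction-style objective computed against the input grid, assuming ```pred``` has the same ```(1, 4, R, R, R)``` layout as ```rgbsigma```. The random 4³-patch mask and the plain MSE are illustrative assumptions, not the exact masked reconstruction loss used during pretraining.

```python
import torch
import torch.nn.functional as F

# `pred` and `rgbsigma` as in the snippet above, both shaped (1, 4, R, R, R).
full_mse = F.mse_loss(pred, rgbsigma)

# Illustrative masked variant: score only voxels inside randomly "masked" 4^3 patches.
patch = 4
R = rgbsigma.shape[-1]
patch_mask = torch.rand(R // patch, R // patch, R // patch) < 0.75  # mask ~75% of patches
voxel_mask = (
    patch_mask.repeat_interleave(patch, 0)
    .repeat_interleave(patch, 1)
    .repeat_interleave(patch, 2)
)
masked_mse = F.mse_loss(pred[..., voxel_mask], rgbsigma[..., voxel_mask])
print(full_mse.item(), masked_mse.item())
```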

### How to plug these features into downstream 3D bounding box detection from NeRFs (i.e. plug-and-play with a [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN) OBB prediction head)

Please also see the section on [Finetuning](#-finetuning). Our released finetuned checkpoint achieves state-of-the-art results on 3D object detection in NeRFs. To run evaluation with our finetuned checkpoint on the dataset provided by NeRF-RPN, run the script below after updating the path to the pretrained checkpoint (i.e. ```--checkpoint```) and ```DATA_ROOT```, depending on whether you are evaluating on ```Front3D``` or ```Scannet```:

```bash
bash test_fcos_pretrained.sh
```

Also see the corresponding run file, i.e. ```run_fcos_pretrained.py```, and our model adaptation, i.e. ```SwinTransformer_FPN_Pretrained_Skip```. This is a minimal adaptation to plug and play our weights with a NeRF-RPN architecture and achieve a significant boost in performance.

## 🗂️ Dataset

Download the preprocessed datasets below.

- Pretraining dataset (comprising NeRF radiance and density grids). [Download link](https://s3.amazonaws.com/tri-ml-public.s3.amazonaws.com/github/nerfmae/NeRF-MAE_pretrain.tar.gz)
- Finetuning dataset (comprising NeRF radiance and density grids and bounding box/semantic labelling annotations). [3D Object Detection (Provided by NeRF-RPN)](https://drive.google.com/drive/folders/1q2wwLi6tSXu1hbEkMyfAKKdEEGQKT6pj), [3D Semantic Segmentation (Coming Soon)](), [Voxel Super-Resolution (Coming Soon)]()

Extract the pretraining and finetuning datasets under ```NeRF-MAE/datasets```. The directory structure should look like this:

```
NeRF-MAE
├── pretrain
│   ├── features
│   └── nerfmae_split.npz
└── finetune
    └── front3d_rpn_data
        ├── features
        ├── aabb
        └── obb
```
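
As a quick sanity check after extraction, the sketch below assumes the ```features``` folders contain per-scene ```.npz``` grids following the NeRF-RPN convention (an ```rgbsigma``` array plus a resolution entry); adjust the path and keys to whatever your extracted files actually contain.

```python
import glob
import numpy as np

# Hypothetical check of one extracted feature grid; the path and keys are assumptions.
npz_files = sorted(glob.glob("NeRF-MAE/datasets/pretrain/features/*.npz"))
sample = np.load(npz_files[0])
print(npz_files[0], list(sample.keys()))
if "rgbsigma" in sample:
    print("rgbsigma shape:", sample["rgbsigma"].shape)  # RGB + density grid
```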

Note: The above datasets are all you need to train and evaluate our method. Bonus: we will soon be releasing our multi-view rendered posed RGB images from Front3D, HM3D and Hypersim, as well as Instant-NGP trained checkpoints (these comprise over 1M images and 3k+ NeRF checkpoints).

Please note that our dataset was generated using the instructions from [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN) and [3D-CLR](https://vis-www.cs.umass.edu/3d-clr/). Please consider citing our work, NeRF-RPN and 3D-CLR if you find this dataset useful in your research.

Please also note that our dataset builds on [Front3D](https://arxiv.org/abs/2011.09127), [Habitat-Matterport3D](https://arxiv.org/abs/2109.08238), [HyperSim](https://github.com/apple/ml-hypersim) and [ScanNet](https://www.scan-net.org/), i.e. we train a NeRF per scene and extract the radiance and density grid as well as aligned NeRF-grid 3D annotations. Please read the terms of use for each dataset if you want to utilize their posed multi-view images.

### For more details, please check out our Paper, GitHub repo and Project Page!