mirshad7 committed
Commit 5f1484d · 1 Parent(s): 76bc974

initial commit

README.md CHANGED
@@ -1,3 +1,172 @@
 
<!-- # NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields -->

<div align="center">
<img src="demo/nerf-mae_teaser.png" width="85%">
<img src="demo/nerf-mae_teaser.jpeg" width="85%">
</div>
<!-- <p align="center">
<img src="demo/nerf-mae_teaser.jpeg" width="100%">
</p> -->

<br>
<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-2404.01300-gray?style=for-the-badge&logo=arxiv&logoColor=white&color=B31B1B)](https://arxiv.org/abs/2404.01300)
[![Project Page](https://img.shields.io/badge/Project-Page-orange?style=for-the-badge&logoColor=white&labelColor=gray&link=https%3A%2F%2Fnerf-mae.github.io%2F)](https://nerf-mae.github.io)
[![Pytorch](https://img.shields.io/badge/Pytorch-%3E1.12-gray?style=for-the-badge&logo=pytorch&logoColor=white&labelColor=gray&color=ee4c2c&link=https%3A%2F%2Fnerf-mae.github.io%2F)](https://pytorch.org/)
[![Cite](https://img.shields.io/badge/Cite-Bibtex-gray?style=for-the-badge&logoColor=white&color=F7A41D)](https://github.com/zubair-irshad/NeRF-MAE?tab=readme-ov-file#citation)
[![Video](https://img.shields.io/badge/youtube-video-CD201F?style=for-the-badge&logo=youtube&labelColor=grey)](https://youtu.be/D60hlhmeuJI?si=d4RfHAwBJgLJXdKj)

</div>

---

<a href="https://www.gatech.edu/" target="_blank">
<img align="right" src="demo/GeorgiaTech_RGB.png" width="18%"/>
</a>

<a href="https://www.tri.global/" target="_blank">
<img align="right" src="demo/tri-logo.png" width="17%"/>
</a>

### [Project Page](https://nerf-mae.github.io/) | [arXiv](https://arxiv.org/abs/2404.01300) | [PDF](https://arxiv.org/pdf/2404.01300.pdf)

**NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields**

<a href="https://zubairirshad.com"><strong>Muhammad Zubair Irshad</strong></a>
·
<a href="https://zakharos.github.io/"><strong>Sergey Zakharov</strong></a>
·
<a href="https://www.linkedin.com/in/vitorguizilini"><strong>Vitor Guizilini</strong></a>
·
<a href="https://adriengaidon.com/"><strong>Adrien Gaidon</strong></a>
·
<a href="https://faculty.cc.gatech.edu/~zk15/"><strong>Zsolt Kira</strong></a>
·
<a href="https://www.tri.global/about-us/dr-rares-ambrus"><strong>Rares Ambrus</strong></a>
<br> **European Conference on Computer Vision, ECCV 2024**<br>

<b> Toyota Research Institute &nbsp; | &nbsp; Georgia Institute of Technology</b>

## 💡 Highlights
- **NeRF-MAE**: The first large-scale pretraining utilizing Neural Radiance Fields (NeRF) as an input modality. We pretrain a single Transformer model on thousands of NeRFs for 3D representation learning.
- **NeRF-MAE Dataset**: A large-scale NeRF pretraining and downstream task finetuning dataset.

## Citation

If you find this repository or our dataset useful, please star ⭐ this repository and consider citing 📝:

```
@inproceedings{irshad2024nerfmae,
  title={NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields},
  author={Muhammad Zubair Irshad and Sergey Zakharov and Vitor Guizilini and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```

### Contents
- [🌇 Environment](#-environment)
- [⛳ Model Usage and Checkpoints](#-model-usage-and-checkpoints)
- [🗂️ Dataset](#-dataset)

## 🌇 Environment

Create a Python 3.9 virtual environment and install the requirements:

```bash
cd NeRF-MAE  # root of the cloned repo
conda create -n nerf-mae python=3.9
conda activate nerf-mae
pip install --upgrade pip
pip install -r requirements.txt
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
```
The code was built and tested with **CUDA 11.3**.
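
You can quickly confirm that the installed PyTorch build matches this CUDA version; the snippet below is only a sanity check and prints version information:

```python
# Sanity-check the environment: torch build, its bundled CUDA version, and GPU visibility.
import torch

print(torch.__version__)          # expected: 1.12.1+cu113
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present
```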

Compile the CUDA extension required for downstream-task finetuning, as described in [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN):

```bash
cd NeRF-MAE  # root of the cloned repo
cd nerf_rpn/model/rotated_iou/cuda_op
python setup.py install
cd ../../../..
```

## ⛳ Model Usage and Checkpoints

NeRF-MAE is structured to provide easy access to pretrained NeRF-MAE models (and reproductions), so they can be used for various downstream tasks. This is useful for extracting good visual features from NeRFs if you don't have the resources for large-scale pretraining. Our pretraining provides an easy-to-access embedding of any NeRF scene, which can be used for a variety of downstream tasks in a straightforward way.

We have released (1) pretrained and (2) finetuned checkpoints so you can start using our codebase out of the box. Below is a sample usage of our model, with comments spelled out, in a few lines of code:

```python
import torch

# load_data, build_model, and args are provided by the repo's run scripts.
# Load data from the specified folder and filename with the given resolution.
res, rgbsigma = load_data(folder_name, filename, resolution=args.resolution)

# Build the model using the provided arguments.
model = build_model(args)

# Load a checkpoint if provided.
if args.checkpoint:
    model.load_state_dict(torch.load(args.checkpoint, map_location="cpu")["state_dict"])
model.eval()  # Set model to evaluation mode.

# Run inference to get the features out for downstream usage.
with torch.no_grad():
    pred = model([rgbsigma], is_eval=True)[3]  # Extract only the predictions.
```
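
As a quick illustration of how such an embedding can feed a downstream task, the sketch below average-pools a feature volume like ```pred``` into one vector per scene and attaches a small linear head. The tensor shape, feature dimension, and ```num_classes``` are illustrative assumptions, not part of the released API:

```python
import torch
import torch.nn as nn

# A minimal sketch (not the released API): suppose `pred` is a dense feature volume of
# shape [B, C, D, H, W] extracted above. The sizes below are hypothetical.
B, C, D, H, W = 1, 768, 10, 10, 10
pred = torch.randn(B, C, D, H, W)

# Pool the 3D feature volume into one embedding per scene.
embedding = pred.mean(dim=(2, 3, 4))  # [B, C]

# Attach a small task head, e.g. a linear probe for scene-level classification.
num_classes = 10                      # hypothetical downstream label set
head = nn.Linear(C, num_classes)
logits = head(embedding)              # [B, num_classes]
print(logits.shape)
```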

### 1. How to plug these features into downstream 3D bounding box detection from NeRFs (i.e., plug-and-play with a [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN) OBB prediction head)

Please also see the section on [Finetuning](#-finetuning). Our released finetuned checkpoint achieves state-of-the-art results on 3D object detection in NeRFs. To run evaluation with our finetuned checkpoint on the dataset provided by NeRF-RPN, run the script below after updating the checkpoint path (```--checkpoint```) and ```DATA_ROOT``` for either ```Front3D``` or ```Scannet```:

```bash
bash test_fcos_pretrained.sh
```

Also see the corresponding run file, i.e. ```run_fcos_pretrained.py```, and our model adaptation, i.e. ```SwinTransformer_FPN_Pretrained_Skip```. This is a minimal adaptation that plugs our weights into a NeRF-RPN architecture and achieves a significant boost in performance.
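
If you just want to peek at what a released finetuned checkpoint contains before running the full evaluation, a short inspection snippet along these lines can help; the ```state_dict``` key is an assumption carried over from the usage example above, so verify it on your copy:

```python
# A minimal sketch (not an official utility): peek inside a released finetuned checkpoint.
# Assumes the file was downloaded locally and stores its weights under a "state_dict" key.
import torch

ckpt = torch.load("front3d_obb_finetuned.pt", map_location="cpu")
print(list(ckpt.keys()))                      # top-level keys stored in the checkpoint
state_dict = ckpt["state_dict"]
print(f"{len(state_dict)} parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))          # first few layer names and shapes
```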

## 🗂️ Dataset

Download the preprocessed datasets below.

- Pretraining dataset (comprising NeRF radiance and density grids). [Download link](https://s3.amazonaws.com/tri-ml-public.s3.amazonaws.com/github/nerfmae/NeRF-MAE_pretrain.tar.gz)
- Finetuning dataset (comprising NeRF radiance and density grids and bounding box/semantic labelling annotations). [3D Object Detection (provided by NeRF-RPN)](https://drive.google.com/drive/folders/1q2wwLi6tSXu1hbEkMyfAKKdEEGQKT6pj), [3D Semantic Segmentation (coming soon)](), [Voxel Super-Resolution (coming soon)]()

Extract the pretraining and finetuning datasets under ```NeRF-MAE/datasets```. The directory structure should look like this:

```
NeRF-MAE
├── pretrain
│   ├── features
│   └── nerfmae_split.npz
└── finetune
    └── front3d_rpn_data
        ├── features
        ├── aabb
        └── obb
```
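
If you want to inspect the pretraining split before wiring up the full training code, a small snippet like the one below can help. The path and the assumption that the ```.npz``` file stores named arrays are illustrative; adjust it to wherever you extracted the data:

```python
# A minimal sketch: inspect the pretraining split file. Adjust the path to wherever
# nerfmae_split.npz was extracted; the stored keys and array contents may differ.
import numpy as np

split = np.load("datasets/pretrain/nerfmae_split.npz", allow_pickle=True)
print(split.files)                    # list the arrays stored in the split file
for name in split.files:
    print(name, split[name].shape)    # per-split array shapes
```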

Note: The above datasets are all you need to train and evaluate our method. Bonus: we will also be releasing our multi-view rendered posed RGB images from FRONT3D, HM3D and Hypersim, as well as Instant-NGP trained checkpoints, soon (these comprise over 1M images and 3k+ NeRF checkpoints).

Please note that our dataset was generated using the instructions from [NeRF-RPN](https://github.com/lyclyc52/NeRF_RPN) and [3D-CLR](https://vis-www.cs.umass.edu/3d-clr/). Please consider citing our work, NeRF-RPN and 3D-CLR if you find this dataset useful in your research.

Please also note that our dataset uses [Front3D](https://arxiv.org/abs/2011.09127), [Habitat-Matterport3D](https://arxiv.org/abs/2109.08238), [HyperSim](https://github.com/apple/ml-hypersim) and [ScanNet](https://www.scan-net.org/) as its base scenes, i.e. we train a NeRF per scene and extract the radiance and density grid as well as aligned NeRF-grid 3D annotations. Please read the terms of use for each dataset if you want to utilize its posed multi-view images.

### For more details, please check out our Paper, GitHub and Project Page!

---
license: cc-by-nc-4.0
---
demo/.DS_Store ADDED
Binary file (6.15 kB)
 
demo/GeorgiaTech_RGB.png ADDED
demo/comparison_mae.jpeg ADDED
demo/nerf-mae_architecture.jpg ADDED
demo/nerf-mae_teaser.jpeg ADDED
demo/nerf-mae_teaser.png ADDED
demo/tri-logo.png ADDED
front3d_obb_finetuned.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d7f0dee865f8629eee475cb87643c6af67fddefed8c9df9bf6d9414cd761e269
size 314063759
nerf_mae_pretrained.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:713abb59691adbcc16e805cf6e42d8360e79113f0fd82ed8a8514350a7b60e05
size 305663555
scannet_obb_finetuned.pt ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bef943ef054ef0904e9fa5a0075a3f76be5274bf7dff11699b5a6f741dc987fa
size 314063759