
GaussianAnything: arXiv 2024

Set up the environment (the same env as LN3Diff):

conda create -n ga python=3.10
conda activate ga
pip install -r requirements.txt # will install the surfel Gaussians environment automatically.

Then, install pytorch3d with

pip install git+https://github.com/facebookresearch/pytorch3d.git@stable
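
To sanity-check the installation, run a quick import test (illustrative only; it just confirms that torch and pytorch3d import and that CUDA is visible):

python -c "import torch, pytorch3d; print(torch.__version__, torch.cuda.is_available())"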

:dromedary_camel: TODO

  • Release inference code and checkpoints.
  • Release training code.
  • Release pre-extracted latent codes for 3D diffusion training.
  • Release Gradio demo.
  • Release the evaluation code.
  • Lint the code.

Inference

Be sure to change $logdir in the bash file accordingly.

To load the checkpoints automatically, replace /mnt/sfs-common/yslan/open-source with yslan/GaussianAnything/ckpts/checkpoints.
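
For instance, near the top of an inference bash file (the path below is illustrative; any writable directory works):

logdir=./logs/t23d/stage-1 # outputs of this run are written here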

Text-2-3D:

Please update the captions for 3D generation in datasets/caption-forpaper.txt. To change the number of samples to be generated, change $num_samples in the bash file.
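
For reference, datasets/caption-forpaper.txt takes one caption per line; the prompts and sample count below are illustrative:

a wooden chair with a tall back
a blue ceramic teapot

and in the bash file: num_samples=4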

stage-1:

bash shell_scripts/release/inference/t23d/stage1-t23d.sh

Then, set $stage_1_output_dir to the $logdir of the stage above.
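
For example (illustrative, matching the $logdir suggested earlier):

stage_1_output_dir=./logs/t23d/stage-1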

stage-2:

bash shell_scripts/release/inference/t23d/stage2-t23d.sh

The results will be dumped to ./logs/t23d/stage-2.

I23D (requires two-stage generation):

Set $data_dir accordingly. For demo images, please download them from huggingface.co/yslan/GaussianAnything/demo-img.
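
One way to fetch them is with huggingface-cli from the huggingface_hub package (a sketch; the demo-img folder name is taken from the link above):

huggingface-cli download yslan/GaussianAnything --include "demo-img/*" --local-dir ./assets
data_dir=./assets/demo-img # then point $data_dir here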

stage-1:

bash shell_scripts/release/inference/i23d/i23d-stage1.sh

Then, set $stage_1_output_dir to the $logdir of the stage above (as in the Text-2-3D example).

stage-2:

bash shell_scripts/release/inference/i23d/i23d-stage2.sh

3D VAE Reconstruction:

To encode a 3D asset into the latent point cloud, please download the pre-trained VAE checkpoint from huggingface.co/yslan/GaussianAnything/ckpts/vae/model_rec1965000.pt to ./checkpoint/model_rec1965000.pt.
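
A sketch of the download (the resolve URL assumes the standard Hugging Face file layout implied by the link above):

mkdir -p ./checkpoint
wget https://huggingface.co/yslan/GaussianAnything/resolve/main/ckpts/vae/model_rec1965000.pt -O ./checkpoint/model_rec1965000.pt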

Then, run the inference script:

bash shell_scripts/release/inference/vae-3d.sh

This will encode the multi-view 3D renderings in ./assets/demo-image-for-i23d/for-vae-reconstruction/Animals/0 into the point cloud-structured latent code and export it (along with the 2DGS mesh) to ./logs/latent_dir/. The exported latent code will be used for efficient 3D diffusion training.

Training (Flow Matching 3D Generation)

All training is conducted on 8 A100 (80 GiB) GPUs with BF16 enabled. For training on V100, use FP32 by setting --use_amp False in the bash file. Tune $batch_size in the bash file to match your VRAM.
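
For instance (illustrative values; the real variable layout lives in the bash files under shell_scripts/release/train/):

batch_size=8 # lower this if you hit CUDA out-of-memory
# on V100, also pass --use_amp False to the training command (no BF16 support)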

To facilitate reproducing the performance, we have uploaded the pre-extracted point cloud-structured latent codes to huggingface.co/yslan/GaussianAnything/dataset/latent.tar.gz (34 GiB required). Please download the pre-extracted point cloud latent codes, unzip them, and set $mv_latent_dir in the bash file accordingly.
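
A sketch of the download and extraction (the extracted folder name is an assumption; check tar -tzf latent.tar.gz and point $mv_latent_dir at the real path):

mkdir -p ./dataset
wget https://huggingface.co/yslan/GaussianAnything/resolve/main/dataset/latent.tar.gz
tar -xzf latent.tar.gz -C ./dataset
mv_latent_dir=./dataset/latent # adjust to the actual extracted folder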

Text to 3D:

Please download the 3D captions from Hugging Face at huggingface.co/yslan/GaussianAnything/dataset/text_captions_3dtopia.json, and put the file under ./dataset.
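
For example (the resolve URL assumes the standard Hugging Face layout):

mkdir -p ./dataset
wget https://huggingface.co/yslan/GaussianAnything/resolve/main/dataset/text_captions_3dtopia.json -P ./dataset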

Note that if you want to train on a specific class of Objaverse, just manually change the code at datasets/g_buffer_objaverse.py:3043.

stage-1 training (point cloud generation):

bash shell_scripts/release/train/stage2-t23d/t23d-pcd-gen.sh

stage-2 training (point cloud-conditioned KL feature generation):

bash shell_scripts/release/train/stage2-t23d/t23d-klfeat-gen.sh

(single-view) Image to 3D

Please download the g-buffer dataset first.

stage-1 training (point cloud generation):

bash shell_scripts/release/train/stage2-i23d/i23d-pcd-gen.sh

stage-2 training (point cloud-conditioned KL feature generation):

bash shell_scripts/release/train/stage2-i23d/i23d-klfeat-gen.sh