GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
Abstract
While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.
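To make the cascaded, point-cloud-structured latent design concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code): module names, latent sizes, the conditioning embedding, and the sampler update are all assumptions. The first stage denoises point positions (geometry) and the second stage denoises per-point features (texture) conditioned on that fixed geometry, which is the mechanism behind the geometry-texture disentanglement described above.

```python
# Illustrative sketch of a cascaded point-cloud-structured latent diffusion
# pipeline. All shapes, names, and the sampling rule are assumptions made for
# clarity; they are NOT the GaussianAnything implementation.
import torch
import torch.nn as nn

N_POINTS, FEAT_DIM, COND_DIM = 512, 16, 768  # assumed latent sizes

class StageDenoiser(nn.Module):
    """Denoises one part of the latent, conditioned on a text/image embedding."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + COND_DIM + 1, 256), nn.SiLU(),
            nn.Linear(256, 256), nn.SiLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x, cond, t):
        # x: (B, N, in_dim), cond: (B, COND_DIM), t: (B,) timestep in [0, 1]
        B, N, _ = x.shape
        cond = cond[:, None, :].expand(B, N, -1)
        t = t[:, None, None].expand(B, N, 1)
        return self.net(torch.cat([x, cond, t], dim=-1))

@torch.no_grad()
def sample(denoiser, shape, cond, steps=50):
    """Toy iterative sampling loop; the real noise schedule/update is omitted."""
    x = torch.randn(shape)
    for i in reversed(range(steps)):
        t = torch.full((shape[0],), i / steps)
        eps = denoiser(x, cond, t)
        x = x - eps / steps  # placeholder update, stands in for a proper sampler step
    return x

# Cascade: stage 1 generates point positions (geometry); stage 2 generates
# per-point features (texture) conditioned on the fixed geometry.
geo_denoiser = StageDenoiser(3, 3)
tex_denoiser = StageDenoiser(3 + FEAT_DIM, FEAT_DIM)

cond = torch.randn(1, COND_DIM)                     # stand-in for a caption/image embedding
xyz = sample(geo_denoiser, (1, N_POINTS, 3), cond)  # stage 1: point-cloud-structured latent

def tex_wrapper(feat, c, t):
    return tex_denoiser(torch.cat([xyz, feat], dim=-1), c, t)

feat = sample(tex_wrapper, (1, N_POINTS, FEAT_DIM), cond)  # stage 2: appearance features
latent = torch.cat([xyz, feat], dim=-1)  # would be decoded to surfel Gaussians (decoder not shown)
print(latent.shape)  # torch.Size([1, 512, 19])
```

Editing the geometry stage's output while keeping the texture stage's conditioning fixed (or vice versa) is what would allow 3D-aware editing in such a disentangled cascade.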
Community
GaussianAnything generates high-quality, editable surfel Gaussians through a cascaded native 3D diffusion pipeline, conditioned on single-view images or text.
Hello there! I made the fish in your image example. It's released under CC BY, so I would appreciate a credit. :)
Thanks for sharing your fabulous 3D assets online for public use! I will update the project page and paper to credit your asset in a later version ;)
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- 3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation (2024)
- L3DG: Latent 3D Gaussian Diffusion (2024)
- LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images (2024)
- LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion (2024)
- MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors (2024)