	---
	license: cc-by-nc-2.0
	pipeline_tag: image-to-3d
	library_name: transformers
	---
	# [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

	[Project page](https://junlinhan.github.io/projects/vfusion3d.html), [Paper link](https://arxiv.org/abs/2403.12034)

	VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model.

	[VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](https://junlinhan.github.io/projects/vfusion3d.html)<br>
	[Junlin Han](https://junlinhan.github.io/), [Filippos Kokkinos](https://www.fkokkinos.com/), [Philip Torr](https://www.robots.ox.ac.uk/~phst/)<br>
	GenAI, Meta and TVG, University of Oxford<br>
	European Conference on Computer Vision (ECCV), 2024


	## News

	- [08.08.2024] [HF Demo](https://huggingface.co/spaces/facebook/VFusion3D) is available. Big thanks to [Jade Choghari](https://github.com/jadechoghari) for helping make it possible.
	- [25.07.2024] Release weights and inference code for VFusion3D.



	## Quick Start

	Getting started with VFusion3D is super easy! 🤗 Here’s how you can use the model with Hugging Face:

	### Install Dependencies (Optional)

	Depending on your needs, you may want to enable specific features like mesh generation or video rendering. We've got you covered with these additional packages:

	```bash
	# In a Jupyter/Colab cell, prefix the command with "!"
	pip install --quiet "imageio[ffmpeg]" PyMCubes trimesh "rembg[gpu,cli]" kiui
	```
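Because these packages are optional, it can be useful to check which ones are importable before asking for a mesh or a video. The snippet below is a minimal sketch and not part of VFusion3D: the helper name `check_optional_deps` is hypothetical, and the grouping of packages by feature (for example, that mesh export relies on PyMCubes and trimesh, and video export on imageio) is an assumption made for illustration. Note that PyMCubes is imported as `mcubes`.

```python
from importlib.util import find_spec

# Assumed mapping from feature to the import names of the optional packages.
# The grouping is illustrative; the README does not state which package
# backs which feature.
OPTIONAL_DEPS = {
    "mesh export": ["mcubes", "trimesh"],  # PyMCubes is imported as `mcubes`
    "video export": ["imageio"],
    "background removal": ["rembg"],
    "utilities": ["kiui"],
}


def check_optional_deps(deps=OPTIONAL_DEPS):
    """Return {feature: [missing import names]} for features with missing packages."""
    missing = {}
    for feature, modules in deps.items():
        # find_spec returns None when a top-level module cannot be located
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[feature] = absent
    return missing


if __name__ == "__main__":
    for feature, absent in check_optional_deps().items():
        print(f"{feature}: missing {', '.join(absent)}")
```

An empty result means every optional package in the mapping was found; anything listed can be installed with the `pip` command above.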

	### Load model directly
	```python
	from io import BytesIO

	import requests
	import torch
	from PIL import Image
	from transformers import AutoModel, AutoProcessor

	# load the model and processor
	model = AutoModel.from_pretrained("jadechoghari/vfusion3d", trust_remote_code=True)
	processor = AutoProcessor.from_pretrained("jadechoghari/vfusion3d")

	# download the input image
	image_url = 'https://sm.ign.com/ign_nordic/cover/a/avatar-gen/avatar-generations_prsz.jpg'
	response = requests.get(image_url)
	response.raise_for_status()  # fail early if the download did not succeed
	image = Image.open(BytesIO(response.content))

	# preprocess the image and get the source camera
	image, source_camera = processor(image)


	# generate planes (default output)
	output_planes = model(image, source_camera)
	print("Planes shape:", output_planes.shape)

	# generate a 3D mesh
	output_planes, mesh_path = model(image, source_camera, export_mesh=True)
	print("Planes shape:", output_planes.shape)
	print("Mesh saved at:", mesh_path)

	# generate a video
	output_planes, video_path = model(image, source_camera, export_video=True)
	print("Planes shape:", output_planes.shape)
	print("Video saved at:", video_path)

	```
	- Default (Planes): By default, VFusion3D outputs planes—ideal for further 3D operations.
	- Export Mesh: Want a 3D mesh? Just set `export_mesh=True`, and you'll get a `.obj` file ready to roll. You can also customize the mesh resolution by adjusting the `mesh_size` parameter.
	- Export Video: Fancy a 3D video? Set `export_video=True`, and you'll receive a beautifully rendered video from multiple angles. You can tweak `render_size` and `fps` to get the video just right.
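The three output modes above can be wrapped in a single helper. This is an illustrative sketch, not part of the VFusion3D API: the function name `generate` and its `mode` argument are hypothetical, and it assumes that `mesh_size`, `render_size`, and `fps` are accepted as keyword arguments in the same call as `export_mesh` / `export_video`, as the list above suggests. The numeric values in the usage comments are placeholders, not documented defaults.

```python
# Options each mode is assumed to accept (taken from the list above).
ALLOWED_OPTIONS = {
    "planes": set(),
    "mesh": {"mesh_size"},
    "video": {"render_size", "fps"},
}


def generate(model, image, source_camera, mode="planes", **options):
    """Call a VFusion3D-style model in one of three modes.

    mode="planes" returns the planes only.
    mode="mesh"   passes export_mesh=True and returns (planes, mesh_path).
    mode="video"  passes export_video=True and returns (planes, video_path).

    Extra keyword arguments are forwarded unchanged; whether the model
    accepts them in this form is an assumption.
    """
    if mode not in ALLOWED_OPTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    unexpected = set(options) - ALLOWED_OPTIONS[mode]
    if unexpected:
        raise ValueError(f"options {sorted(unexpected)} do not apply to mode {mode!r}")

    if mode == "mesh":
        return model(image, source_camera, export_mesh=True, **options)
    if mode == "video":
        return model(image, source_camera, export_video=True, **options)
    return model(image, source_camera)


# Usage with `model`, `image`, and `source_camera` from the snippet above
# (placeholder values):
# planes, mesh_path = generate(model, image, source_camera, mode="mesh", mesh_size=256)
# planes, video_path = generate(model, image, source_camera, mode="video", fps=30)
```

Rejecting options that do not belong to the chosen mode catches mistakes such as passing `fps` while exporting a mesh before the model is invoked.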

	Check out our [demo app](https://huggingface.co/spaces/facebook/VFusion3D) to see VFusion3D in action! 🤗

	## Results and Comparisons

	### 3D Generation Results
	<img src='assets/gif1.gif' width=950>

	<img src='assets/gif2.gif' width=950>

	### User Study Results
	<img src='assets/user.png' width=950>



	## Acknowledgement

	- The inference code of VFusion3D borrows heavily from [OpenLRM](https://github.com/3DTopia/OpenLRM).

	## Citation

	If you find this work useful, please cite us:


	```
	@inproceedings{han2024vfusion3d,
	  title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models},
	  author={Junlin Han and Filippos Kokkinos and Philip Torr},
	  booktitle={European Conference on Computer Vision (ECCV)},
	  year={2024}
	}
	```

	## License

	- The majority of VFusion3D is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license.
	- The model weights of VFusion3D are also licensed under CC-BY-NC.