---

license: apache-2.0
---



## L4GM: Large 4D Gaussian Reconstruction Model
<p align="center">
    <img src="assets/teaser.jpg">

</p>


[**Paper**](https://arxiv.org/abs/2406.10324) | [**Project Page**](https://research.nvidia.com/labs/toronto-ai/l4gm/) | [**Code**](https://github.com/nv-tlabs/L4GM-official)

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.

---
### Model Overview

#### Description:
L4GM is a 4D reconstruction model that reconstructs a sequence of 3D Gaussians from a monocular input video within seconds. An additional interpolation model interpolates the 3D Gaussian sequence to increase the frame rate.

This model is for research and development only. <br>

#### Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA [ashawkey/LGM](https://huggingface.co/ashawkey/LGM).

##### License/Terms of Use: 

CC-BY-NC-SA-4.0

#### References:
- [Original paper](https://arxiv.org/abs/2406.10324)
- [NVIDIA model implementation in GitHub](https://github.com/nv-tlabs/L4GM-official)
- [LGM model card](https://huggingface.co/ashawkey/LGM)
<br> 

#### Model Architecture: 
<p align="center">
    <img src="assets/method.png">

</p>


L4GM is a modified version of the original [LGM model](https://huggingface.co/ashawkey/LGM). 

It adds a temporal attention module after every cross-view attention module in the asymmetric U-Net. 

These modules let L4GM aggregate temporal information across frames.

The model is initialized from the pretrained LGM weights and trained on a multi-view dynamic object dataset. 

The interpolation model shares the L4GM architecture but is trained with an interpolation objective. <br>
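
To make the idea of temporal attention concrete, here is a minimal sketch in plain Python. It is not the L4GM implementation: it assumes a single attention head, no learned query/key/value projections, and a hypothetical `features[t][n]` layout (frame `t`, token `n`). It only illustrates that each token attends to the same token position in the other frames, whereas cross-view attention mixes tokens within a frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(features):
    """Attend across frames for each token position independently.

    features[t][n] is the feature vector (list of floats) of token n in
    frame t. The result has the same layout. This layout and the absence
    of learned projections are simplifying assumptions for illustration.
    """
    num_frames = len(features)
    num_tokens = len(features[0])
    dim = len(features[0][0])
    scale = 1.0 / math.sqrt(dim)

    out = [[None] * num_tokens for _ in range(num_frames)]
    for n in range(num_tokens):
        # Collect the same token position from every frame.
        track = [features[t][n] for t in range(num_frames)]
        for t in range(num_frames):
            # Scaled dot-product scores of frame t against all frames.
            scores = [
                scale * sum(q * k for q, k in zip(track[t], track[s]))
                for s in range(num_frames)
            ]
            weights = softmax(scores)
            # Weighted sum of the token's features over time.
            out[t][n] = [
                sum(w * track[s][d] for s, w in enumerate(weights))
                for d in range(dim)
            ]
    return out
```

Because the loop over `n` never reads another token position, information flows only along the time axis here. Mixing across views and spatial positions is left to the other attention modules in the U-Net.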

#### Input:
**Input Type(s):** Video <br>
**Input Format(s):** RGB sequence <br>
**Input Parameters:** 4D <br>
**Other Properties Related to Input:** Input resolution is 256x256. Video length in one forward pass is usually 16 frames. <br>
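
Since one forward pass usually covers 16 frames, a longer video has to be split before inference. The helper below is a hypothetical preprocessing sketch, not part of the released code: `split_into_chunks` and its non-overlapping windowing are assumptions made for illustration, and the official pipeline may handle long videos differently.

```python
def split_into_chunks(num_frames, chunk_len=16):
    """Return end-exclusive (start, end) frame ranges covering the video.

    chunk_len defaults to the 16-frame window mentioned above. The last
    chunk may be shorter when num_frames is not a multiple of chunk_len.
    """
    if num_frames <= 0 or chunk_len <= 0:
        raise ValueError("num_frames and chunk_len must be positive")
    return [
        (start, min(start + chunk_len, num_frames))
        for start in range(0, num_frames, chunk_len)
    ]
```

For a 40-frame clip this yields two full 16-frame windows and one 8-frame remainder. How a short final window should be padded or dropped is left open here.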

#### Output:
**Output Type(s):** 3D GS sequence <br>
**Output Format:** frame length x Gaussians-per-frame <br>
**Output Parameters:** 3D <br>
**Other Properties Related to Output:** 65,536 3D Gaussians per frame. <br> 
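
As a rough sizing aid, the output dimensions above can be turned into a shape and memory estimate. The count of 65,536 Gaussians per frame comes from this card. The 14 values per Gaussian (3 position, 1 opacity, 3 scale, 4 rotation, 3 color) are an assumption based on the LGM parameterization and should be checked against the code before relying on them.

```python
GAUSSIANS_PER_FRAME = 65_536  # stated in this model card
PARAMS_PER_GAUSSIAN = 14      # assumption: 3 pos + 1 opacity + 3 scale + 4 rot + 3 color


def output_shape(num_frames):
    """Shape of the Gaussian sequence as (frames, Gaussians, values)."""
    return (num_frames, GAUSSIANS_PER_FRAME, PARAMS_PER_GAUSSIAN)


def output_size_mib(num_frames, bytes_per_value=4):
    """Approximate size in MiB, assuming float32 values by default."""
    frames, gaussians, params = output_shape(num_frames)
    return frames * gaussians * params * bytes_per_value / 2**20
```

Under these assumptions, a 16-frame sequence stored in float32 takes about 56 MiB.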


**Supported Hardware Microarchitecture Compatibility:** <br>
* NVIDIA Ampere <br>

**Preferred Operating System(s):** <br>
* Linux <br>

#### Model Version(s): 
v1.0 <br>

### Training, Testing, and Evaluation Datasets: 

#### Training Dataset:
**Link:** [Objaverse](https://huggingface.co/datasets/allenai/objaverse) <br>
**Data Collection Method by dataset:** Unknown <br>
**Labeling Method by dataset:** Unknown <br>
**Properties:** We use 110K object animations, a subset of Objaverse filtered by motion magnitude. We render the animations from 48 cameras, producing 12M videos in total. <br>
**Dataset License(s):** [Objaverse License](https://huggingface.co/datasets/allenai/objaverse#license) <br>


#### Inference:
**Engine:** PyTorch <br>
**Test Hardware:** A100 <br>

#### Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications.  When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.  

##### Citation
```bib
@inproceedings{ren2024l4gm,
    title={L4GM: Large 4D Gaussian Reconstruction Model},
    author={Jiawei Ren and Kevin Xie and Ashkan Mirzaei and Hanxue Liang and Xiaohui Zeng and Karsten Kreis and Ziwei Liu and Antonio Torralba and Sanja Fidler and Seung Wook Kim and Huan Ling},
    booktitle={Proceedings of Neural Information Processing Systems (NeurIPS)},
    month = {Dec},
    year={2024}
}
```