ZeroGPU environment debug

#3
by cbensimon HF staff - opened

Hi @bluestyle97 !

I got the demo mostly working in the ZeroGPU environment. Currently open3d causes the error by calling CUDA at import time (https://github.com/isl-org/Open3D/blob/81201f9590aec0d7bb41fdd925e63adb89ef8768/python/open3d/__init__.py#L66). Disabling the relevant line in the open3d codebase makes the Space work, but I wonder whether the timing figures I get are consistent with what you expect:

Generate Gaussians: 0.05 seconds.
Estimate poses: 20.34 seconds.
Generate video: 8.83 seconds.
Generate mesh: 8.30 seconds.
Optimize mesh: 17.17 seconds.

If this seems OK to you, I can open a PR or help you fix the open3d bug the same way I did.
There was also an error with the @spaces.GPU placement, but that one is really easy to fix.

Lastly, when you use .success chaining with Gradio, you can't get ZeroGPU logged-in quotas (and therefore not Pro quotas either).
It might be better either to put segment and run_3d in the same @spaces.GPU function,
or to make an explicit Gradio button for each function.
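
A minimal sketch of the first option, assuming nothing about FreeSplatter's real code: segment and run_3d are the names used above, but their bodies here are hypothetical placeholders, and segment_and_run_3d is an illustrative name. The fallback decorator only exists so the sketch runs on a machine without the spaces package.

```python
try:
    import spaces  # available on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Local fallback (assumption): a no-op decorator so the sketch runs anywhere.
    def gpu(fn):
        return fn


def segment(image):
    """Hypothetical stand-in for the Space's segmentation step."""
    return f"segmented({image})"


def run_3d(segmented):
    """Hypothetical stand-in for the Space's 3D reconstruction step."""
    return f"reconstructed({segmented})"


@gpu
def segment_and_run_3d(image):
    # Both steps run inside one decorated function, so a single Gradio
    # event triggers the GPU work and no .success chaining is needed.
    return run_3d(segment(image))
```

With this shape, the Gradio button's click handler would point at segment_and_run_3d directly instead of chaining two events.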

Side note regarding CUDA toolkit install

All those libs are already installed by PyTorch (you of course need to set the right LD_LIBRARY_PATH):

/usr/local/lib/python3.10/site-packages/nvidia$ find . -name '*.so*'
./cublas/lib/libcublas.so.12
./cublas/lib/libcublasLt.so.12
./cublas/lib/libnvblas.so.12
./cuda_cupti/lib/libcheckpoint.so
./cuda_cupti/lib/libcupti.so.12
./cuda_cupti/lib/libnvperf_host.so
./cuda_cupti/lib/libnvperf_target.so
./cuda_cupti/lib/libpcsamplingutil.so
./cuda_nvrtc/lib/libnvrtc-builtins.so.12.1
./cuda_nvrtc/lib/libnvrtc.so.12
./cuda_runtime/lib/libcudart.so.12
./cudnn/lib/libcudnn.so.9
./cudnn/lib/libcudnn_adv.so.9
./cudnn/lib/libcudnn_cnn.so.9
./cudnn/lib/libcudnn_engines_precompiled.so.9
./cudnn/lib/libcudnn_engines_runtime_compiled.so.9
./cudnn/lib/libcudnn_graph.so.9
./cudnn/lib/libcudnn_heuristic.so.9
./cudnn/lib/libcudnn_ops.so.9
./cufft/lib/libcufft.so.11
./cufft/lib/libcufftw.so.11
./curand/lib/libcurand.so.10
./cusolver/lib/libcusolver.so.11
./cusolver/lib/libcusolverMg.so.11
./cusparse/lib/libcusparse.so.12
./nccl/lib/libnccl.so.2
./nvjitlink/lib/libnvJitLink.so.12
./nvtx/lib/libnvToolsExt.so.1
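
As a rough sketch of how the right LD_LIBRARY_PATH could be assembled from the listing above: the helper names below are hypothetical, and the only assumption about layout is the nvidia/<package>/lib structure shown in the find output. Note that the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the value has to be set before launching the program that needs the libraries (for example in the environment passed to a subprocess).

```python
import os
from pathlib import Path


def nvidia_lib_dirs(site_packages):
    """Return the sorted nvidia/*/lib directories under site_packages."""
    root = Path(site_packages) / "nvidia"
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.glob("*/lib") if p.is_dir())


def build_ld_library_path(site_packages, existing=""):
    """Join the NVIDIA lib directories, keeping any existing value last."""
    parts = nvidia_lib_dirs(site_packages)
    if existing:
        parts.append(existing)
    return ":".join(parts)


def env_with_cuda_libs(site_packages):
    """Copy the current environment and add the NVIDIA lib directories."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = build_ld_library_path(
        site_packages, env.get("LD_LIBRARY_PATH", "")
    )
    return env
```

For the path in the listing, env_with_cuda_libs("/usr/local/lib/python3.10/site-packages") could then be passed as env= to subprocess.run for a build or install step.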
ARC Lab, Tencent PCG org

@cbensimon Wow, thank you so much for the help! This problem has troubled us for a long time. You are welcome to open a PR! The timings seem OK, although pose estimation takes longer than what we measured on our local server (usually ~7 seconds on an A100 GPU).

Thank you for your valuable advice on merging the segmentation and run_3d into a single function! We'd also like to know how to set the right LD_LIBRARY_PATH for the Space.

I'm working on a PR (or at least more precise directions). I'll keep you updated.

PR is opened: https://huggingface.co/spaces/TencentARC/FreeSplatter/discussions/6

It might mess up internal mechanisms of FreeSplatter so feel free to cherry-pick / modify as you like.

It looks like we still have a small performance issue: the app takes a lot of time overall (more than what is printed in run_freesplatter_object).
Maybe some CUDA code actually falls back to CPU, but I have not been able to confirm this.
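
One way to narrow down where the extra time goes is to wrap each stage, and the whole request, in a small timer. This is only a sketch: timed and the labels are hypothetical names, not part of FreeSplatter. Because CUDA kernels are launched asynchronously, passing torch.cuda.synchronize as sync gives more trustworthy per-stage numbers on GPU; the sketch itself does not import torch.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results, sync=None):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    if sync is not None:
        sync()  # wait for pending GPU work before starting the clock
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait for GPU work launched inside the block
        results[label] = time.perf_counter() - start


timings = {}
with timed("total", timings):
    with timed("step", timings):
        time.sleep(0.01)  # placeholder for a real pipeline stage

for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} seconds.")
```

A large gap between "total" and the sum of the stage timings would point at work happening outside the measured stages (model loading, data transfer, encoding) rather than at a CPU fallback.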

I can confirm that open3d seems to work well with the trick on ZeroGPU: https://huggingface.co/spaces/cbensimon/open3d-zerogpu-test/blob/main/app.py

ARC Lab, Tencent PCG org

@cbensimon Thank you so much for your contribution! The PR has been successfully merged into the main branch, and we're excited to have it working on ZeroGPU!

I have another small question. I've noticed a layout issue after upgrading to Gradio 5.x: all UI components are squeezed into half of the available width and are not clickable (although it works fine in our local environment). The demo functions perfectly with Gradio 4.44.1, but I'd prefer to use 5.x for its improved UI aesthetics. Do you have any suggestions for resolving this layout issue?

image.png
