Apply for community grant: Academic project (gpu)

#1
by kxic

This is a demo for the CVPR'24 Oral paper EscherNet (https://kxhit.github.io/EscherNet), which lets users input a few images and get a 360° novel view synthesis (3D generation) of the object.

This work is built on the HF diffusers library, and all the code/checkpoints are open-sourced. The demo needs a GPU to deploy. Thank you so much for HF's support!

Hi @kxic , we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.

Thanks @hysts for providing us with ZeroGPU!

We're having some challenges running the demo online. Pulling and running the Docker container locally works fine; however, we're having trouble running on HF because Gradio's queue seems to break CUDA. Do we need to do anything special on our end to handle Gradio's queueing?

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1173, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 690, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 171, in gradio_handler
    res = worker.res_queue.get()
  File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 120, in rebuild_cuda_tensor
    torch.cuda._lazy_init()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch.py", line 181, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init

@marwanPtr Thanks for testing ZeroGPU. Has this issue already been resolved? I don't see the error in the log. (Looks like there's another error, though.) I think you've resolved it, but in case it's still an issue: the error is not about queueing. It is raised when CUDA is used outside of functions decorated with @spaces.GPU; on ZeroGPU, CUDA is available only inside such functions.
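For reference, a minimal sketch of that pattern, assuming a diffusers pipeline with a placeholder model id (not EscherNet's actual code):

import spaces
from diffusers import DiffusionPipeline

# Load on CPU in the main process; CUDA must not be initialized here.
pipe = DiffusionPipeline.from_pretrained("placeholder/model-id")

@spaces.GPU  # CUDA is available only inside functions with this decorator
def generate(image):
    pipe.to("cuda")  # safe here: this runs in the ZeroGPU worker
    return pipe(image).images[0]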

@hysts Cheers for your feedback, that was exactly our issue! It turns out we were trying to put objects living on the GPU into gr.State.
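For anyone who hits the same thing: whatever a @spaces.GPU function returns, including gr.State contents, is pickled back to the main process, and unpickling a CUDA tensor there triggers the error above. A sketch of the fix, with run_encoder as a hypothetical stand-in for the actual model call:

import gradio as gr
import spaces

@spaces.GPU
def encode(image, state):
    feats = run_encoder(image)    # hypothetical model call; returns a CUDA tensor
    state["feats"] = feats.cpu()  # move to CPU before it leaves the GPU worker;
                                  # a CUDA tensor stored in gr.State would be
                                  # unpickled in the main process and crash
    return state

with gr.Blocks() as demo:
    state = gr.State({})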

@hysts Hi, thanks for the help and the generous GPU support. I'm wondering if there could be free A100 support in the coming week during CVPR. We would like to do a live demo during the oral/poster session. Thank you for considering our GPU grant application.

Hi @kxic Unfortunately, a regular A100 is not available for grants. I just want to check that my understanding is correct: do you want dedicated hardware for your live demo because of ZeroGPU's quota limits? In any case, I wonder if your Space can run on an L4 or A10G. We can still assign those as a grant.

@hysts Yes, exactly! We want dedicated hardware to avoid the queue during CVPR. I showed this demo to @sayakpaul a few days ago in London, and he said it could be sped up further with a free A100 and torch compile or something? Would love to discuss the possibility. Thank you!
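For context, the usual torch.compile speedup for a diffusers pipeline on a dedicated GPU looks roughly like the sketch below; the model id is a placeholder, not EscherNet's actual pipeline:

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "placeholder/model-id", torch_dtype=torch.float16
).to("cuda")

# Compiling the UNet usually gives the largest win for diffusion pipelines.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)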

@kxic I see. I kind of feel that the free A100 @sayakpaul mentioned was ZeroGPU, but the Space is already on Zero and torch compile doesn't work on Zero, so I'm a bit confused. I guess we can wait for his reply on that point.
As for the dedicated hardware for your live demo, I think you can duplicate this Space privately, and we can assign dedicated hardware to the copy so that people can keep using this Space on ZeroGPU. With ZeroGPU, multiple backend GPUs can serve a single Space, which minimizes users' waiting time; with dedicated hardware, there's only one GPU, so users have to wait in the queue to run the demo. So I think you should keep the Space for the live demo private, so you can run it without waiting in the queue.
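A sketch of how that duplication could be done with huggingface_hub; both repo ids and the hardware tier are placeholders, and the same thing can be done from the Space's settings UI:

from huggingface_hub import HfApi

api = HfApi()  # assumes a token with write access is configured

api.duplicate_space(
    from_id="kxic/EscherNet",          # placeholder id for the public Space
    to_id="kxic/EscherNet-live-demo",  # placeholder id for the private copy
    private=True,
    hardware="a10g-small",             # dedicated GPU tier assigned as a grant
)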
