Apply for community grant: Academic project (gpu)

#1
by ShoufaChen - opened
FoundationVision org

We introduce LlamaGen, a new family of image generation models that apply the original next-token prediction paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly.
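Roughly, the paradigm can be sketched as follows. This is a toy illustration of next-token prediction over discrete image tokens, not the actual LlamaGen code; the codebook size, sequence length, and model below are made up for the example.

```python
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, DIM = 1024, 16, 64   # hypothetical VQ codebook size, token count, width

class TinyARModel(nn.Module):
    """Toy decoder-only LM over image tokens (illustration only)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)   # +1 for a BOS token
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        # Causal mask: each position may only attend to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

@torch.no_grad()
def sample(model, n_tokens=SEQ_LEN):
    tokens = torch.full((1, 1), VOCAB)               # start from the BOS token
    for _ in range(n_tokens):
        logits = model(tokens)[:, -1]                # predict the next image token
        nxt = torch.multinomial(logits.softmax(-1), 1)
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens[:, 1:]                             # these ids would go to a VQ decoder

print(sample(TinyARModel()).shape)                   # torch.Size([1, 16])
```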

Hi @ShoufaChen, we've assigned ZeroGPU to this Space. Please check the compatibility and usage sections of this page so your Space can run on ZeroGPU.

FoundationVision org

Hi @hysts, thank you very much for your kind donation.

Our Gradio app encountered a CUDA-related error on ZeroGPU. Could you tell me the difference between our current GPU and ZeroGPU? Our app requires CUDA >= 12.1.

We switched back to the original A100 GPU as a workaround.

Thanks for checking. Hmm, not sure what difference caused the error. What error did you get?

FoundationVision org

I'm sorry that I didn't copy the full log. Could I fork this repo into a ZeroGPU one? Since this demo is very active right now, it would significantly affect users if I tried to debug the problem on this one.

@ShoufaChen Ah, sorry, I accidentally changed the hardware to L4. (I thought I had changed the hardware of my duplicate of your Space, but apparently that wasn't the case.) Could you change the hardware back to A100?

FoundationVision org

No worries, it is back on the A100 now.

@ShoufaChen Thanks!

Regarding ZeroGPU, it looks like the following error is raised:

Traceback (most recent call last):
  File "/home/user/app/app.py", line 116, in <module>
    vq_model, llm, image_size = load_model(args)
  File "/home/user/app/app.py", line 46, in load_model
    llm = LLM(
  File "/home/user/app/serve/llm.py", line 124, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/user/app/serve/llm_engine.py", line 284, in from_engine_args
    engine = cls(
  File "/home/user/app/serve/llm_engine.py", line 152, in __init__
    self.model_executor = executor_class(
  File "/home/user/app/serve/gpu_executor.py", line 42, in __init__
    self._init_executor()
  File "/home/user/app/serve/gpu_executor.py", line 51, in _init_executor
    self._init_non_spec_worker()
  File "/home/user/app/serve/gpu_executor.py", line 80, in _init_non_spec_worker
    self.driver_worker.init_device()
  File "/home/user/app/serve/worker.py", line 102, in init_device
    torch.cuda.set_device(self.device)
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 408, in set_device
    torch._C._cuda_setDevice(device)
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch.py", line 181, in _cuda_init_raise
    raise RuntimeError(
RuntimeError: CUDA must not be initialized in the main process on Spaces with Stateless GPU environment.
You can look at this Stacktrace to find out which part of your code triggered a CUDA init

On ZeroGPU, backend GPUs are shared across multiple ZeroGPU Spaces, and CUDA is only available inside the function decorated with @spaces.GPU.
I remember someone mentioned that vllm was not compatible with ZeroGPU, so I guess that's the reason.
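For reference, the usual ZeroGPU pattern looks roughly like this. It's a minimal sketch with a placeholder model, not your app's actual code: everything is instantiated on CPU in the main process, and CUDA is only touched inside the function decorated with @spaces.GPU.

```python
import gradio as gr
import spaces
import torch
import torch.nn as nn

# Build the model on CPU in the main process; calling .cuda() here would trigger
# the "CUDA must not be initialized in the main process" error above.
model = nn.Linear(8, 8)

@spaces.GPU  # a GPU is attached only for the duration of this call
def infer(seed: int) -> str:
    model.to("cuda")
    g = torch.Generator().manual_seed(int(seed))
    x = torch.randn(1, 8, generator=g).to("cuda")
    with torch.no_grad():
        y = model(x)
    return str(y.cpu().tolist())

demo = gr.Interface(fn=infer, inputs=gr.Number(value=0, precision=0), outputs="text")
demo.launch()
```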

Would it be possible for you to not use vllm for your demo, or to make it runnable on L4?
The hardware with the largest VRAM we can grant is ZeroGPU (A100 with 40 GB of VRAM), and apparently an OOM error occurs when running on L4.

FoundationVision org

@hysts Thank you very much for your kind help.

Yes, our demo can work without vllm, but it would be 4x slower. It needs at least 40 GB of VRAM.

Thank you all the way ❤️❤️❤️

@ShoufaChen Thanks, I see. A normal A100 is not available for grants, so I think it would be nice if you could migrate your Space to ZeroGPU without using vllm, then. (I wonder if there's a workaround to use vllm on ZeroGPU.)
I'll add you to the ZeroGPU explorers org so you can test whether ZeroGPU works for your Space by duplicating it and assigning ZeroGPU to the duplicate yourself. Once you've made your Space runnable on ZeroGPU, you can update the code of this Space and delete the duplicate you used for testing.
Also, I'll remove the L4 grant from this Space, as it only has 24 GB of VRAM and is useless for this Space.

@ShoufaChen Ah, sorry again. Apparently, the hardware was switched back to cpu-basic when we removed the grant. Would you change the hardware back to A100 again?

FoundationVision org


Done. No worries.
