Unable to run on free-tier Colab
I tried to follow the instructions for free-tier T4 Colab with 8-bit quantization, but I get this error:
OutOfMemoryError: CUDA out of memory. Tried to allocate 35.31 GiB. GPU 0 has a total capacity of 14.75 GiB of which 11.51 GiB is free. Process 22076 has 3.24 GiB memory in use. Of the allocated memory 2.69 GiB is allocated by PyTorch, and 429.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Please suggest a fix.
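As an aside, the allocator hint at the end of that error message can also be tried; this is just a sketch of how it would be set in Colab (it has to run before anything touches CUDA, and it only mitigates fragmentation, so it is unlikely to make a 35 GiB allocation fit):

import os
# must run before the first CUDA allocation in the session
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"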
Also, in this part of the suggested code I replaced 5b with 2b, as I was trying the 2B model:
import torch
from transformers import T5EncoderModel
from diffusers import CogVideoXTransformer3DModel
from torchao.quantization import quantize_, int8_weight_only as quantization  # assuming the 8-bit weight-only config from the torchao example

text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())
transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())
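For reference, the suggested code then hands the quantized modules to the pipeline, roughly like this (a sketch of the assembly step; the 5B id again stands in for 2B in my run):

from diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)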
Did you try this with diffusers 0.30.1?
Yes, I used diffusers 0.30.1. I am sharing a link to my Colab. Please suggest changes.
https://colab.research.google.com/drive/1GMt-DQ_LHlQGAEK9LB8BpEPfUEszBTY4?usp=sharing
I also tried on the free-tier T4 environment and got the CUDA out-of-memory error.
pipe.enable_model_cpu_offload()       # move each sub-model to the GPU only while it runs
pipe.enable_sequential_cpu_offload()  # offload layer by layer; lowest VRAM (normally used instead of the line above)
pipe.vae.enable_slicing()             # decode the batch one slice at a time
pipe.vae.enable_tiling()              # decode each frame in tiles to cap peak memory
Try this; I am running in just 2.5 GB.
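For context, a minimal sketch of a complete script built around those calls (assuming the 2B checkpoint in fp16 and diffusers 0.30.1; the prompt and output name are placeholders, and I keep only sequential offload since the two offload modes are alternatives rather than complements):

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-2b", torch_dtype=torch.float16)

pipe.enable_sequential_cpu_offload()  # lowest-VRAM offload mode
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

video = pipe(prompt="a panda playing guitar", num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)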
I am getting this error:
TypeError Traceback (most recent call last)
in <cell line: 2>()
1 pipe.enable_model_cpu_offload()
----> 2 pipe.enable_sequential_cpu_offload()
3 pipe.vae.enable_slicing()
4 pipe.vae.enable_tiling()
12 frames
/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py in set_module_tensor_to_device(module, tensor_name, device, value, dtype, fp16_statistics, tied_params_map)
438 new_value = torch.nn.Parameter(new_value, requires_grad=old_value.requires_grad).to(device)
439 else:
--> 440 new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
441
442 module._parameters[tensor_name] = new_value
TypeError: AffineQuantizedTensor.__new__() got an unexpected keyword argument 'requires_grad'
Updating to diffusers 0.30.1 should fix this.
I am using diffusers 0.30.1. Question: if you only need 2.5 GB, why is the code trying to allocate 35.31 GiB? I am guessing you are running on an A100 GPU with 48 GB VRAM, so you are able to run your code. Can you please try to run it on a T4 with 15 GB VRAM? You can just use the Colab link I shared. Thank you for your support.
Your code has a few critical issues:
1. You didn’t enable the full set of memory optimizations, which allow inference to run properly without relying on INT8 quantization.
2. You didn’t use float16; note that bf16 is only supported on Ampere (30-series) GPUs and newer, and the T4 is Turing.
I’ve already made these corrections for you; could you try running the code again to see if it works correctly? And this is the 5B model, not the 2B, working on your T4. A sketch of the corrected setup follows below.
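For anyone who cannot open the notebook, the corrections amount to roughly this (a sketch only; the actual edited notebook may differ):

import torch
from diffusers import CogVideoXPipeline

# fp16 instead of bf16: the T4 is a Turing GPU and predates bf16 support
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.float16)

# the full set of memory optimizations, instead of relying on INT8 quantization
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()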
Check here for more detail.
Wonderful, it works now. Thank you so much for your support. Although, for some reason, the output file was blank; I will take a closer look later.