Error in Fine-Tuning Flan-UL2 model

#26
by Ahatsham - opened

Hi, I am trying to fine-tune the Flan-UL2 model on my dataset. I have two 32-GB GPUs, but whenever I try to run my model, it shows:

"RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 31.74 GiB total capacity; 30.74 GiB already allocated; 136.88 MiB free; 30.74 GiB reserved in total by PyTorch)."

Can anybody help me with that?

Flan-UL2 requires a minimum of around 50 GB of GPU memory, and that is probably just for inference. For batched training you need a cluster. 64 GB might suffice for inference if you load the model across both GPUs; I think you are trying to load the whole model on device 0.

Yes, you are right. I am new to this area; can you help me with how to use multiple GPUs?

Adding the parameter device_map="auto" while loading the pipeline should do the trick: https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.pipeline.device_map
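
For reference, a minimal sketch of what that looks like (it assumes accelerate is installed and uses the google/flan-ul2 checkpoint; adjust the task and dtype to your setup):

```python
from transformers import pipeline

# device_map="auto" lets Accelerate shard the model across all visible GPUs
# (and offload the remainder to CPU if it still doesn't fit) instead of
# placing the whole model on GPU 0.
pipe = pipeline(
    "text2text-generation",
    model="google/flan-ul2",
    device_map="auto",
    torch_dtype="auto",  # load in the checkpoint's dtype instead of fp32 to save memory
)

print(pipe("Translate English to German: How old are you?", max_new_tokens=50))
```

Note that this only shards the weights for inference; as mentioned above, fine-tuning will still need more memory than two 32 GB GPUs provide.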