Memory Error While Fine-tuning AYA on 8 H100 GPUs
#23
by ArmanAsq - opened
Hello,
I am currently trying to fine-tune an AYA model on 8 H100 GPUs, but I'm encountering an out-of-memory error. My system has 640 GB of total GPU memory (8 × 80 GB), which I assumed would be sufficient for this task. I'm not using PEFT or LoRA, and my batch size is set to 1.
I'm wondering if anyone has encountered a similar issue and could provide some guidance. How many GPUs are typically recommended for this task? Any help would be greatly appreciated.
Thanks in advance!
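For context on why 640 GB in aggregate can still run out of memory: with plain data parallelism (DDP), every GPU holds a full replica of the weights, gradients, and optimizer states, so the binding constraint is the 80 GB on each card, not the 640 GB total. Below is a minimal back-of-envelope sketch, assuming the ~13B-parameter Aya-101 checkpoint and standard Adam mixed-precision full fine-tuning; both are assumptions, since the thread doesn't specify the checkpoint or training setup.

```python
# Rough per-GPU memory estimate for full fine-tuning with Adam in
# mixed precision. The 13B parameter count is an assumption (Aya-101);
# the thread does not confirm which Aya checkpoint is being tuned.

params = 13e9  # assumed model size in parameters

bytes_per_param = (
    2    # fp16/bf16 weights
    + 2  # fp16/bf16 gradients
    + 4  # fp32 master weights
    + 8  # Adam optimizer states (two fp32 moments)
)

model_states_gb = params * bytes_per_param / 1e9
print(f"Model states alone: ~{model_states_gb:.0f} GB")  # ~208 GB

# Without sharding, each GPU needs the full ~208 GB of model states
# (before activations), which cannot fit in a single 80 GB H100,
# so OOM occurs even with batch size 1. Sharding the states across
# the 8 GPUs (e.g. DeepSpeed ZeRO stage 3 or PyTorch FSDP) divides
# this by 8, to roughly 26 GB per GPU before activations.
```

Under those assumptions, sharding the model states across the 8 GPUs, enabling gradient checkpointing, or switching to PEFT/LoRA would each be standard ways to bring the per-GPU footprint under 80 GB.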
Hey @ArmanAsq
I think I answered your question on our Discord, so I'm closing this one for now :)
shivi changed discussion status to closed