How to reduce batch size in order to solve CUDA out of memory error?
Hello. I'm running this model on a cloud GPU on Google Cloud, currently an NVIDIA T4. I thought the GPU had enough memory to run this model (16 GB), but whenever I run the server.py script to launch the text-generation-webui, I get this message: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 14.62 GiB total capacity; 13.85 GiB already allocated; 169.38 MiB free; 13.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF". Since only about 200 MiB failed to allocate, I assume I don't need to free much memory, so maybe reducing the batch size would be enough. Does anyone know how I can do this?
Yes, reducing the batch size can lower VRAM usage. Locate the line in your script that sets the batch size and decrease its value; if you're not sure where it is, try searching for "batch_size" in the script. I don't know the structure of the code you're using, so I can't give precise instructions for changing that parameter.
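For example, if the script happens to batch its inputs through a standard PyTorch DataLoader, the change might look like the minimal sketch below. The names here (dataset, batch_size) are placeholders for illustration, not the actual webui code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy data standing in for whatever your script actually feeds the model.
dataset = TensorDataset(torch.randn(1024, 512))

# Fewer samples per step get moved to the GPU, which lowers peak VRAM usage.
# Halving the value (e.g. 32 -> 16 -> 8) until the OOM error disappears is a
# reasonable way to find a size that fits on the 16 GB T4.
loader = DataLoader(dataset, batch_size=8)

for (batch,) in loader:
    if torch.cuda.is_available():
        batch = batch.cuda()  # each step now allocates a much smaller tensor on GPU 0
    # ... forward pass / generation would go here ...
    break
```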
Would I find the setting for the batch size in the model itself, or in the server.py script?
If such a parameter exists, you would find it in the script you use to run the model (e.g. server.py), not in the model files themselves.
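Separately, the error message you quoted suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. You can pass it on the command line when launching the webui (e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python server.py`), or set it from Python before torch is imported. A rough sketch of the latter (128 MiB is just a common starting point, not a required value):

```python
import os

# Must be set before the CUDA caching allocator is initialised, so do it
# before importing torch anywhere in the script.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # the allocator reads the variable when CUDA is first used
```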