CUDA out of memory during training


I am using PyTorch to do a cat-dog classification. I keep getting a CUDA out-of-memory problem during training and validation. If I only run the training, I don't have this issue, but as soon as I add a validation step, I get the OOM error. I cannot see what is happening.

I have tried: reducing the batch size to 1, calling torch.cuda.empty_cache(), and moving tensors to the CPU with tensor.cpu(). None of these helped.

RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
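As the error message itself suggests, when reserved memory is much larger than allocated memory the allocator may be fragmented, and capping the split size can help. A sketch of how that environment variable is set (the value 128 is only an example; the right value is workload-dependent, and `train.py` stands in for your training script):

```shell
# Limit the size of memory blocks the caching allocator will split,
# which can reduce fragmentation at some cost in allocation flexibility.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python train.py
```

Note this only mitigates fragmentation; it will not fix a genuine leak such as keeping the autograd graph alive during validation.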

CodePudding user response:

Could you please update your question to include your code? Also check that you wrap validation in with torch.no_grad(): otherwise PyTorch builds the autograd graph for the validation forward passes as well, which consumes extra GPU memory.
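A minimal sketch of what that validation loop could look like. The model and data loader here are hypothetical stand-ins for the asker's cat-dog classifier (and it runs on CPU for illustration; on GPU you would move `model` and each batch with `.to(device)` first):

```python
import torch
import torch.nn as nn

# Hypothetical tiny stand-in for the cat-dog classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))

def validate(model, loader):
    model.eval()  # disables dropout/batch-norm updates
    correct = total = 0
    with torch.no_grad():  # no autograd graph is built, so activations are freed right away
        for images, labels in loader:
            outputs = model(images)
            preds = outputs.argmax(dim=1)
            # .item() converts to a Python number, so no tensor (or graph) is retained
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Usage with a single fake batch of 4 images, 3x8x8 each:
fake_loader = [(torch.randn(4, 3, 8, 8), torch.randint(0, 2, (4,)))]
acc = validate(model, fake_loader)
```

A related pitfall with the same symptom: accumulating `loss` across batches without `.item()` keeps every batch's graph alive and grows memory each epoch.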
