I am using PyTorch for a cat-dog classification task. I keep getting a CUDA out-of-memory error during training and validation. If I only run the training loop, I don't have this issue, but as soon as I add a validation step, I get the OOM error. I cannot see what is happening.
I have tried: changing the batch size to 1, calling torch.cuda.empty_cache(), and moving all variables to the CPU with tensor.cpu().
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 8.00 GiB total capacity; 7.21 GiB already allocated; 0 bytes free; 7.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
CodePudding user response:
Could you please update your question to include your code?
Also check that you wrap the validation loop in torch.no_grad(); otherwise PyTorch still builds the computation graph and keeps activations around for backpropagation, which consumes extra GPU memory even though you never call backward().
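Since your code isn't shown, here is a minimal sketch of what the validation step could look like; model, val_loader, criterion, and device are placeholder names standing in for your own objects:

```python
import torch

def validate(model, val_loader, criterion, device):
    # Placeholder names: adapt model, val_loader, criterion, device to your setup.
    model.eval()                          # put layers like dropout/batchnorm in eval mode
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():                 # no graph is built, so activations are freed immediately
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()     # .item() detaches the scalar from the GPU tensor
            preds = outputs.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return total_loss / len(val_loader), correct / total
```

A related pitfall: accumulating the raw loss tensor (e.g. total_loss += loss instead of loss.item()) keeps the whole graph alive across iterations and can also cause memory to grow until it runs out.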