CUDA OOM - But the numbers don't add up?

Time:11-24

I am trying to train a model using PyTorch. When beginning model training I get the following error message:

RuntimeError: CUDA out of memory. Tried to allocate 5.37 GiB (GPU 0; 7.79 GiB total capacity; 742.54 MiB already allocated; 5.13 GiB free; 792.00 MiB reserved in total by PyTorch)

I am wondering why this error occurs. The way I see it, I have 7.79 GiB total capacity. The numbers it states (742 MiB + 5.13 GiB + 792 MiB) do not add up to more than 7.79 GiB. When I check nvidia-smi, I see these processes running:

|    0   N/A  N/A      1047      G   /usr/lib/xorg/Xorg                168MiB |
|    0   N/A  N/A      5521      G   /usr/lib/xorg/Xorg                363MiB |
|    0   N/A  N/A      5637      G   /usr/bin/gnome-shell              161MiB |

I realize that summing all of these numbers might cut it close (168 + 363 + 161 + 742 + 792 + 5130 = 7356 MiB), but this is still less than the stated capacity of my GPU.

CodePudding user response:

This is more of a comment, but worth pointing out.

The general reason is indeed what talonmies commented, but you are summing the numbers up incorrectly. Let's see what happens when tensors are moved to the GPU (I tried this on my PC with an RTX 2060, with 5.8 GiB of usable GPU memory in total):

Let's run the following python commands interactively:

Python 3.8.10 (default, Sep 28 2021, 16:10:42) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.zeros(1).cuda()
>>> b = torch.zeros(500000000).cuda()
>>> c = torch.zeros(500000000).cuda()
>>> d = torch.zeros(500000000).cuda()

The following are the outputs of watch -n.1 nvidia-smi:

Right after torch import:

|    0   N/A  N/A      1121      G   /usr/lib/xorg/Xorg                  4MiB |

Right after the creation of a:

|    0   N/A  N/A      1121      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     14701      C   python                           1251MiB |

As you can see, you need 1251 MiB just to get PyTorch to initialize CUDA, even if you only need a single float.

Right after the creation of b:

|    0   N/A  N/A      1121      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     14701      C   python                           3159MiB |

b needs 500000000 * 4 bytes ≈ 1907 MiB, which matches the increment in memory used by the python process (3159 - 1251 = 1908 MiB).
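The tensor-size arithmetic can be checked directly in plain Python (1 MiB = 2**20 bytes; float32 is 4 bytes per element):

```python
# 500,000,000 float32 elements at 4 bytes each
n = 500_000_000
size_mib = n * 4 / 2**20
print(round(size_mib))  # 1907
```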

Right after the creation of c:

|    0   N/A  N/A      1121      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     14701      C   python                           5067MiB |

No surprise here.

Right after the creation of d:

|    0   N/A  N/A      1121      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     14701      C   python                           5067MiB |

No further memory allocation, and the OOM error is thrown:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB (GPU 0; 5.80 GiB total capacity; 3.73 GiB already allocated; 858.81 MiB free; 3.73 GiB reserved in total by PyTorch)

Obviously:

  • The "already allocated" part is included in the "reserved in total by PyTorch" part. You can't sum them up, otherwise the sum exceeds the total available memory.
  • The minimum memory required to get PyTorch running on the GPU (1251 MiB) is not included in the "reserved in total" part.
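If you want to inspect these two counters from inside Python, PyTorch exposes them directly; a minimal sketch (note that the CUDA-context overhead discussed above shows up in neither counter, only in nvidia-smi):

```python
import torch

def report_gpu_memory():
    """Print PyTorch's own view of GPU memory usage, in MiB."""
    if not torch.cuda.is_available():
        print("no CUDA device available")
        return
    # memory_allocated: bytes occupied by live tensors ("already allocated")
    allocated = torch.cuda.memory_allocated() / 2**20
    # memory_reserved: bytes held by the caching allocator
    # ("reserved in total by PyTorch"); this already includes the allocated part
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"allocated: {allocated:.0f} MiB")
    print(f"reserved:  {reserved:.0f} MiB")

report_gpu_memory()
```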

So in your case, the sum should consist of:

  • 792 MiB (reserved in total by PyTorch)
  • 1251 MiB (minimum to get PyTorch running on the GPU, assuming this is the same for both of us)
  • 5.13 GiB (free)
  • 168 + 363 + 161 = 692 MiB (other processes)

These sum up to approximately 7988 MiB = 7.80 GiB, which matches your total GPU memory almost exactly.
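As a sanity check, the same accounting reproduced in plain Python (the 1251 MiB CUDA-context figure is an assumption carried over from my machine, as noted above):

```python
reserved = 792                  # "reserved in total by PyTorch" (MiB)
cuda_context = 1251             # assumed CUDA init overhead, same as on my RTX 2060
free = 5.13 * 1024              # "free", converted from GiB to MiB
other = 168 + 363 + 161         # Xorg + Xorg + gnome-shell from nvidia-smi
total_mib = reserved + cuda_context + free + other
print(round(total_mib))             # 7988
print(round(total_mib / 1024, 2))   # 7.8
```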
