I am running into a strange issue when using TensorFlow (2.9.1): after defining a distributed training strategy, my GPU memory appears to fill up almost completely.
Steps to reproduce are simple:
import tensorflow as tf
strat = tf.distribute.MirroredStrategy()
After the first line (importing TensorFlow), nvidia-smi outputs:
Fri Jun 10 03:01:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   25C    P8     9W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   20C    P8     7W / 250W |      0MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
After the second line of code, nvidia-smi outputs:
Fri Jun 10 03:02:43 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:04:00.0 Off |                  Off |
| 26%   29C    P0    59W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:06:00.0 Off |                  Off |
| 26%   25C    P0    58W / 250W |  23951MiB / 24449MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1833720      C   python                          23949MiB |
|    1   N/A  N/A   1833720      C   python                          23949MiB |
+-----------------------------------------------------------------------------+
Why is the GPU memory almost entirely full? There is also some terminal output:
2022-06-10 03:02:37.442336: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-10 03:02:39.136390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 23678 MB memory: -> device: 0, name: Quadro P6000, pci bus id: 0000:04:00.0, compute capability: 6.1
2022-06-10 03:02:39.139204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 23678 MB memory: -> device: 1, name: Quadro P6000, pci bus id: 0000:06:00.0, compute capability: 6.1
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1')
Any ideas on why this is occurring would be helpful! Additional details about my configuration:
- Python 3.10.4 [GCC 7.5.0] on linux
- tensorflow 2.9.1
- cuda/11.2.2 cudnn/v8.2.1
CodePudding user response:
By default, TensorFlow maps nearly all of the GPU memory on every GPU visible to the process (see the official guide). This is done for performance reasons: reserving the memory up front avoids the fragmentation and allocation latency that growing the memory on demand would typically cause.
You can try using tf.config.experimental.set_memory_growth to prevent TensorFlow from immediately claiming all of the GPU memory; it will then allocate memory only as it is actually needed. There are also some good explanations in this StackOverflow post.
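For example, here is a minimal sketch of how that could look for the two-GPU setup above (memory growth has to be configured before the GPUs are initialized, i.e. before the strategy is created):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it up front.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

strat = tf.distribute.MirroredStrategy()

With this in place, nvidia-smi should show only a small allocation per GPU right after the strategy is created, growing as the model and batches actually need memory. Setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before starting Python has the same effect without code changes.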