I installed nvidia-docker and, to test my installation, I ran docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi. I get this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro T2000 wi...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   46C    P0    10W /  N/A |   2294MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
The driver version and CUDA version are exactly the same as what I get when I run nvidia-smi outside the container in my regular terminal. My understanding is that the driver version is the same because device drivers are hardware-specific and thus aren't installed inside the container, and that nvidia-docker exists to allow software running inside the container to talk to the device drivers. Is this correct?
My main point of confusion is why the CUDA version is reported as 11.4 from inside the container. When I launch a bash terminal inside this container and look at the CUDA installation in /usr/local, I only see version 10.0, so why is nvidia-smi inside the container giving me the CUDA version installed on my host system?
I believe these questions display a fundamental misunderstanding of either how nvidia-smi works or how nvidia-docker works, so could someone point me towards resources that might help me resolve this misunderstanding?
CodePudding user response:
You can't have more than one GPU driver operational in this setting. Period. That driver is installed on the base machine. If you do something not recommended, like installing it or attempting to install it in the container, it is still the one on the base machine that is in effect, for the base machine as well as the container. Note that anything reported by nvidia-smi pertains to the GPU driver only, and is therefore using the driver installed on the base machine whether you run it inside or outside of the container. There may be detailed reporting differences, such as which GPUs are visible, but this doesn't affect the versions reported.
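As a quick sanity check (a minimal sketch, using the image tag from your command and nvidia-smi's standard query flags), you can ask for the driver version on the host and inside the container; both queries hit the same driver on the base machine and print the same value:

# On the host: query the driver directly
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Inside the container: same driver, so the same version (470.57.02 here)
docker run --rm --gpus all nvidia/cuda:10.0-base \
  nvidia-smi --query-gpu=driver_version --format=csv,noheader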
The CUDA runtime version will be the one that is installed in the container. Period. It has no ability to inspect what is outside the container. If it happens to match what you see outside the container, then it is simply the case that you have the same configuration outside the container as well as inside.
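To see which CUDA is actually shipped in the image, inspect its filesystem rather than running nvidia-smi (a sketch; the exact files vary by image, and version.txt may not be present in every variant, in which case listing /usr/local shows the same thing):

# List the CUDA installations baked into the image (no GPU access needed)
docker run --rm nvidia/cuda:10.0-base ls /usr/local

# If the image ships a version file, it reports the 10.0 toolkit, not 11.4
docker run --rm nvidia/cuda:10.0-base cat /usr/local/cuda/version.txt

The "CUDA Version: 11.4" that nvidia-smi prints comes from the driver (it is the highest CUDA version that driver supports), which is why it doesn't change with the container's contents.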
Probably most of your confusion would be resolved by this answer, and perhaps your question is a duplicate of that one.