I have a problem understanding the CUDA and Docker ecosystem.
On a host (Ubuntu 22.04) server I want to spawn multiple Machine Learning Jupyter notebooks.
Is it enough if I install ONLY the NVIDIA drivers on the host Ubuntu, like this:
sudo apt-get install linux-headers-$(uname -r)
DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
echo $DISTRIBUTION
wget https://developer.download.nvidia.com/compute/cuda/repos/$DISTRIBUTION/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-drivers
sudo reboot
#After reboot verify if the CUDA driver is installed:
nvidia-smi
and then install CUDA in the containers like this:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
I think the purest approach is to install only the required packages on the host system and enrich the containers with all the necessary packages. That is why I wonder whether this approach is reasonable.
- Do I understand correctly that the container will then use the drivers from host system?
- Is installing cuda in the containers enough, or should I install cuda-toolkit, as it contains more additional packages?
CodePudding user response:
Do I understand correctly that the container will then use the drivers from host system?
Yes, the container will use the drivers from the host system. If you are building your own container, do not install the drivers in the container.
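As a sanity check (assuming the NVIDIA Container Toolkit is already set up on the host; the image tag here is illustrative), you can run nvidia-smi from inside a stock CUDA container; it should report the host's driver version:

```shell
# The container ships no driver of its own; the host driver is
# injected at run time by the NVIDIA Container Toolkit.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```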
Is installing cuda in the containers enough, or should I install cuda-toolkit, as it contains more additional packages?
You might not ever want to install cuda in this scenario. You can install cuda (which will also install the drivers mentioned in your question 1) on the host machine; that is acceptable. In the container, if you are building it yourself, you don't want to install cuda, at most the cuda-toolkit.
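If you do build the container image yourself, one hedged sketch is to start from an NVIDIA CUDA base image that already bundles the toolkit, so no cuda or cuda-toolkit install is needed at all (the base image tag and package choices below are illustrative; pick versions matching your framework):

```dockerfile
# devel variants of nvidia/cuda images already contain the CUDA toolkit.
# No driver packages here -- the driver comes from the host at run time.
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

# Typical notebook setup for the ML Jupyter use case from the question.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir jupyterlab

EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--no-browser"]
```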
Can I use only the NVIDIA drivers on the host machine of a Docker-based system?
There are generally 3 items needed on the host machine, beyond the Linux OS install, to make it ready for CUDA-enabled container usage:
- The GPU drivers
- A recent docker version
- The NVIDIA container toolkit (see above link for install instructions)
It is not necessary to install the CUDA toolkit (i.e. the items beyond the GPU driver install) in the host machine. These will usually be installed in the container if they are needed in the container.
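For completeness, a sketch of the host-side setup for item 3, assuming the NVIDIA repository has already been configured as in the driver install above (commands follow NVIDIA's install guide for apt-based distributions; verify against the current documentation):

```shell
# Install the NVIDIA Container Toolkit on the host.
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```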