I'm trying to reproduce the results of an older research paper and need to run a Singularity container with NVIDIA CUDA 9.0 and torch 1.2.0.
Locally I have Ubuntu 20.04 in a VM, where I run singularity build. I followed the guide for installing older CUDA versions.
This is the recipe file:
#header
Bootstrap: docker
From: nvidia/cuda:9.0-runtime-ubuntu16.04

#Sections
%files
    /home/timaie/rkn_tcml/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    /home/timaie/rkn_tcml/RKN/*

%post
    # necessary dependencies
    pip install numpy scipy scikit-learn biopython pandas
    dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
    apt-get autoclean
    apt-get autoremove
    apt-get update
    export CUDA_HOME="/usr/local/cuda-9.0"
    export TORCH_EXTENSIONS_DIR="$PWD/tmp"
    export PYTHONPATH=$PWD:$PYTHONPATH

%runscript
    cd experiments
    python train_scop.py --pooling max --embedding blosum62 --kmer-size 14 --alternating --sigma 0.4 --tfid 0
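For reference, I build the image with a command along these lines (recipe.def is a stand-in for whatever the definition file is actually named):

sudo singularity build image.simg recipe.def   # recipe.def = the definition file above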
The build runs fine and gets me an image.simg file. Then I try installing CUDA with sudo singularity exec image.simg apt-get install cuda,
which produces the following error:
0 upgraded, 823 newly installed, 0 to remove and 1 not upgraded.
Need to get 2661 MB of archives.
After this operation, 6822 MB of additional disk space will be used.
W: Not using locking for read only lock file /var/lib/dpkg/lock-frontend
W: Not using locking for read only lock file /var/lib/dpkg/lock
W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (30: Read-only file system)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (30: Read-only file system)
W: Not using locking for read only lock file /var/cache/apt/archives/lock
E: You don't have enough free space in /var/cache/apt/archives/.
I read about a similar issue in Docker here, but I don't know of anything comparable to docker system prune for Singularity.
I also tried freeing space with apt autoremove and apt autoclean, without success.
There should be enough space left on disk, since running df -H gives:
Filesystem      Size  Used  Avail  Use%  Mounted on
udev            2,1G     0   2,1G    0%  /dev
tmpfs           412M  1,4M   411M    1%  /run
/dev/sda5        54G   19G    33G   36%  /
tmpfs           2,1G     0   2,1G    0%  /dev/shm
tmpfs           5,3M  4,1k   5,3M    1%  /run/lock
tmpfs           2,1G     0   2,1G    0%  /sys/fs/cgroup
/dev/loop0      132k  132k      0  100%  /snap/bare/5
/dev/loop1       66M   66M      0  100%  /snap/core20/1328
/dev/loop2      261M  261M      0  100%  /snap/gnome-3-38-2004/99
/dev/loop3       66M   66M      0  100%  /snap/core20/1405
/dev/loop4       69M   69M      0  100%  /snap/gtk-common-themes/1519
/dev/loop5       46M   46M      0  100%  /snap/snapd/15177
/dev/loop6       57M   57M      0  100%  /snap/snap-store/558
/dev/loop7       46M   46M      0  100%  /snap/snapd/14978
/dev/sda1       536M  4,1k   536M    1%  /boot/efi
tmpfs           412M   25k   412M    1%  /run/user/1000
Does anyone know whether the problem lies with my local Ubuntu setup or with the nvidia Docker image?
Thanks for any clarification.
CodePudding user response:
As described in the overview section of the singularity build documentation, build can produce containers in two different formats, selected as follows:
- a compressed, read-only Singularity Image File (SIF), suitable for production (the default)
- a writable (ch)root directory, called a sandbox, for interactive development (the --sandbox option)
Building with --sandbox makes the container's files writable, which should resolve your issue.
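A minimal sketch of that workflow (image_sandbox/ and recipe.def below are placeholder names, not from your post):

sudo singularity build --sandbox image_sandbox/ recipe.def   # build into a writable directory
sudo singularity exec --writable image_sandbox/ apt-get install cuda   # note the --writable flag

The exec call also needs --writable; without it, even a sandbox directory is mounted read-only and apt-get will fail the same way.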
Ideally, though, I'd suggest adding any apt-get install commands to the %post section of your recipe file, so the packages are baked in at build time and the final image can stay a read-only SIF.
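For example, the %post section of your recipe could look something like this (a sketch based on the commands you already have; apt-get install cuda is the same command you ran by hand, and -y just skips the confirmation prompt):

%post
    pip install numpy scipy scikit-learn biopython pandas
    # register the local CUDA 9.0 repository copied in by %files
    dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
    apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
    apt-get update
    # install CUDA while the build-time filesystem is still writable
    apt-get install -y cuda
    export CUDA_HOME="/usr/local/cuda-9.0"
    export TORCH_EXTENSIONS_DIR="$PWD/tmp"
    export PYTHONPATH=$PWD:$PYTHONPATH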