I am trying to install apex with the following steps:
git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
cd ..
When I start the installation, I get the following error:
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
from /3tb/share/anaconda3/envs/ak_env/bin
running install
/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
running build_ext
/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
building 'scaled_upper_triang_masked_softmax_cuda' extension
gcc -pthread -B /3tb/share/anaconda3/envs/ak_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem /3tb/share/anaconda3/envs/ak_env/include -fPIC -O2 -isystem /3tb/share/anaconda3/envs/ak_env/include -fPIC -I ~/seq2seq/apex/csrc -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/TH -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/THC -I/3tb/share/anaconda3/envs/ak_env/include -I/3tb/share/anaconda3/envs/ak_env/include/python3.10 -c csrc/megatron/scaled_upper_triang_masked_softmax.cpp -o build/temp.linux-x86_64-cpython-310/csrc/megatron/scaled_upper_triang_masked_softmax.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
/3tb/share/anaconda3/envs/ak_env/bin/nvcc -I ~/seq2seq/apex/csrc -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/TH -I/3tb/share/anaconda3/envs/ak_env/lib/python3.10/site-packages/torch/include/THC -I/3tb/share/anaconda3/envs/ak_env/include -I/3tb/share/anaconda3/envs/ak_env/include/python3.10 -c csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu -o build/temp.linux-x86_64-cpython-310/csrc/megatron/scaled_upper_triang_masked_softmax_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 -std=c++14
csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
21 | #include <cuda_profiler_api.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
csrc/megatron/scaled_upper_triang_masked_softmax_cuda.cu:21:10: fatal error: cuda_profiler_api.h: No such file or directory
21 | #include <cuda_profiler_api.h>
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/3tb/share/anaconda3/envs/ak_env/bin/nvcc' failed with exit code 255
error: subprocess-exited-with-error
× Running setup.py install for apex did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /3tb/share/anaconda3/envs/ak_env/bin/python -u -c '
Here is the output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
Some solutions that I found suggest doing the following:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
However, /usr/local/cuda-11.7 does not exist on my system.
How can I solve this issue?
Here is the output of nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   46C    P0    37W / 180W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
CodePudding user response:
I was able to solve this issue by manually installing the cuda-11.7 toolkit, even though I already had cuda-11.7 installed through conda:
https://developer.nvidia.com/cuda-11-7-0-download-archive?target_os=Linux
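Before downloading the full toolkit, it may be worth checking whether the conda environment contains the missing header at all. The following is only a rough sketch: it assumes the environment is active so that CONDA_PREFIX is set (the fallback path is a guess), and find_header is a helper name made up for this example.

```shell
# Assumption: the conda env used for the build is active, so CONDA_PREFIX
# points at it (in the question that would be /3tb/share/anaconda3/envs/ak_env).
# The fallback path is only a guess for when CONDA_PREFIX is unset.
ENV_DIR="${CONDA_PREFIX:-$HOME/anaconda3}"

# Hypothetical helper: print every copy of header $2 found under directory $1.
# Prints nothing (and still succeeds) if the directory or the header is missing.
find_header() {
    find "$1" -name "$2" 2>/dev/null || true
}

matches="$(find_header "$ENV_DIR" cuda_profiler_api.h)"
if [ -n "$matches" ]; then
    echo "$matches"
else
    echo "cuda_profiler_api.h not found under $ENV_DIR"
fi
```

If nothing is found, the conda install provides nvcc but not this header, which would match the "No such file or directory" error in the question.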
After installing the toolkit, I followed these instructions:
Please make sure that
- PATH includes /usr/local/cuda-11.7/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.7/lib64, or, add /usr/local/cuda-11.7/lib64 to /etc/ld.so.conf and run ldconfig as root
I did this by running the following commands before compiling apex:
export PATH="/usr/local/cuda-11.7/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH"
- Note: I had to compile as the root user because of issues with installing the toolkit, which you may not need to do. After that I changed the ownership back to the regular user. Using root is not recommended.
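As a quick sanity check before rerunning the pip3 command, something like the sketch below can confirm that the header is present in the new toolkit and show which nvcc is picked up. It assumes the toolkit ended up in /usr/local/cuda-11.7 as in the exports above; has_profiler_header is a name made up for this example. As far as I know, PyTorch's extension builder also honours the CUDA_HOME environment variable, so the sketch reads it if it is set.

```shell
# Assumption: the toolkit lives in /usr/local/cuda-11.7 unless CUDA_HOME says otherwise.
CUDA_DIR="${CUDA_HOME:-/usr/local/cuda-11.7}"

# Hypothetical helper: succeed if the header the apex build failed on
# exists under the toolkit root given as $1.
has_profiler_header() {
    [ -f "$1/include/cuda_profiler_api.h" ]
}

if has_profiler_header "$CUDA_DIR"; then
    echo "found: $CUDA_DIR/include/cuda_profiler_api.h"
else
    echo "missing: $CUDA_DIR/include/cuda_profiler_api.h"
fi

# Show which nvcc comes first on PATH; after the exports it should be
# the one under $CUDA_DIR/bin rather than the conda one.
command -v nvcc || echo "nvcc is not on PATH"
```

If the header is reported as found but nvcc still resolves to the conda environment, the export lines were probably not run in the same shell as the build.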