tensorflow-gpu recognizes XLA-CPU instead of GPU

I am trying to install keras-gpu on a PC with a Tesla V100 and Windows Server 2019. I installed version 2.4.3 and found that my GPU is not being used. I need any 2.x.x version of Keras with GPU support.

I installed CUDA 10.1 with cuDNN 8.0.5 and, after many attempts, also tried CUDA 11.2 with cuDNN 8.1.1 (and CUDA 11.5 as well). Then I started searching for a version of TensorFlow that can find my GPU.

For CUDA 10.1, nvcc reports:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243

I am using this code to check everything:

import tensorflow
print(tensorflow.__version__)
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

My output:

2021-11-06 10:39:16.326880: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2.3.0
2021-11-06 10:39:21.177512: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-06 10:39:21.208333: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x25d395509b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-06 10:39:21.217997: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-11-06 10:39:21.261861: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-11-06 10:39:21.677227: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-06 10:39:21.692028: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: windows-freqgpu
2021-11-06 10:39:21.700398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: windows-freqgpu
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 881354854201867138
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5868137251793075209
physical_device_desc: "device: XLA_CPU device"
]

The Tesla V100 seems to show up here as XLA_CPU. How can I fix this?

CodePudding user response：

You could try installing tensorflow-gpu 2.2.x or 2.3.x, which are compatible with CUDA 10.1, as listed in the tested build configurations:

https://www.tensorflow.org/install/source#gpu

If you look at the tested build configurations, you will see that TensorFlow 2.4.0 was tested with CUDA 11.0. The software requirements on the TensorFlow GPU support page (https://www.tensorflow.org/install/gpu#software_requirements) indicate that CUDA 11.2 is recommended only for TensorFlow >= 2.5.0.
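To make the version matching concrete, here is a minimal sketch of a lookup built from a few rows of the tested build configurations table. The dictionary and helper names are hypothetical, and the table should be checked against the linked page for any release not listed here.

```python
# A few rows of the tested build configurations (Windows/Linux GPU builds),
# keyed by TensorFlow minor version: (CUDA version, cuDNN version).
# TESTED_CONFIGS and both helpers are illustrative names, not a TensorFlow API.
TESTED_CONFIGS = {
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
    "2.5": ("11.2", "8.1"),
    "2.6": ("11.2", "8.1"),
}


def tensorflow_versions_for(cuda_version):
    """Return the TensorFlow minor versions tested against a CUDA version."""
    return sorted(
        tf_version
        for tf_version, (cuda, _cudnn) in TESTED_CONFIGS.items()
        if cuda == cuda_version
    )


def expected_cudnn(tf_version):
    """Return the cuDNN version tested with a TensorFlow release like '2.3.0'."""
    minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS[minor][1]


if __name__ == "__main__":
    print(tensorflow_versions_for("10.1"))
    print(expected_cudnn("2.3.0"))
```

Note that for CUDA 10.1 the table lists cuDNN 7.6, not the 8.0.5 mentioned in the question, so that pairing is worth correcting as well.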

It is unlikely that your GPU is being recognized as an 'XLA_CPU' device. Here 'XLA' stands for 'accelerated linear algebra' (https://www.tensorflow.org/xla). It is a domain-specific compiler that can be used for both CPUs and GPUs, so the XLA_CPU entry is just your CPU as seen by XLA. For more details, see the question "What is XLA_GPU and XLA_CPU for tensorflow". It is more likely that your GPU is simply not detected, as evidenced by this line in your output:

2021-11-06 10:39:21.677227: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
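A more direct check than scanning the device list is to ask TensorFlow for GPU devices only. The sketch below assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available; the helper name visible_gpus is made up for illustration.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    # An empty list means TensorFlow loaded but found no usable CUDA device,
    # which is what the CUDA_ERROR_NO_DEVICE line above would lead to.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed")
    elif not gpus:
        print("No GPU detected; check the driver and the CUDA/cuDNN versions")
    else:
        print("Detected GPUs:", gpus)
```

If this prints an empty result while the cuInit error is present, the problem is below TensorFlow (driver or CUDA installation) rather than in the Keras or TensorFlow version.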

CodePudding user response：

As @talonmies mentioned, it was a driver-related problem; more precisely, a Tesla driver problem. I had updated the driver, but Tesla cards require specific driver versions for different CUDA versions.

For common GPUs, the CUDA installer brings a suitable driver itself.

Correct installation for Tesla V100 / Windows Server 2019 / CUDA 10.1:

1. Install CUDA (10.1 in my case).
2. Install the driver that fits this CUDA version (427.60).
3. Install cuDNN (7.6.5).
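After these steps, one way to confirm which driver is actually active is to query nvidia-smi. This is only a sketch: it assumes nvidia-smi is on PATH (on some Windows installations it is not), and query_nvidia_driver is a hypothetical helper name.

```python
import shutil
import subprocess


def query_nvidia_driver():
    """Return lines like 'Tesla V100-..., 427.60', or None if the query fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi was not found on PATH; the driver may still be installed.
        return None
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # A non-zero exit code usually means the driver cannot talk to the GPU.
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(query_nvidia_driver())
```

If the reported driver version differs from the one installed in step 2, the GPU is probably still running an older or newer driver than the one the chosen CUDA version expects.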