On attempting to use nvprof to profile my program, I receive the following output with no other information:
<program output>
======== Warning: No profile data collected.
The code follows this classic first CUDA program. nvprof has worked on my system before, but I recently had to reinstall CUDA.
I have attempted to follow the suggestions in this post, which recommended adding cudaDeviceReset() and cudaProfilerStart/Stop() and passing an extra profiling flag (nvprof --unified-memory-profiling off), without luck.
This NVIDIA developer forum post seems to run into a similar error; however, the suggestions there seemed to indicate needing a compiler other than nvcc because of an OpenACC library, which I do not use.
System Specifications
- System: Windows 11 x64 using WSL2
- CPU: i7 8750H
- GPU: GTX 1050 Ti
- CUDA Version: 11.8
For completeness, I have included my program code, though I imagine the problem has more to do with my system:
Compiling:
nvcc add.cu -o add_cuda
Profiling:
nvprof ./add_cuda
add.cu:
#include <iostream>
#include <math.h>
#include <cuda_profiler_api.h>
// function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
for (int i = 0; i < n; i++)
y[i] = x[i] + y[i];
}
int main(void)
{
int N = 1<<20; // 1M elements
cudaProfilerStart();
// Allocate Unified Memory -- accessible from CPU or GPU
float *x, *y;
cudaMallocManaged(&x, N*sizeof(float));
cudaMallocManaged(&y, N*sizeof(float));
// initialize x and y arrays on the host
for (int i = 0; i < N; i++) {
x[i] = 1.0f;
y[i] = 2.0f;
}
// Run kernel on 1M elements on the GPU
add<<<1, 1>>>(N, x, y);
// Wait for GPU to finish before accessing on host
cudaDeviceSynchronize();
// Check for errors (all values should be 3.0f)
float maxError = 0.0f;
for (int i = 0; i < N; i++)
maxError = fmax(maxError, fabs(y[i]-3.0f));
std::cout << "Max error: " << maxError << std::endl;
// Free memory
cudaFree(x);
cudaFree(y);
cudaDeviceReset();
cudaProfilerStop();
return 0;
}
How can I resolve this to get actual profiling information using nvprof?
CodePudding user response:
As per the documentation, there is currently no profiling support in CUDA for WSL. This is why there is no profiling data collected when you are using nvprof.
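If the same program is built and run on both native Linux and WSL, a small host-side check can flag the WSL case before you go looking for missing profile data. The sketch below is only an illustration: the helper names looksLikeWsl and readKernelVersion are hypothetical, and it assumes that WSL kernels report the word "microsoft" (in any case) in /proc/version, which is a common convention rather than a documented guarantee.

```cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <string>

// Returns true if a kernel version string looks like it came from WSL.
// Assumption: WSL kernels include "microsoft" (any case) in this string.
bool looksLikeWsl(std::string kernelVersion)
{
    // Lower-case the copy so the match is case-insensitive.
    std::transform(kernelVersion.begin(), kernelVersion.end(), kernelVersion.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return kernelVersion.find("microsoft") != std::string::npos;
}

// Hypothetical helper: reads the first line of /proc/version.
// Returns an empty string if the file cannot be opened or read.
std::string readKernelVersion()
{
    std::ifstream in("/proc/version");
    std::string line;
    std::getline(in, line);
    return line;
}
```

Calling looksLikeWsl(readKernelVersion()) at the top of main() and printing a warning when it returns true would make the cause visible in the program's own output. The host code is plain C++, so it can sit in add.cu and be compiled with nvcc alongside the kernel.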