CUDA: thread divergence optimization


I am trying to understand thread divergence. I have a few different questions.

  1. Regarding thread divergence, is there any performance benefit to disabling a thread if it doesn't need to do the computation? For example:
__global__ void kernel_1()
{
    int i = f();

    // We know that when this condition is false, i is already less than g()
    if(threadIdx.x < 5)
    {
        i = min(g(), i);
    }
}

__global__ void kernel_2()
{
    int i = f();
    i = min(g(), i);
}

Which kernel is better?

  2. Does CUDA define "thread divergence" by considering only the source code path? For example:
__global__ void kernel_3()
{
    if(threadIdx.x < 5)
    {
        int i = g();
        printf("hello\n");
    }
    else
    {
        int i = g();
        printf("hello\n");
    }
}

In this code, both branches contain exactly the same code. So does the warp diverge or not?

CodePudding user response:

Which kernel is better?

My expectation is that the first kernel is better, but there may be little or no measurable difference in performance.

Since you haven't given the definition of g(), it's possible that g() either does something with limited throughput or generates memory traffic. Either of those would be worth avoiding for threads that don't need the result. An example of a "limited throughput" operation would be any operation for which the throughput table in the programming guide lists fewer than 32 ops/clock. If g() doesn't do any of these things, then it's likely that there is no measurable difference between the two cases.
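
As an illustration only (g_buffer, out, and the placeholder bodies of f() and g() below are assumptions, not part of the question), here is one way g() could generate memory traffic; in the guarded kernel, threads that fail the test skip that load entirely:

__device__ int g_buffer[1024];    // hypothetical global buffer read by g()
__device__ int out[1024];         // hypothetical output so the result stays live

__device__ int f()
{
    return (int)threadIdx.x;      // cheap placeholder body
}

__device__ int g()
{
    return g_buffer[threadIdx.x]; // one global load per call -> memory traffic
}

__global__ void kernel_1_sketch()
{
    int i = f();
    // Threads with threadIdx.x >= 5 skip the load inside g() entirely
    if(threadIdx.x < 5)
    {
        i = min(g(), i);
    }
    out[threadIdx.x] = i;
}

__global__ void kernel_2_sketch()
{
    int i = f();
    i = min(g(), i);              // every thread performs the load
    out[threadIdx.x] = i;
}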

So does the warp diverge or not?

The warp diverges. You can verify this with the CUDA binary utilities (for example, by dumping the SASS with cuobjdump -sass and inspecting the branch instructions).
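
Not part of the answer above, but as a sketch of the contrast: the divergence comes from the branch construct itself, so if the common code is hoisted out of the if/else there is no branch left for the warp to diverge on (the forward declaration of g() below is an assumption, since the question never defines it):

#include <cstdio>

__device__ int g();   // assumed to be defined elsewhere, as in the question

__global__ void kernel_3_no_branch()
{
    // Both paths of kernel_3 execute identical statements, so the common
    // code can simply be hoisted out of the branch; with no branch left,
    // there is nothing for the warp to diverge on.
    int i = g();
    (void)i;          // i is unused, as in the original example
    printf("hello\n");
}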
