Home > Software design >  dispatch_apply leaves one thread "hanging"
dispatch_apply leaves one thread "hanging"

Time:09-16

I am experimenting with multithreading following Apples Concurrency Programming Guide. The multithreaded function (dispatch_apply) replacing the for-loop seems straightforward and works fine with a simple printf statement. However, if the block calls a more cpu-intensive calculation, the program never ends or executes past dispatch_apply, and one thread (main thread?) seems stuck at 100%.

#import <Foundation/Foundation.h>

#define THREAD_COUNT 16

unsigned long long longest = 0;
unsigned long long highest = 0;

void three_n_plus_one(unsigned long step);

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
        dispatch_apply(THREAD_COUNT, queue, ^(size_t i) {
            three_n_plus_one(i);
        });
    }
    return 0;
}

void three_n_plus_one(unsigned long step) {
    
    unsigned long long start = step;
    unsigned long long end = 1000000;
    unsigned long long identifier = 0;

    while (start <= end) {

        unsigned long long current = start;
        unsigned long long sequence = 1;

        while (current != 1) {
        
            sequence  = 1;
            if(current % 2 == 0)
                current = current / 2;
            else {
                current = (current * 3)   1;
            }

            if (current > highest) highest = current;
        }

        if (sequence > longest) {
            longest = sequence;
            identifier = start;
            printf("thread %lu, number %llu with %llu steps to one\n", step, identifier, longest);
        }

        start  = (unsigned long long)THREAD_COUNT;
    }
}

Still, the loop seems to be finished. From what I understand, this should be fairly straight forward, still I'm left clueless as to what I'm doing wrong here.

CodePudding user response:

What you're calling step is the index of the loop. It goes from 0 to THREAD_COUNT-1 in your code. Since you assign start to be step, that means your first iteration tries to compute the Colatz sequence starting at zero. That computes 0/2 == 0 and so is an infinite loop.

What you meant to write is:

unsigned long long start = step   1;

Calling your block size "THREAD_COUNT" is misleading. The question is not how many threads are created (no threads may be created; that's up to the system). The question is how many chunks to divide the work into.

Note that reading and writing to longest and highest on multiple threads without synchronization is undefined behavior, so this code may do surprising things (particularly when optimized). Don't assume it's limited to getting the wrong values in longest and highest. The optimizer is allowed to assume no other thread touches those values while it runs, and can rearrange code dramatically based on that. But that's not the cause of this particular issue.

CodePudding user response:

As TSAN

Then, when you run, it will warn you about the data races:

Data race warnings

You can fix these races by synchronizing your access to these variables. A lock is one simple mechanism. I would also avoid doing a synchronization within the inner while loop, if you can. In this case, you can even remove it from the outer while loop, too. In this case, I might suggest a local variables to keep track of the current “longest” sequence, the “highest” value, and the identifier for that highest value, and then only compare to and update the shared variable when you are done, outside of both loops.

E.g. perhaps:

- (void)three_n_plus_one:(unsigned long) step {
    unsigned long long start = step   1;
    unsigned long long end = 1000000;
    unsigned long long tempHighest = start;
    unsigned long long tempLongest = 1;
    unsigned long long tempIdentifier = start;

    while (start <= end) {
        unsigned long long current = start;
        unsigned long long sequence = 1;

        while (current != 1) {
            sequence  = 1;

            if (current % 2 == 0)
                current = current / 2;
            else {
                current = (current * 3)   1;
            }

            if (current > tempHighest) tempHighest = current;
        }

        if (sequence > tempLongest) {
            tempLongest = sequence;
            tempIdentifier = start;
        }

        start  = (unsigned long long)THREAD_COUNT;
    }

    [lock lock];   // synchronize updating of shared memory

    if (tempHighest > highest) {
        highest = tempHighest;
    }

    if (tempLongest > longest) {
        longest = tempLongest;
        identifier = tempIdentifier;
    }

    [lock unlock];
}

I used a NSLock, but use whatever synchronization mechanism you want. But the idea is (a) to make sure to synchronize all interaction with shared memory and; (b) to reduce the necessary number of synchronizations to a bare minimum. (In this case, a naïve synchronization approach was 200× slower than the above, which minimizes the number of synchronizations to the bare minimum.)

When you are done fixing the data races, you can then turn TSAN off.

  • Related