Thread cancellation before calling join() gives an error

Time:12-13

The IEEE (POSIX) standard states:

The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread.

In the following program, a single thread is created, and it executes the thread_task() routine. After the routine is done, the thread exits, but because its detachstate attribute is PTHREAD_CREATE_JOINABLE (the default), I would expect calling pthread_cancel() on this thread to be safe and not return any error. The program is kinda lengthy because of the extensive error checking:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int counter=0;

void free_buffer(void* buff)
{
    printf("freeing buffer\n");
    free(buff);
}

void* thread_task(void* arg)
{
    void* buffer = malloc(1000);
    pthread_cleanup_push(free_buffer, buffer);

    for(int i = 0; i < 100000; i++) { // 'counter' is a global variable
        for(counter = 0; counter < 10000; counter++);
        pthread_testcancel();
    }

    pthread_cleanup_pop(1);
    printf("Thread exiting\n");
    return NULL;
}

int main()
{
    pthread_t tid;
    int errnum = pthread_create(&tid, NULL, thread_task, NULL);
    if(errnum != 0) {
        fprintf(stderr, "pthread_create(): %s\n", strerror(errnum));
        exit(EXIT_FAILURE);
    }    

    getchar();

    errnum = pthread_cancel(tid);
    if(errnum != 0) {
        fprintf(stderr, "pthread_cancel(): %s [%d]\n", strerror(errnum), errnum);
        exit(EXIT_FAILURE);
    } 

    void* ret;
    errnum = pthread_join(tid, &ret);
    if(errnum != 0) {
        fprintf(stderr, "pthread_join(): %s [%d]\n", strerror(errnum), errnum);
        exit(EXIT_FAILURE);
    } 

    if(ret == PTHREAD_CANCELED) {
        printf("Thread was canceled\n");
    }

    printf("counter = %d\n", counter);
}

This doesn't happen however. When I run the program the messages I see are:

// wait for the thread routine to finish...
freeing buffer
Thread exiting
// press any key
pthread_cancel(): No such process [3]

This seems to suggest that after the thread exits, its TID is no longer valid. Doesn't this go against the standard? What's going on here?

CodePudding user response:

The problem is that if you are not fast enough, the thread finishes by itself (running through all the loop iterations) before you press RETURN on the keyboard. pthread_cancel() then fails because you are trying to cancel a thread that has already terminated. A subsequent pthread_join() would still succeed in reaping the thread, but the program above never reaches it because it exits as soon as pthread_cancel() reports an error. strace gives an idea of what happens:

$ strace -f ./pcancel
execve("./pcancel", ["./pcancel"], 0x7ffd11e1ad58 /* 28 vars */) = 0
brk(NULL)                               = 0x55cf92027000
[...]

#### CREATION OF THE THREAD ==> Linux task id: 10679

clone(child_stack=0x7fe663b19fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fe663b1a9d0, tls=0x7fe663b1a700, child_tidptr=0x7fe663b1a9d0) = 10679
strace: Process 10679 attached

[pid 10678] fstat(0,  <unfinished ...>
[pid 10679] set_robust_list(0x7fe663b1a9e0, 24 <unfinished ...>
[pid 10678] <... fstat resumed> {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0
[pid 10679] <... set_robust_list resumed> ) = 0

#### Main thread is waiting for a char on the keyboard (getchar() call)

[pid 10678] read(0,  <unfinished ...>

#### Meanwhile the thread continues its execution...

[pid 10679] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fe65b31a000
[pid 10679] munmap(0x7fe65b31a000, 13524992) = 0
[pid 10679] munmap(0x7fe660000000, 53583872) = 0
[pid 10679] mprotect(0x7fe65c000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 10679] fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0
[pid 10679] write(1, "freeing buffer\n", 15freeing buffer
) = 15
[pid 10679] write(1, "Thread exiting\n", 15Thread exiting
) = 15
[pid 10679] madvise(0x7fe66331a000, 8368128, MADV_DONTNEED) = 0

#### The thread finishes here...

[pid 10679] exit(0)                     = ?
[pid 10679] +++ exited with 0 +++

#### Main thread reads the char on the keyboard

<... read resumed> "\n", 1024)          = 1

#### The call to pthread_cancel() fails because the thread is already finished

write(2, "pthread_cancel(): No such proces"..., 38pthread_cancel(): No such process [3]
) = 38
exit_group(1)                           = ?
+++ exited with 1 +++

If you press RETURN twice very quickly after launching your program, the main thread gets a chance to call pthread_cancel() before the secondary thread finishes:

$ ./pcancel [RETURN typed twice very quickly]

freeing buffer
Thread was canceled
counter = 10000
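
To see the cancellation path without racing the keyboard, one option is a thread routine that cannot finish on its own. The following is only a minimal sketch under that assumption: spin_until_cancelled(), release_buffer(), run_cancel_demo() and cleanup_calls are hypothetical names, not part of the original program.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical counter, used only to show that the cleanup handler ran. */
static int cleanup_calls = 0;

static void release_buffer(void *buf)
{
    cleanup_calls++;
    free(buf);
}

/* Unlike thread_task(), this routine never finishes by itself: it can only
 * end by acting on a cancellation request at pthread_testcancel(). */
static void *spin_until_cancelled(void *arg)
{
    (void)arg;
    void *buffer = malloc(1000);
    pthread_cleanup_push(release_buffer, buffer);

    for (;;) {
        sched_yield();         /* let other threads run */
        pthread_testcancel();  /* explicit cancellation point */
    }

    pthread_cleanup_pop(1);    /* never reached; required to pair with push */
    return NULL;
}

/* Returns 1 if the joined thread reports PTHREAD_CANCELED, 0 if it ended
 * some other way, -1 if any pthread call failed. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *ret = NULL;

    cleanup_calls = 0;
    if (pthread_create(&tid, NULL, spin_until_cancelled, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)      /* the thread is still alive here */
        return -1;
    if (pthread_join(tid, &ret) != 0)  /* join also makes cleanup_calls visible */
        return -1;

    assert(cleanup_calls == 1);        /* the handler ran exactly once */
    return ret == PTHREAD_CANCELED;
}
```

Because the thread is guaranteed to be alive when pthread_cancel() is called, the call should not fail here, and pthread_join() should report PTHREAD_CANCELED regardless of timing.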

CodePudding user response:

I don't know about the IEEE standard, but IMO, the man pages "pthreads(7)," and "pthread_cancel(3)" are ambiguous.

The pthread_cancel man page only gives one possible error code, ESRCH, which supposedly means, "No thread with the ID thread could be found." But notice, it says, "No thread...could be found." It doesn't say, "No such ID exists."

The pthreads(7) man page guarantees that the ID of a non-detached thread remains valid and unique until that ID is join()ed, but it doesn't say anything about whether the thread itself continues to "exist" (in the sense that pthread_cancel() cares about) just because its ID continues to exist.

I ran the OP's code on a different platform, and pthread_cancel() did not return an error for me, even long after the thread had returned from the thread_task() function. IMO, there are cases to be made for both OP's build toolchain and mine being "correct" in the sense of "compliant with the man pages."


OP wrote: "I would expect calling pthread_cancel() on this thread to be safe and not return any error."

What does "safe" mean? To me, pthread_cancel() would be "safe" if it was possible to create a guaranteed reliable program that uses it. If you had to assume that either behavior is possible, that complicates things, but I don't think it makes the task impossible. IMO the worst it does is limit what kind of information you can gain from reading the errors if your program bothers to log them.
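
As a rough illustration of a program that works under either behavior, here is a minimal sketch. It assumes that ESRCH from pthread_cancel() on a joinable, not-yet-joined thread only means "the thread already finished"; that is an assumption about the platform, not something the man pages promise. cancel_and_join() and finish_immediately() are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: request cancellation, then always join.
 * ESRCH from pthread_cancel() is tolerated (assumed to mean the thread has
 * already finished). Any other error is returned as-is, without joining.
 * Returns 0 on success, otherwise the error number of the failing call.
 * *was_canceled is set to 1 if the thread ended by cancellation, else 0. */
int cancel_and_join(pthread_t tid, int *was_canceled)
{
    assert(was_canceled != NULL);

    int err = pthread_cancel(tid);
    if (err != 0 && err != ESRCH)
        return err;                 /* unexpected failure: report it */

    void *ret = NULL;
    err = pthread_join(tid, &ret);  /* the thread has not been joined yet */
    if (err != 0)
        return err;

    *was_canceled = (ret == PTHREAD_CANCELED);
    return 0;
}

/* Hypothetical thread routine that returns at once, so the cancellation
 * request may arrive before or after the thread has finished. */
void *finish_immediately(void *arg)
{
    (void)arg;
    return NULL;
}
```

With this shape, the caller learns whether the thread was canceled from the join result rather than from the return value of pthread_cancel(), so it does not matter which of the two behaviors the platform exhibits.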
