WINAPI how to wait until all threads finish?

In C with the Windows API, I want to simulate Java's behavior of waiting until all threads in the process finish before the program exits.

The behavior I want to simulate is that of the sample code below (it spams "[second thread] active threads: 2." and does not terminate even when main returns):

public final class Test1
{
    // | prints current thread count and queues the next iteration.
    static void step() {
        System.out.println (
            "[second thread] active threads: " +
            Thread.activeCount() +
            "."
        );
        new Thread(() -> step()).start();
    }
    
    public static void main(String[] args) throws Exception
    {
        // | queue the first iteration.
        new Thread(() -> step()).start();      
    }
}

My initial idea was to completely take over the main of my program and do all the work in another function instead, e.g. main2. If main2 finishes early, main waits until the rest of the threads finish.

My problem is that my main has no idea what other threads exist, and even if it did know what other threads existed, after they all finish, it is still possible that they have spawned more threads, that we are again not aware of.

My approach to tackle this looks something as follows:

main.c would contain the actual main, and the actual main logic would be moved out to main2 (or something with a better name). main would potentially resort to using CreateToolhelp32Snapshot to discover threads that do not match its own GetThreadId and wait for them (potentially aggregating existing threads instead of fetching one at a time, to take advantage of WaitForMultipleObjects).

/**
 * @file main.c
 */
#include <Windows.h>

// | This function can start threads without worrying about them
// |     ending as soon as it finishes.
extern int main2(int argc, char **argv);

// | NOT IMPLEMENTED: I have no idea if such a service exists, but it can probably be
// |     implemented using CreateToolhelp32Snapshot.
// | If it did exist, it would return a single thread from the process 
// |     not matching the current thread id.
extern HANDLE WINAPI SorceryToDiscoverASingleOtherThreadThatExists();

int main(int argc, char **argv)
{
    int returnValue;
    
    // | main2 will do the actual main's work.
    returnValue = main2(argc, argv);
    
    // | Do not finish before other threads finish.    
    for (;;) {
        HANDLE hThread;
        
        // | Find a single thread handle whose thread id is 
        // |     not the same as the current thread's.
        hThread = SorceryToDiscoverASingleOtherThreadThatExists();        
        
        // | If there are no more threads, 
        // |     we can finally break out of this infinite loop.
        if (hThread == 0) {
            break;
        }
        
        WaitForSingleObject(hThread, INFINITE);
    }

    // | Propagate main2's result instead of discarding it.
    return returnValue;
}

And main2.c which would behave as our java program would:

/**
 * @file main2.c
 */
#include <Windows.h>
#include <stdio.h>

DWORD CALLBACK ThreadProc0001(LPVOID unused) {
    puts("Hello, World!");
    CreateThread(0, 0, ThreadProc0001, 0, 0, 0);
    
    return 0;
}

int main2(int argc, char **argv)
{    
    CreateThread(0, 0, ThreadProc0001, 0, 0, 0);

    return 0;
}

With a proof of concept to make sure the above code works (deep_thread_nesting.c):

/**
 * @file deep_thread_nesting.c
 */
#include <Windows.h>
#include <stdio.h>

DWORD CALLBACK ThreadProc0001(LPVOID unused) {
    puts("Hello, World!");
    CreateThread(0, 0, ThreadProc0001, 0, 0, 0);
    
    return 0;
}

int main(int argc, char **argv)
{    
    CreateThread(0, 0, ThreadProc0001, 0, 0, 0);
    
    // | Do not exit until user presses ctrl c.
    for (;;) {
        // | Reduce strain on the CPU from the infinite loop.
        Sleep(1000);
    }

    return 0;
}

My problem is that I feel forced to use one of three incredibly ugly solutions:

The first involves the mystical CreateToolhelp32Snapshot function, as this tutorial describes, to fetch one thread handle (or, as a further optimization, several) that does not match our active thread id and that we can wait on.

The second involves keeping a global registry of all the handles and having each thread lock the world, add its handle to the registry, remove its own handle when done, and unlock the world, possibly with my own CreateThread wrapper that takes care of this for me.

The third is only a rough idea, as I do not know whether it works the way I think it does: hooking the CreateThread function so that all threads implement the second solution.

Question

Is there a way to make C or Windows API wait for all my threads to finish before terminating the program without effectively writing my own runtime?

User response:

Not really an answer, but, as mentioned by IInspectable, ExitProcess is called by the CRT. So if you get rid of the CRT, the behaviour that you are looking for is restored.

Compile with /NODEFAULTLIB, and pass the libraries on the command line rather than through #pragma comment(lib, ...).

#include <Windows.h>

DWORD WINAPI other_thread(LPVOID lpThreadParameter)
{
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);

    if ((hOut == INVALID_HANDLE_VALUE) ||
        (!hOut))
    {
        if (IsDebuggerPresent()) __debugbreak();
        return -1;
    }

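    constexpr char string[] = "I am writing!\r\n";
    for (;;)
    {
        // sizeof - 1 leaves out the terminating NUL; lpNumberOfBytesWritten
        // may only be NULL when lpOverlapped is not NULL
        DWORD written;
        WriteFile(hOut, string, sizeof(string) - 1, &written, 0);
    }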

    return 0;
}

int mainCRTStartup()
{
    HANDLE hThread = CreateThread(0, 0, other_thread, 0, 0, 0);

    return 1;
}

other_thread continues writing even after mainCRTStartup returns.

User response:

An answer that is closer to what the OP intended:

#include <windows.h>
#pragma comment(lib, "synchronization.lib")

// the program will not (usually) exit, until this counter is at 0
long long deployed_threads_counter;

// we need a place to store the user's function pointer,
// as the lpStartAddress parameter of CreateThread is already used
struct ThreadParameters
{
    LPTHREAD_START_ROUTINE  lpStartAddress;
    LPVOID                  lpParameter;
};

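// a wrapper around the user provided LPTHREAD_START_ROUTINE
DWORD WINAPI my_thread_start(LPVOID lpThreadParameter)
{
    // copy the parameters out of my_create_thread's stack frame
    ThreadParameters* shared = (ThreadParameters*)lpThreadParameter;
    ThreadParameters thread_parameters = *shared;
    // change the watched value before waking, so that my_create_thread can
    // exit even if this runs before it reaches WaitOnAddress
    InterlockedExchangePointer((PVOID volatile*)&shared->lpStartAddress, NULL);
    WakeByAddressSingle(&shared->lpStartAddress);
    // actually do the work
    DWORD result = thread_parameters.lpStartAddress(thread_parameters.lpParameter);
    // signal that the thread has finished executing
    InterlockedDecrement64(&deployed_threads_counter);
    WakeByAddressSingle(&deployed_threads_counter);
    return result;
}

// CreateThread substitute that provides the desired behaviour
HANDLE my_create_thread(
    LPSECURITY_ATTRIBUTES   lpThreadAttributes,
    SIZE_T                  dwStackSize,
    LPTHREAD_START_ROUTINE  lpStartAddress,
    LPVOID                  lpParameter,
    DWORD                   dwCreationFlags,
    LPDWORD                 lpThreadId)
{
    InterlockedIncrement64(&deployed_threads_counter);
    ThreadParameters thread_parameters =
    {
        lpStartAddress,
        lpParameter,
    };
    // call my_thread_start instead, so that the thread exit is signaled
    HANDLE hThread = CreateThread(
        lpThreadAttributes,
        dwStackSize,
        my_thread_start,
        &thread_parameters,
        dwCreationFlags,
        lpThreadId);
    if (!hThread)
    {
        // no thread was started, so undo the increment
        InterlockedDecrement64(&deployed_threads_counter);
        WakeByAddressSingle(&deployed_threads_counter);
        return hThread;
    }
    // do not destroy thread_parameters until my_thread_start has copied them
    // (this blocks forever if the thread is created with CREATE_SUSPENDED);
    // WaitOnAddress may return spuriously, so re-check the value in a loop
    while (*(LPTHREAD_START_ROUTINE volatile*)&thread_parameters.lpStartAddress == lpStartAddress)
    {
        WaitOnAddress(&thread_parameters.lpStartAddress, &lpStartAddress, sizeof(LPTHREAD_START_ROUTINE), INFINITE);
    }
    return hThread;
}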

// optionally set this behaviour to be the default
#define CreateThread my_create_thread

int use_this_main();
int main()
{
    // execute user code
    int result = use_this_main();
    // wait for all threads to finish
    while (auto temp = deployed_threads_counter)
    {
        WaitOnAddress(&deployed_threads_counter, &temp, sizeof(temp), INFINITE);
    }
    // fallthrough return
    return result;
}

int use_this_main()
{
    // your code here...

    return 0;
}

There is still a race condition if InterlockedIncrement64 is called after main's last check of the counter, for example from a thread that was not itself created through my_create_thread. This can be prevented with something like a double gate system, but the answer is already complicated enough.